Using dputrace tool
===================

``dputrace`` provides an exact representation of the low-level execution of a DPU program.
Each executed assembly instruction is printed, along with detailed information on the registers modified by the instruction.

This tool is mainly a low-level/assembly debugging helper.

How to generate trace files
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Trace files can only be generated by a DPU running on the functional simulator backend.

When the environment variable ``UPMEM_TRACE_DIR`` is set, the simulator writes a binary trace file into the directory it specifies.
The file is named ``trace-XXXX-YY``, where ``XXXX`` is the DPU rank ID and ``YY`` is the DPU ID within the rank.

These trace files are directly usable by the ``dputrace`` tool.

Example
~~~~~~~

Consider a rather simple DPU program:

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.c
   :language: c

Let's build the program:

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.compile

We now need to execute this program. Here, we use ``dpu-lldb`` as a runner, but any host application based on the Host API would work.

From a terminal, we first set ``UPMEM_TRACE_DIR``, making sure that the program will run on a functional simulator:

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.exports

We can now launch ``dpu-lldb``:

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.lldb_script

A new file, ``trace-0000-00``, is available in the ``UPMEM_TRACE_DIR`` directory (in this example, the current directory).

Now let's execute ``dputrace``, focusing on thread 1 (hence the ``-t 1`` option):

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.command

.. literalinclude:: ../../../endtests/documentation/dputrace_example/dputrace_example.output_reference

Each trace follows this schema::

    Trace counter | Thread @ PC | Flags | Disassembled instruction | Function information

The trace counter should not be interpreted as a cycle counter, as the functional simulator is not cycle-accurate.
It mainly serves as a marker, to search for and find a particular point from one execution of the program to another.

Currently, the only flag is the "replay" flag, represented by an ``*``.
It indicates that the instruction was played twice because of hardware constraints (see :ref:`scheduling-explanation-label` for more details).

The disassembled instruction embeds the values of the input registers before the instruction is executed, and the value of the output register, if any, after it is executed.

When the symbols are available, ``dputrace`` tries to give some context from the function calls, and provides some information when entering and exiting a function.

Host traces
~~~~~~~~~~~

When ``dputrace`` is run without specifying a thread, some other traces are displayed, describing the host actions.
Here is a brief description of the different possible traces:

Loading program::

    [ LOAD PROGRAM ] main.dpu

Signals that a program is going to be loaded in the DPU, giving the path to the executable.

Writing IRAM instruction::

    [ WRITE IRAM ][CI@0x80000000] jnz id, 0x80000020

Signals that an instruction has been written by the host, giving the instruction address and its disassembled form.

Watch mode
~~~~~~~~~~

In the previous example, the program execution had finished before ``dputrace`` was started.
Sometimes this workflow is not flexible enough (for example when running a very long application, or when using ``lldb`` for an interactive execution).

In these cases, one can use the ``-w`` option, which enables the watch mode.
In that mode, ``dputrace`` does not exit at the end of the file.
Instead, it waits for more input and processes it on the fly.

Advanced options
~~~~~~~~~~~~~~~~

Some parts of the DPU execution are hidden from the ``dputrace`` output.
They correspond to programs executed by the Host API, which are usually of no interest to an end user.

One can manage this setting with the ``-enable-event`` and ``-disable-event`` options.
A list of the different possible values can be printed with the ``-list-events`` option.
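
When a run involves several DPUs, the trace directory can contain many ``trace-XXXX-YY`` files. The following Python sketch, which is not part of the UPMEM tooling, shows one way to map each file back to its rank ID and DPU ID before handing it to ``dputrace``. The helper names ``parse_trace_name`` and ``list_traces`` are hypothetical, and the sketch assumes that ``XXXX`` and ``YY`` are zero-padded decimal numbers, as the ``trace-0000-00`` example suggests; adjust the pattern if your simulator uses another encoding.

```python
import re
from pathlib import Path

# Assumption: XXXX and YY are zero-padded decimal digits, as in "trace-0000-00".
TRACE_NAME = re.compile(r"trace-([0-9]{4})-([0-9]{2})")


def parse_trace_name(name):
    """Return (rank_id, dpu_id) for a trace file name, or None if it does not match."""
    match = TRACE_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def list_traces(trace_dir):
    """Map (rank_id, dpu_id) to the path of each trace file found in trace_dir."""
    traces = {}
    for path in sorted(Path(trace_dir).iterdir()):
        ids = parse_trace_name(path.name)
        if ids is not None and path.is_file():
            traces[ids] = path
    return traces
```

For instance, ``list_traces(os.environ["UPMEM_TRACE_DIR"])`` (after ``import os``) would return one entry per trace file written by the simulator, keyed by rank ID and DPU ID.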