An example of debugging a DPU booted by a host application

Context

The following examples show how to attach to a DPU booted by a host application in each of the debug scenarios mentioned before.

All these examples use the following source files:

The DPU program (in host_debug_example_dpu.c):

#include <mram.h>

__mram uint64_t wait_value;

int main() {

  __dma_aligned volatile uint64_t wait = wait_value;
  while (wait) // loops forever
    ;

  *((int *)0xffffffff) = 0; // generates a memory fault

  return 0;
}

It contains two bugs:

  • it loops forever if the host sets the wait_value variable to 1 (any non-zero value has the same effect)

  • it generates a memory fault when it tries to write to a forbidden address
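The second bug comes from the store through (int *)0xffffffff, an address far outside the DPU's working memory (WRAM). As a rough illustration only, the hypothetical helper below checks whether a store of a given size would be naturally aligned and stay inside WRAM. It assumes a 64 KB WRAM mapped from address 0; that size and layout are assumptions for the sketch, not values taken from this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: WRAM is 64 KB and starts at address 0 (illustrative only). */
#define WRAM_SIZE_BYTES (64u * 1024u)

/* Hypothetical helper: returns true if a store of `size` bytes at `addr`
 * is naturally aligned and fits entirely inside the assumed WRAM range. */
static bool wram_store_is_valid(uint32_t addr, uint32_t size) {
    if (size == 0 || (addr % size) != 0) {
        return false; /* empty or misaligned access */
    }
    /* Widen to 64 bits so that addr + size cannot wrap around. */
    return (uint64_t)addr + size <= WRAM_SIZE_BYTES;
}
```

Under these assumptions, the faulting store in the DPU program fails both checks: 0xffffffff is not 4-byte aligned and lies well beyond 64 KB.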

The host application (in host_debug_example_host.c):

#include <assert.h>
#include <dpu.h>
#include <stdlib.h>

#ifndef DPU_BINARY
#define DPU_BINARY "./host_debug_example_dpu"
#endif

int main(int argc, char **argv) {
  struct dpu_set_t set;
  struct dpu_set_t dpu;

  assert(argc == 2 && "usage: ./host_debug_example_host <0|1>");

  uint64_t wait = atoi(argv[1]);

  DPU_ASSERT(dpu_alloc(1, NULL, &set));

  DPU_FOREACH(set, dpu) {
    DPU_ASSERT(dpu_load(dpu, DPU_BINARY, NULL));
    DPU_ASSERT(dpu_copy_to(dpu, "wait_value", 0, &wait, sizeof(wait)));
    dpu_launch(dpu, DPU_SYNCHRONOUS); // No assert, because we don't want to stop the program when the DPU fails
  }

  DPU_ASSERT(dpu_free(set));

  return 0;
}
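The host program converts its argument with atoi, which silently maps invalid input such as "abc" to 0. If stricter validation is wanted, a sketch along these lines could replace that call. parse_wait_arg is a hypothetical helper name, and accepting only 0 or 1 mirrors the usage string rather than any SDK requirement.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses the command-line argument into *out.
 * Accepts only "0" or "1" (matching the <0|1> usage string);
 * returns false on any other input and leaves *out untouched. */
static bool parse_wait_arg(const char *arg, uint64_t *out) {
    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0') {
        return false; /* overflow, no digits, or trailing characters */
    }
    if (value > 1) {
        return false; /* outside the documented <0|1> range */
    }
    *out = (uint64_t)value;
    return true;
}
```

In main, the atoi line could then become a call to parse_wait_arg(argv[1], &wait) followed by an early exit with an error message when it returns false.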

Let’s compile both programs:

dpu-upmem-dpurte-clang -g -O2 -o host_debug_example_dpu host_debug_example_dpu.c
gcc --std=c99 -O2 -g -o host_debug_example_host host_debug_example_host.c `dpu-pkg-config --cflags --libs dpu`

We are now ready to go through the scenarios.

Attaching before the start of an application

To attach before the start of the DPU program, we stop the host application on its call to dpu_launch (with a breakpoint matching that line) and use the dpu_attach command at that point. We can then check the MRAM content:

file host_debug_example_host
settings set target.run-args 1
breakpoint set --source-pattern-regexp "dpu_launch"
process launch
dpu_list -v
dpu_attach 0.0.0
process status
memory read 0x08000000 -c 4
dpu_detach
exit

The relevant part of the session output is:
(lldb) memory read 0x08000000 -c 4
0x08000000: 01 00 00 00                                      ....
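The four bytes 01 00 00 00 are the low half of the 64-bit wait_value: the host wrote 1, and the first byte read is 01, so the value is stored least-significant byte first (little-endian). This assumes, as the output suggests, that wait_value sits at the start of MRAM, which dpu-lldb shows at 0x08000000. The sketch below decodes such a byte dump; reading 8 bytes (-c 8) would show the whole variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes up to 8 bytes, in the order printed by `memory read`, into a
 * uint64_t. Assumes little-endian order: the first byte is the least
 * significant. High bytes that were not read (count < 8) are treated as
 * zero, which is only correct if they really are zero in memory. */
static uint64_t decode_le_u64(const uint8_t *bytes, size_t count) {
    uint64_t value = 0;
    if (count > 8) {
        count = 8;
    }
    for (size_t i = 0; i < count; i++) {
        value |= (uint64_t)bytes[i] << (8 * i);
    }
    return value;
}
```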

Attaching when the DPU is booting

To attach when a DPU is booting, we need to use the dpu_attach_on_boot command. Then we will be able to check the content of the MRAM and let the DPU run until the main function:

file host_debug_example_host
settings set target.run-args 1
process launch --stop-at-entry
dpu_attach_on_boot
memory read 0x08000000 -c 4
breakpoint set -n main
process continue
thread backtrace
dpu_detach
exit

The relevant part of the session output is:
(lldb) memory read 0x08000000 -c 4
0x08000000: 01 00 00 00                                      ....
(lldb) breakpoint set -n main
Breakpoint 1: where = host_debug_example_dpu`main at host_debug_example_dpu.c:7:42, address = 
(lldb) process continue
Process 1000000 resuming
Process 1000000 stopped
* thread #1, name = 'DPUthread0', stop reason = breakpoint 1.1
    frame #0:  host_debug_example_dpu`main at host_debug_example_dpu.c:7:42
   4   	
   5   	int main() {
   6   	
-> 7   	  __dma_aligned volatile uint64_t wait = wait_value;
    	                                         ^
   8   	  while (wait) // loops forever
   9   	    ;
   10  	

(lldb) thread backtrace
* thread #1, name = 'DPUthread0', stop reason = breakpoint 1.1
  * frame #0: 0x80000090 host_debug_example_dpu`main at host_debug_example_dpu.c:7:42
    frame #1: 0x80000078 host_debug_example_dpu`__bootstrap at crt0.c:36:5

Attaching when the DPU is running

To attach when a DPU is running, we need to use the dpu_attach command. Then we will be able to check why the DPU program does not end and fix it on the fly:

file host_debug_example_host
settings set target.run-args 1
process launch
dpu_list -v
dpu_attach 0.0.0
frame variable
expr wait = 0
process continue
dpu_detach
exit

The relevant part of the session output is:
(lldb) frame variable
(volatile uint64_t) wait = 1
(lldb) expr wait = 0
(volatile uint64_t) $0 = 0
(lldb) process continue
Process 1000000 resuming
Process 1000000 stopped
* thread #1, name = 'DPUthread0', stop reason = memory fault
    frame #0:  host_debug_example_dpu`main at host_debug_example_dpu.c:11:24
   8   	  while (wait) // loops forever
   9   	    ;
   10  	
-> 11  	  *((int *)0xffffffff) = 0; // generates a memory fault
    	                       ^
   12  	
   13  	  return 0;
   14  	}
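Setting wait to 0 with expr only unblocks this one run. Since wait is copied once from MRAM and nothing but a debugger can change it afterwards, one option is to bound the busy-wait so that a stuck run becomes a detectable failure instead of a hang. The sketch below is plain C so the logic can be checked off-target; MAX_SPINS and wait_bounded are hypothetical names, and the limit is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical spin limit, chosen arbitrarily for this sketch. */
#define MAX_SPINS 1000000u

/* Hypothetical helper: spins while *flag is non-zero, as the original loop
 * does, but gives up after max_spins extra checks. The volatile qualifier
 * forces a fresh read of the flag on every iteration.
 * Returns true if the flag is zero, false if the limit was reached. */
static bool wait_bounded(const volatile uint64_t *flag, uint32_t max_spins) {
    uint32_t spins = 0;
    while (*flag != 0) {
        if (spins >= max_spins) {
            return false; /* still non-zero after max_spins checks */
        }
        spins++;
    }
    return true;
}
```

In the DPU program, the while (wait) loop would become a call such as wait_bounded(&wait, MAX_SPINS), followed by whatever error handling fits the application when it returns false.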

Attaching after the end of the DPU program

To attach after the end of the DPU program, we again use the dpu_attach command, this time with the host stopped before its call to dpu_free. We can then check the state of the threads and disassemble the faulting function:

file host_debug_example_host
settings set target.run-args 0
breakpoint set --source-pattern-regexp "dpu_free"
process launch
dpu_list -v
dpu_attach 0.0.0
thread backtrace
disassemble
dpu_detach
exit

The relevant part of the session output is:
(lldb) thread backtrace
* thread #1, name = 'DPUthread0', stop reason = memory fault
  * frame #0: 0x800000a8 host_debug_example_dpu`main at host_debug_example_dpu.c:11:24
    frame #1: 0x80000050 host_debug_example_dpu`__bootstrap at crt0.c:36:5
(lldb) disassemble
host_debug_example_dpu`main:
    0x80000058 <+0>:  move   r0, 0x0
    0x80000060 <+8>:  add    r1, id8, 0x4e8
    0x80000068 <+16>: ldma   r1, r0, 0x0
    0x80000070 <+24>: ld     d0, r1, 0x0
    0x80000078 <+32>: sd     r22, 0x0, d0
    0x80000080 <+40>: move.s d2, 0x0
    0x80000088 <+48>: ld     d0, r22, 0x0
    0x80000090 <+56>: jneq   r1, r3, 0x80000088        ; <+48> at host_debug_example_dpu.c:8:10
    0x80000098 <+64>: jneq   r0, r2, 0x80000088        ; <+48> at host_debug_example_dpu.c:8:10
    0x800000a0 <+72>: move   r0, 0x0
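In this listing, the 64-bit wait value is reloaded into the register pair d0 on every iteration (because it is volatile), and d2 is set to 0. The loop then tests the two 32-bit halves separately with two jneq instructions, jumping back to <+48> if either half differs from 0. A C rendering of that condition, written only to illustrate the listing (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the loop condition in the listing: a 64-bit value is non-zero
 * if either of its 32-bit halves differs from the matching half of 0. */
static bool nonzero_by_halves(uint64_t value) {
    uint32_t low = (uint32_t)value;
    uint32_t high = (uint32_t)(value >> 32);
    /* One comparison per half, like the two jneq instructions. */
    return high != 0u || low != 0u;
}
```

The result always equals value != 0; the compiler simply has to express that 64-bit test with 32-bit comparisons.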

As mentioned before, after running:

./host_debug_example_host 0

it is also possible to attach to the DPU once the host application has exited, using dpu-lldb-attach-dpu, and get the same information (see section Attaching to a DPU without having a host application for more information on dpu-lldb-attach-dpu):

* thread #1, name = 'DPUthread0', stop reason = memory fault
    frame #0:  host_debug_example_dpu`main at host_debug_example_dpu.c:11:24
   8   	  while (wait) // loops forever
   9   	    ;
   10  	
-> 11  	  *((int *)0xffffffff) = 0; // generates a memory fault
    	                       ^
   12  	
   13  	  return 0;
   14  	}