Verifying memory accesses with dpugrind
dpugrind aims to provide the same memory checks as Valgrind. It can warn about:
reads of uninitialized values from memory
accesses to unallocated memory
stack overflows and underflows
accesses to another thread's stack area
unaligned accesses in DMA transfers
Notes
dpugrind takes the same input files as dputrace (see Using dputrace tool for how to generate the trace files).
It also provides the same Watch Mode, with the -w option.
Example
We are going to consider a rather simple, but faulty, DPU program:
int array[5] = { 1, 2, 3, 4, 5 };

__attribute__((noinline))
int foo(int x) {
    int r = x;
    for (int i = 0; i < 6; ++i) {
        r *= array[i];
    }
    return r;
}

int main() {
    return foo(4) + 2;
}
The array has 5 elements, but the loop runs 6 times, so its last iteration reads array[5], one element past the end of the array.
Let’s build the program:
dpu-upmem-dpurte-clang dpugrind_example.c -o dpugrind_example -O2
We now need to execute this program. Here, we use dpu-lldb as the runner; any host application based on the Host API would also work.
From a terminal, we first set UPMEM_TRACE_DIR, then make sure the program runs on a functional simulator by setting UPMEM_PROFILE_BASE:
export UPMEM_TRACE_DIR=.
export UPMEM_PROFILE_BASE=backend=simulator
We can now launch dpu-lldb and run the following commands:
file dpugrind_example
process launch
exit
The exit value is not the expected one because of the out-of-bounds access. We can use dpugrind to locate the error.
A new trace file, trace-0000-00, has been created in the UPMEM_TRACE_DIR directory (in this example, the current directory).
Now let’s execute dpugrind:
dpugrind -i trace-0000-00
Invalid WRAM read of size 4 starting at 0x00000458
by thread 00
at 0x800001a0: foo
by 0x80000098: main
dpugrind correctly detected the expected error and reports where it occurred: the faulty read is in foo, which was called from main.
Limitations
trace input files can only be generated when using a functional simulator
out-of-bounds array accesses may not be detected when another variable is located immediately after the array
currently, memory areas obtained with buddy_alloc.h are not correctly tracked