Examples of assembly programs

This section gives concrete examples of how to develop, build, and test pure assembly programs for the DPU.

Saying hello to the world

This first illustration is similar to the “hello world” introductory program:

  • Declare a string equal to “hello world”

  • Compute the checksum of these characters and store the result into r0

The source code

The string is declared with the .string directive. The code hereafter defines a “global variable” hello equal to this string. This variable must reside in the data section of the program (which automatically places it in WRAM):

.data
.globl hello
hello:
    .string "hello world"

The main routine iterates over the characters of the string until it reaches the terminating zero byte. This routine resides in the text section of the program (which automatically places it in IRAM) and is marked as global, so that the RTE can bootstrap it.

The hello world program (helloworld.S) is:

// Hello world, written in assembly: computes the checksum of "hello world"

.text
.globl __bootstrap
__bootstrap:
#define stringPointer r1
#define checksum r0
	move checksum, 0
	move stringPointer, hello

#define currentCharacter r2
	checksum_loop:
		// Load the current character
		lbu  currentCharacter , stringPointer , 0
		// And exit if this character is 0
		jz   currentCharacter, end_of_loop
		add checksum, checksum, currentCharacter
		// Move to next character
		add stringPointer, stringPointer, 1, true,	checksum_loop

	end_of_loop:
		stop

.data
.globl hello
hello:
    .string "hello world"

Building the program

Let’s assemble the program as usual using dpu-upmem-dpurte-clang:

dpu-upmem-dpurte-clang -nostartfiles -o helloworld helloworld.S

Notice that we compiled with -nostartfiles to define the entry point ourselves (__bootstrap).

Running the program

Let’s verify that the code above is correct, using dpu-lldb:

file helloworld
process launch --stop-at-entry
breakpoint set --source-pattern-regexp "stop"
process continue
register read r0
exit

When the program has terminated, verify that the return register is equal to the checksum of “hello world” (i.e. 0x45c, or 1116 in decimal):

r0 = 0x0000045c
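
As a cross-check on the host, the expected checksum can be reproduced with a few lines of Python. This is only a sketch and not part of the UPMEM SDK: the function name string_checksum is hypothetical. It mirrors the assembly loop by summing bytes until it meets the terminating zero that .string appends, and it assumes 32-bit registers.

```python
def string_checksum(data: bytes) -> int:
    """Sum the bytes of a zero-terminated string, as the assembly loop does."""
    checksum = 0                      # plays the role of r0
    for byte in data:                 # plays the role of the lbu load
        if byte == 0:                 # jz currentCharacter, end_of_loop
            break
        # Assumption: registers are 32 bits wide, so the sum wraps at 2**32.
        checksum = (checksum + byte) & 0xFFFFFFFF
    return checksum


if __name__ == "__main__":
    # .string appends a terminating zero byte after the characters.
    print(hex(string_checksum(b"hello world\0")))  # prints 0x45c
```

Running the sketch prints the same value that the debugger reports in r0.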

Placing numerical values in memory

Many programs need static variables in memory to operate.

The basic directives to declare them are .byte, .short, .long and .quad. Such variables must be declared in a .data section of the code.

The next program uses two variables ‘a’ and ‘b’, fetched from the WRAM, and stores the sum of ‘a’ and ‘b’ into memory:

.text
.globl __bootstrap
__bootstrap:
    move r16, values
    lw r0,r16,0
    lw r1,r16,4
    add r0, r0,r1
    sw r16,8, r0
    stop

.data
.globl values
values:
.long 0x12345678    //a
.long 0x9abcdef0    //b
.long 0             // s=a+b

Once the program is built:

dpu-upmem-dpurte-clang -nostartfiles -o trivial_add trivial_add.S

The debugger makes it easy to verify the result:

file trivial_add
process launch --stop-at-entry
breakpoint set --source-pattern-regexp "stop"
process continue
parray 3 &values
exit

The third word holds the stored result, which is the sum of ‘a’ and ‘b’:

  (void *) [0] = 0x12345678
  (void *) [1] = 0x9abcdef0
  (void *) [2] = 0xacf13568
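
The expected memory contents can be predicted on the host before running the debugger. The sketch below is illustrative only: the helper names add32 and values_image are hypothetical, and it assumes that each .long is stored as a 32-bit little-endian word and that the add instruction keeps the low 32 bits of the result.

```python
import struct

A = 0x12345678  # first .long of the example
B = 0x9ABCDEF0  # second .long of the example


def add32(x: int, y: int) -> int:
    """Add two values and keep the low 32 bits, like a 32-bit register."""
    return (x + y) & 0xFFFFFFFF


def values_image(a: int, b: int) -> bytes:
    """Bytes of the three .long entries (assumption: little-endian storage)."""
    return struct.pack("<3I", a, b, add32(a, b))


if __name__ == "__main__":
    words = struct.unpack("<3I", values_image(A, B))
    for index, word in enumerate(words):
        print(f"[{index}] = 0x{word:08x}")
```

The three printed words can be compared directly with the parray output.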

Useful common assembler directives

The list of assembler directives, along with a comprehensive description, can be found in Assembler syntax. The most commonly used ones are described hereafter.

Creating a static buffer of data

The “zero” directives allow creating a static buffer of data with an initial value. The DPU assembler repeats the specified value (zero by default) a certain number of times.

The DPU assembler defines:

  • .zero to create a buffer of bytes

  • .fill to create a buffer of words

The example below creates a static buffer of 7 bytes, each equal to 0x42, followed by one null byte, and loads the two words of this buffer into r0 and r1:

// Illustrates the creation and reference of a static
// buffer of memory in assembler.

.data

.globl static_buffer
.align 4
static_buffer:
	.fill 7, 1, 0x42
	.zero 1

.text
.globl __bootstrap
__bootstrap:
    lw r0 , zero, static_buffer
    move r1, 4
    lw r1 , r1, static_buffer
    stop

Build the program:

dpu-upmem-dpurte-clang -nostartfiles static_buffer.S -o static_buffer

Now, execute the program and verify that the registers and the memory match the expectations. Notice that the host is a little-endian machine in this test, so the eighth byte of the buffer (the null byte) ends up in the most significant bits of r1:

file static_buffer
process launch --stop-at-entry
breakpoint set --source-pattern-regexp "stop"
process continue
register read
parray 2 &static_buffer
exit

As expected, dpu-upmem-dpurte-clang places exactly 7 bytes equal to 0x42 in memory, followed by the null byte:

        r0 = 0x42424242
        r1 = 0x00424242
  (void *) [0] = 0x42424242
  (void *) [1] = 0x00424242
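
The register values above follow directly from the byte layout of the buffer. The sketch below models that layout on the host. It is an illustration rather than SDK code: the function names are hypothetical, and it assumes that `.fill 7, 1, 0x42` emits seven one-byte items and that words are read in little-endian order.

```python
import struct


def static_buffer_image() -> bytes:
    """Model of `.fill 7, 1, 0x42` followed by `.zero 1` (8 bytes in total)."""
    return bytes([0x42]) * 7 + bytes(1)


def load_words(image: bytes) -> tuple:
    """Read the image as little-endian 32-bit words, like the two lw loads."""
    return struct.unpack(f"<{len(image) // 4}I", image)


if __name__ == "__main__":
    r0, r1 = load_words(static_buffer_image())
    print(f"r0 = 0x{r0:08x}")
    print(f"r1 = 0x{r1:08x}")
```

The null byte is the last byte of the second word, which is why it shows up as the leading 00 of r1.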

Notice that the buffer is aligned on 4 bytes, which is necessary to perform a load word from it. Had the buffer been used for DMA purposes, it would have needed to be aligned on 8 bytes.
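
The two alignment constraints can be expressed as simple address arithmetic. The helpers below are a hypothetical sketch for reasoning about addresses on the host; the names and the sample addresses are made up for illustration and do not come from the SDK.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True when address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    address = 0x1004                  # hypothetical WRAM address
    print(is_aligned(address, 4))     # suitable for a word load
    print(is_aligned(address, 8))     # not suitable for DMA as is
    print(hex(align_up(address, 8)))  # next address usable for DMA
```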

Useful tips and tricks

dpuasmdoc

This utility is the fastest way to look up the assembly syntax. For example, the following command:

dpuasmdoc ld

prints the syntax of all the DPU instructions containing the keyword ld:

ld endian:e dc ra off:s24
	let @a = (ra + off)
	dc = (Load 8 bytes from WRAM at address @a with endianness endian)
ld endian:e dc sa off:s24
	cc = (ra & 0xffff) + off + 8 - (ra >> 16)
	if (const_cc_ge0 cc) then
		let @a = (ra & 0xffff) + off & 0xfff8
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)
		raise exception(_memory_fault)
	else
		let @a = (ra & 0xffff) + off
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)
ld dc ra off:s24
	let @a = (ra + off)
	dc = (Load 8 bytes from WRAM at address @a with endianness endian)
ldma ra rb immDma:u8
	let @w = (ra & 0xfffff8)
	let @m = (rb & 0xfffffff8)
	let N = (1 + (immDma:U32 + (ra >> 24) & 0xff) & 0xff) << 3
	Load N bytes from MRAM at address @m into WRAM at address @w
ldmai ra rb immDma:u8
	let @i = (ra & 0xfffff8)
	let @m = (rb & 0xfffffff8)
	let N = (1 + (immDma:U32 + (ra >> 24) & 0xff) & 0xff) << 3
	Load N bytes from MRAM at address @m into IRAM at address @w
lds dc ra off:s24
	cc = (ra & 0xffff) + off + 8 - (ra >> 16)
	if (const_cc_ge0 cc) then
		let @a = (ra & 0xffff) + off & 0xfff8
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)
		raise exception(_memory_fault)
	else
		let @a = (ra & 0xffff) + off
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)

To get help on a specific keyword (e.g. the ld instruction specifically), use:

dpuasmdoc -match ld
ld endian:e dc ra off:s24
	let @a = (ra + off)
	dc = (Load 8 bytes from WRAM at address @a with endianness endian)
ld endian:e dc sa off:s24
	cc = (ra & 0xffff) + off + 8 - (ra >> 16)
	if (const_cc_ge0 cc) then
		let @a = (ra & 0xffff) + off & 0xfff8
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)
		raise exception(_memory_fault)
	else
		let @a = (ra & 0xffff) + off
		dc = (Load 8 bytes from WRAM at address @a with endianness endian)
ld dc ra off:s24
	let @a = (ra + off)
	dc = (Load 8 bytes from WRAM at address @a with endianness endian)