Release notes
Latest
Version 2025.1.0
- Additions and improvements:
Implement udev cache (improves DPU allocation performance)
LLDB: reduce amount of memory transfer when transferring unaligned data during a debug session
dpu-diag: Add option to also disable DPUs which have IRAM or WRAM defects
- Bug fixes:
Fix some conditions in which a subset of valid DPUs would become unusable (avoid resetting faulty DPUs, do not allocate DPUs related to a non-functional Control Interface)
Fix memory leaks in corner cases
upmem_utilities: Add pyelftools as a dependency since it is needed for dpu_sections and dpu_statistics
Fix DPU toolchain file for CMake
- Documentation:
Improve documentation style
Clarify various comments in code
Clarify installation instructions
Operating system / DPU support matrix:
| Operating system | V1A DPUs | V1B DPUs |
|---|---|---|
| Debian 10 (1) | Supported | Supported |
| Ubuntu 20.04 LTS (1) | Supported | Not Supported |
| Ubuntu 22.04 LTS (2) | Supported | Supported |
| Rocky 8 (1) | Supported | Supported |
(1) In future releases, all systems with a Linux kernel older than version 6.8 will be deprecated.
(2) In future releases, Ubuntu 22.04 LTS will only be supported with the Hardware Enablement (HWE) stack and kernel 6.8.
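To see whether a host falls under the kernel deprecation notice in footnote (1), the running kernel release can be compared against 6.8. The following is a minimal sketch, not part of the SDK: the helper names `parse_kernel_release` and `kernel_is_older_than_minimum` are hypothetical, and it assumes the release string starts with `major.minor`, as `uname -r` output normally does.

```python
import platform
import re

# Threshold taken from footnote (1) of the support matrix.
MIN_KERNEL = (6, 8)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_older_than_minimum(release, minimum=MIN_KERNEL):
    """True when the kernel release is strictly older than the minimum version."""
    return parse_kernel_release(release) < minimum


if __name__ == "__main__":
    current = platform.release()
    try:
        outdated = kernel_is_older_than_minimum(current)
    except ValueError:
        # Non-Linux hosts may report a release string in another format.
        print("Could not parse kernel release %r" % current)
    else:
        if outdated:
            print("Kernel %s is older than 6.8 and may be deprecated in future releases" % current)
        else:
            print("Kernel %s is 6.8 or newer" % current)
```

Comparing `(major, minor)` tuples rather than strings avoids misordering releases such as 6.10 and 6.8.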
Previous
Version 2024.2.0
- Additions and improvements:
Implement WRAM access in parallel with DPU execution
Add DPU v1B performance counters
Add chip versioning tags to DPU binaries
Fix includes in low-level header files
IWYU users: dpu.h now exports most of the DPU API
Rely on dpu_types.h to get C standard integer types
- Utilities / dpu-diag:
Added `--no-update-vpd` option
Improved frequency measurement stability
- Utilities / upmem-dimm-configure:
Update MCU DB fields
General improvements
Functional simulator: remove broken nrDpusPerCi option
- Bug fixes:
Fix rare Control Interface issue on V1B
LLVM/Clang: minor bug fixes
- Run-time libraries:
sequential reader: fix correctness issue
Improvements to div32 and udiv32
mem-alloc: early clobber output variable
MCU firmware: fix utils alignment issue
- Documentation:
- Run-time library documentation now in doxygen format
Remove redundant dpurtldoc CLI tool
Use modern theme for doxygen sites
Support dark mode for documentation
Document UPMEM PPA
Document UPMEM_PROFILE
Operating system / DPU support matrix:
| Operating system | V1A DPUs | V1B DPUs |
|---|---|---|
| Debian 10 | Supported | Supported |
| Ubuntu 20.04 LTS | Supported | Not Supported |
| Ubuntu 22.04 LTS | Supported | Supported |
| Rocky 8 | Supported | Supported |
Version 2024.1.0
Add support for V1B UPMEM DIMMs on Ubuntu 22.04 and Debian 10
Fix performance degradation caused by the Gather Data Sampling microcode mitigation, courtesy of Manuel Penschuck from Goethe University Frankfurt. For further performance improvements, check out his awesome work on his fork of the UPMEM backend (benchmarks)
Add `dpu_attach_first` command to the debugger
Drop support for Ubuntu 18.04
Drop support for CentOS 7
- Add documentation for:
DPU chips characteristics
Choosing the UPMEM chip version when compiling and running on simulator (cf. DPU Version Selection)
Configuring the UPMEM debugger port
- Bugfixes:
Memory leak in scatter-gather transfers
Memory leaks and sanitization in LLVM tools
Pyelftools version conflict in dpu_profiling
Wrong Python libraries path on Rocky 8
Operating system / DPU support matrix:
| Operating system | V1A DPUs | V1B DPUs |
|---|---|---|
| Debian 10 | Supported | Supported |
| Ubuntu 20.04 LTS | Supported | Not Supported |
| Ubuntu 22.04 LTS | Supported | Supported |
| Rocky 8 | Supported | Supported |
Version 2023.2.0
- Scatter-gather transfers update:
The `get_block_t` struct now needs an additional `args_size` parameter to store the size of the function context
The SDK now copies and manages the lifetime of the `args` context used by the `get_block_func_t` function
Add C++ API for scatter-gather transfers
C++ API for scatter-gather transfers also accepts capturing lambdas
Add support for Ubuntu 22.04
Add support for Rocky 8
- Add documentation for:
Setting up permissions for users to access UPMEM hardware and for profiling
Valid values for `nrDpusPerCi` with the functional simulator
Using the clangd language server with the SDK
Fix for scatter-gather transfers in the functional simulator
Fix a bug in UFI config when counting faulty bits
Fix an edge-case compiler crash when using `-O1` or higher with `mram_read` or `mram_write`.
Version 2023.1.0
New APIs for scatter-gather memory transfers between host CPU and DPUs
Mutual exclusion extension: virtual mutexes and mutex pools
New APIs for partial support of unaligned MRAM accesses
Remove the dependency of dpu_profiling functions on UPMEM’s modified Linux perf package
Add compatibility of dpu_profiling functions with Perfetto UI
Add static assert to prevent usage of perfcounter_config with -pg option
Update the UPMEM driver to support Linux kernel 5.15
Firmware support for PCB P21F
Version 2021.4.0
Fix bug in calling convention when a DPU function has a large number of parameters
Fix bug with mux switch for sparse matrix MRAM transfer
Fix for putting DIMMs in power-saving mode
Static assertion when perfcounter_config is used with -pg option
Firmware support for PCB P21E
Version 2021.3.0
Upgrade compiler to LLVM-12.0.0
Add support for new P21D PCB
Update BIOS flashing procedure based on new flasher kit
Document functional simulator
Fix a corner case bug with Host <-> DPU MRAM transfer
Version 2021.2.0
New DPU profiling framework based on perfcounters
Fix Functional Simulator MRAM management when using multiple DPUs
Enhance Functional Simulator instruction tracing
Enhance DPU 8-bit multiplication
Fix DPU 64-bit rotation
Fix MCU firmware bug
Update server installation documentation
Version 2021.1.0
Small improvements in Java host API (cf Java Library)
Upgrade compiler to LLVM-11.0.0
Fix various bugs in LLDB
Add `CLOCKS_PER_SEC` constant in DPU library
Fix bugs in host API
Improve DPU program loading performance for multi-rank call
Fix bug in `dpugrind`
Add documentation about C++/Java/Python APIs
Version 2020.5.0
Adding support for dual-socket servers
Adding a host Java API (cf Java Library)
Adding a host C++ API (cf C++ Host API)
Improving the performance of the Functional Simulator
Version 2020.4.0
Remove dependency on the JDK
DPU Runtime Library: `memcpy/memmove/memset` can now be used with MRAM pointers as arguments
Adding asynchronism in the Host API:
`dpu_push_xfer` now has a `DPU_XFER_ASYNC` flag to copy asynchronously
`dpu_broadcast_to`, a new function with the same behavior as `dpu_copy_to` that, unlike its predecessor, can be used asynchronously
`dpu_callback`, a new function to insert a callback that will be called asynchronously during the execution flow of the DPUs.
Version 2020.3.0
Upgrade compiler to LLVM-10.0.0
Remove dependency on Python 2: only Python 3 is needed
Add Python host API (cf Python Library)
Add support for Ubuntu 20.04
Remove support for Debian 9
Update driver ABI (and version):
Gather sysfs attributes in a single directory
Remove region index from device nodes’ name
Host API improvement (parallelize transfer when using dpu_set_t of multiple ranks)
Add tool `dpu-profiling` to unify all profiling features
Version 2020.2.0
New transparent management of the DPU SRAM repairs
Improved implementation of the MRAM sequential reader
Core dump binaries now contain all the debug sections
Both `-flto=full` and `-flto=thin` are supported for the DPU Runtime Library
Version 2020.1.0
Significantly improve communication performance, including copies to/from MRAM
Add support for the `centos8` Linux distribution
Add the `dpu-diag` self-diagnostic tool to dump information & check the SDK installation
Update the functions to access MRAM from the DPU (only use `mram_read` and `mram_write`)
Macros to get `barriers`, `mutex` and `semaphore` (`<OBJECT>_GET`) are no longer necessary
Interrogate the DIMM hardware to get the DPU clock frequency rather than declaring it through `UPMEM_PROFILE`
Version 2019.4.0
Upgrade compiler to LLVM-9.0.1
`dpu-lldb` is now the default debugger
Add tool `dputrace` for low level execution tracing (cf Using dputrace tool)
Add tool `dpu_stack_analyzer` for static analysis of the DPU threads stack size (cf Stack Analyzer)
Remove tasklet mailbox concept
Refactor the Host API to be based on DPU sets
Experimental tool: `dpugrind` is now available (cf Verifying memory accesses with dpugrind)
Version 2019.3.0
Unification of the DPU Host libraries `libdpu` and `libdpucni`
Upgrade compiler to LLVM-8.0.1
Clang is now able to compile x86 code, the default target is x86
Add tool `dpu-upmem-dpurte-clang` which uses clang to target dpu backend by default
The DPU runtime library is now compiled with debug information
Experimental tool: LLDB is now available for dpu
Version 2019.2.0
Adding DPU logging module based on MRAM buffer (cf `printf` in `stdio.h`)
`KTRACE` and `printf` support printing of 64-bit integers/floats.
The Assembler syntax changed:
`%` before a register is now optional and not printed by `llvm-objdump`.
`?` and `!` before a condition are now optional and not printed by `llvm-objdump`.
Condition `in_buffX` has been replaced by `ncX`.
Stabilization of MRAM variables
Version 2019.1.0
DPU program binaries are entirely built by `clang`, `dpucc` has been removed
The runtime library is pure C, there is no longer assembly generated by a configuration file and `dpukconfig` has been removed
The tasklets configuration is done by declaring `TASKLETS_INITIALIZER` and including the `rt.h` header (cf Tasklet management and synchronization)
The default optimization level for `clang` is `-O0`, the developer must pass `-O2` explicitly
The compiler handles functions with variable arguments properly
Changes in the assembly syntax
Experimental feature: add variables stored in MRAM
Experimental feature: a profiler for DPU binaries (cf Profiling DPU binary)
Version 2018.2.0
Upgrade compiler to LLVM-7.0
Experimental feature: DPUGrind tool
Version 2018.1.0-EAP (January 2018)
Version 2017.1.5-EAP
Support for `klog` logging source in the host API
Finely tune the profiling parameters to select the instrumented data and the output file
Version 2017.1.4-EAP
DPU program binary format is now standard ELF
DPU programs built with an older toolchain version need to be rebuilt
Optimized procedures in the compiler for 8x8 and 16x16 multiplications
Optimized procedures in the compiler for variable initialization
Enriching and optimizing `string.h` (cf Standard library functions)
Adding dynamic allocation with the implementation of a buddy allocator (cf Memory management)
Experimental feature: adding a Link Time Optimizer in `dpucc` (enabled when building with `-flto`)
Version 2017.1.3-EAP
Bug fixes, including KI0284
`ktrace` correctly handles `%u` format specifier
Version 2017.1.2-EAP
64-bit types and operators support in the toolchain
Fixed-Size Block Allocator (available in `fsb_allocator.h` or directly from `alloc.h`)
Make the DPU-side API of `ktrace` mimic printf
Build host APIs for `ktrace`, so that an x86 program can collect DPU traces programmatically
IRAM and WRAM sizes when using the simulator are now configurable
Software caches library functions optimized with 64-bit DMA instructions
`dpucc` accepts static libraries as inputs
Version 2017.1.1-EAP
DMA instructions granularity and alignment on 64 bits
-Os support for code optimization in size
`ktrace` service infrastructure to print execution strings and variable values in `dpushell` without using processor resources (cf Logging)
Version 0.7.5-EAP
-O2 is now the default optimization of dpucc
Command-line profiling tool
Host API now accepts an “unlimited” number of simulators
`dpukconfig` has the ‘-s’ option to run scripts
`dpugeni` available to create native interfaces to DPUs
Assembler anonymous labels to ease asm code inlining
Version 0.7.4-EAP
Host API allows callback parameters to tune DPU allocation on the fly
Many optimizations and bug fixes of the C compiler
Actor programming model available (deprecated)
Software caches available to simplify prototyping
Version 0.7.3-EAP
Integrated debugger with dpushell (cf Introduction)
Sequential reader API
Built-in assembler functions and syscalls available (cf Integrating assembly code with C programs)
Version 0.7.2-EAP
-O1 is now the default optimization of `dpucc`
Many bug fixes and improvements
Version 0.7.1-EAP
First complete toolchain including: assembler, compiler, shell, kconfig and APIs. (cf The UPMEM DPU toolchain)
First programming model defined via tasklets
Host API (cf Controlling the execution of DPUs from host applications)