Allwinner V831 Neural Processing Unit (NPU) Reverse-Engineered
When Sipeed introduced the MAIX-II Dock AIoT vision development kit, the company asked the community for help reverse-engineering the Allwinner V831 NPU in order to create an open-source AI toolchain based on NCNN.
Sipeed had already decoded the NPU registers, and Jasbir offered to help with the next step, receiving a free sample board to try it out. Good progress has been made, and it is now possible to recognize objects such as a boat using the cifar10 object recognition sample.
The Allwinner V831 NPU is based on a custom implementation of the open-source NVIDIA Deep Learning Accelerator (NVDLA) architecture, a detail Allwinner (via Sipeed) asked us to remove from the initial announcement. After some reverse-engineering work, Jasbir made the following key findings:
- The NPU clock defaults to 400 MHz, but can be set between 100 and 1200 MHz
- The NPU is implemented with the nv_small (NV Small Model) configuration and relies on shared system memory for all data operations.
- int8 and int16 are supported, with int8 preferred for speed and because of the limited onboard memory (64MB)
- 64 MACs (Atomic-C * Atomic-K)
- Registers are memory-mapped and programmable from user space
- Physical addresses are needed when referencing weights and I/O data locations, which means memory must be allocated in the kernel and its physical addresses retrieved when the NPU is driven from user space.
- NPU weights and input/output data follow a layout similar to the NVDLA private formats, so data in formats such as NHWC or NCHW must be transformed before being sent to the NPU.
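To illustrate the kind of layout transformation the last point refers to, here is a minimal sketch in Python. It assumes an NVDLA-style feature layout in which channels are split into groups of a fixed "atom" size and stored group by group, then row by row, then pixel by pixel. The atom size of 8, the zero padding, and the function name pack_chw_to_atoms are assumptions for illustration only, not the confirmed V831 format or code from the v831-npu repository.

```python
def pack_chw_to_atoms(data, channels, height, width, atom_c=8, pad=0):
    """Repack a flat CHW tensor into [C/atom_c][H][W][atom_c] order.

    atom_c is a hypothetical channel-group size; channels that do not
    fill the last group are padded with `pad`.
    """
    if len(data) != channels * height * width:
        raise ValueError("data length does not match C*H*W")
    groups = (channels + atom_c - 1) // atom_c  # ceil(C / atom_c)
    out = []
    for g in range(groups):
        for y in range(height):
            for x in range(width):
                for k in range(atom_c):
                    c = g * atom_c + k
                    if c < channels:
                        # Index into the source CHW buffer
                        out.append(data[(c * height + y) * width + x])
                    else:
                        out.append(pad)
    return out


# 3 channels, 1 row, 2 pixels per row, stored channel after channel
chw = [1, 2, 3, 4, 5, 6]
packed = pack_chw_to_atoms(chw, channels=3, height=1, width=2, atom_c=4)
print(packed)
```

With the channels interleaved per pixel like this, each group of atom_c values belonging to one pixel sits contiguously in memory, which is the property the NVDLA-style formats rely on. The actual group size and padding rules for the V831 would need to be checked against Jasbir's code.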
These findings allowed him to adapt the cifar10 demo from Arm's CMSIS_5 NN library, thus removing all closed-source binaries from Allwinner. You can find the source code in the v831-npu repository on GitHub, and check out Jasbir's article on how to try it out, as long as you have an Allwinner V831 board handy.
The current code supports direct convolutions, bias addition, ReLU/PReLU, element-wise operations, and max/average pooling. There is still work to be done, including developing a conversion utility for weights and input/output data, and integrating the code into an existing AI framework.
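For readers less familiar with these operations, the sketch below is a plain-Python CPU reference for three of them: direct convolution, bias addition with ReLU, and max pooling. It is not taken from the v831-npu repository and does not describe how the NPU computes them. The function names, the stride-1/no-padding convolution, the right-shift requantization step, and the saturation to 127 are all assumptions chosen to keep the example small. A reference like this could be used to check NPU output against known-good values.

```python
def conv2d_bias_relu(inp, weights, bias, shift=0):
    """Direct convolution (stride 1, no padding) + bias + ReLU on integers.

    inp:     [C][H][W] nested lists
    weights: [K][C][R][S] nested lists
    bias:    [K] list, added to the accumulator before the shift
    shift:   hypothetical requantization step (arithmetic right shift)
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    out = []
    for k in range(K):
        plane = []
        for y in range(H - R + 1):
            row = []
            for x in range(W - S + 1):
                acc = bias[k]
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += inp[c][y + r][x + s] * weights[k][c][r][s]
                acc >>= shift
                acc = max(acc, 0)          # ReLU
                row.append(min(acc, 127))  # saturate to the int8 maximum
            plane.append(row)
        out.append(plane)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 on a [K][H][W] feature map."""
    return [
        [
            [max(p[y][x], p[y][x + 1], p[y + 1][x], p[y + 1][x + 1])
             for x in range(0, len(p[0]) - 1, 2)]
            for y in range(0, len(p) - 1, 2)
        ]
        for p in fmap
    ]


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 input channel
kernel = [[[[1, 0], [0, 1]]]]                # one 2x2 filter
conv_out = conv2d_bias_relu(image, kernel, bias=[-10])
pooled = max_pool2x2(conv_out)
print(conv_out, pooled)
```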
The good news is that the work should also benefit other platforms featuring an NVDLA-based AI accelerator, including the Beagle V SBC, which has just started to end up in the hands of developers over the last few days.
Jean-Luc started CNX Software part-time in 2010, before stepping down as head of software engineering and starting to write daily news and reviews full-time later in 2011.