Inference on devices?
February 14, 2025 • Issue 88
NervesConf US: get the details at nervesconf.us
NervesConf EU: read more at nervesconf.eu

On-device ML inference & Nerves
This should be the topic of someone's talk at some point, but I wanted to give a quick overview of the things that are currently available, in use, or being explored for running machine learning inference on Nerves devices. In the Elixir machine learning space we have a bunch of projects. The quick run-down:
If you've looked at embedded boards recently, they all seem to ship an "NPU" of some sort now. This just means a specialized math co-processor. Some of them are mystery meat: a scary SDK and two very specific examples for using it. "Look, it works." I hear Rockchip is actually upstreaming a bunch of their acceleration work, which would make it a standard Linux compute accelerator device. Imagine if they were all standard. What a world. If you've watched the space for a bit you've probably seen the Coral TPU fly by. It has aged. The most talked-about chip right now seems to be what Raspberry Pi ships as their AI Kit, and that part is showing up elsewhere as well: the Hailo-8L and Hailo-8.

Bumblebee
I've run a bunch of Whisper transcriptions, and it'd be the same with other Bumblebee models: if they run okay on CPU, you can just use them. I hope Nx will eventually have a backend that works across more devices. The new MLIR work should enable general translation to other accelerator frameworks. I've seen projects that seem to take MLIR and produce the Vulkan that a compute accelerator under Linux DRM would need, for example. No idea how difficult that is to string together. But yeah, Bumblebee is really nice to set up for the models that fit in embedded use. (I've put a minimal sketch further down.)

OpenCV / evision
The evision library by Cocoa Xu is an awesome tool for running computer vision workloads, and it can do a ton of things, because OpenCV can do a ton of things. It also includes a bunch of models and support for many model formats for machine vision applications. This works. I believe we even have pre-compiled builds for a bunch of Nerves-friendly platforms. There are a ton of examples you can run in Livebook. (There's a small sketch of it further down too.)

TFLite
Also from Cocoa we get tflite_elixir. TensorFlow Lite, or TFLite, is a simpler variant of TensorFlow suited to embedded devices and constrained accelerators. Notably, the TFLite tooling supports the Coral TPU. As for what models are available: Qualcomm released 80 models, and a lot of those, if not all, are available in TFLite format. Some would work with the Coral; I don't know whether a model needs to fit into the accelerator's working memory or whether the runtime gets clever about it. Beyond that, TFLite is a target for other accelerators as well: someone can write an execution provider for TFLite and make a TFLite model run on their accelerator.

pythonx (experimental)
It's unclear how well this would support inference on embedded devices, but Cocoa built this wild prototype which might let you use the entire Python ML ecosystem if you like. In addition, with Explorer and Nx you can shift raw data sideways between Python and those tools inside their NIFs, without using Elixir as a slow go-between.

Hailo support (RPi AI Kit)
There was a concerted effort by Gus, Cocoa, Paulo, Vittoria and myself to get the fundamentals working. Someone still needs to wire up the libraries, maybe try an appropriate execution provider, to actually run inference on the thing. But the Hailo drivers and runtime are usable on Nerves right now. All of it is WIP and needs polish and packaging, but it is quite close. I saw some mention of TFLite models being convertible to Hailo's proprietary format.

CUDA
Nope. The tooling is an absolute hellscape. Just impressively painful. Even their Docker tooling requires a bunch of custom madness. I guess you're doing Ubuntu if you want a Jetson/Orin/etc. It sucks, but it is not just a problem for Nerves.
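To make the Bumblebee route concrete, here is a minimal sketch of a Whisper speech-to-text serving. The model repo, file path and options are illustrative assumptions, not a recipe; drop the EXLA option if EXLA isn't built for your target, at the cost of speed.

```elixir
# Assumed deps: {:bumblebee, "~> 0.6"}, {:nx, "~> 0.9"}, optionally {:exla, "~> 0.9"}
repo = {:hf, "openai/whisper-tiny"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

serving =
  Bumblebee.Audio.speech_to_text_whisper(model_info, featurizer, tokenizer, generation_config,
    chunk_num_seconds: 30,
    # Assumption: EXLA is available for your target; otherwise remove this option
    # and accept the (much slower) default Nx evaluator.
    defn_options: [compiler: EXLA]
  )

# {:file, path} relies on ffmpeg being present on the device; you can also pass
# an Nx tensor of mono samples at the featurizer's sampling rate instead.
Nx.Serving.run(serving, {:file, "/data/recording.wav"})
```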
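Here is also a tiny taste of evision, nothing more than basic OpenCV calls from Elixir with made-up paths; the bundled models and the DNN module go much further than this.

```elixir
# Assumed dep: {:evision, "~> 0.2"}
img = Evision.imread("/data/frame.jpg")

# Typical pre-processing before handing a frame to a detector or classifier:
# downscale, then convert BGR (OpenCV's default channel order) to grayscale.
small = Evision.resize(img, {320, 240})
gray = Evision.cvtColor(small, Evision.Constant.cv_COLOR_BGR2GRAY())

Evision.imwrite("/data/frame_gray.png", gray)
```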
Ortex / ONNX on Elixir
Abelino at Redwire Labs just solved the mysterious and painful build problem I had with Ortex, enabling ONNX-based inference on Nerves devices. This opens a lot of doors. ONNX is a fairly common format for models now, and many of those models are quite usable on embedded devices. I am currently building the Nerves system I want to use this with, and you can follow along in various online spaces with how that goes. Ortex also supports people developing additional execution providers (along with a bunch of built-in ones). Hailo has a fork of ONNX Runtime which seems to include a Hailo execution provider. Someone should wire that up.
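For reference, basic Ortex usage looks roughly like this; the model path is a placeholder and the dummy input just mirrors the shape an image classifier such as ResNet50 expects.

```elixir
# Assumed dep: {:ortex, "~> 0.1"}
model = Ortex.load("/data/models/resnet50.onnx")

# All-zeros input to prove the pipeline end to end; a real application
# would feed a preprocessed image tensor of the same {1, 3, 224, 224} shape.
{output} = Ortex.run(model, Nx.broadcast(0.0, {1, 3, 224, 224}))

output |> Nx.backend_transfer() |> Nx.argmax(axis: 1)
```

If I read the docs right, execution providers are chosen when the model is loaded, which is where a Hailo provider would eventually plug in.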
I think that's enough of an overview of the current stuff. The Hailo and Ortex progress is what I think will provide the fastest path to running inference easily on devices in Elixir-land. Let me know what you think or if you have questions.

Project Updates
Nerves Meetup (remote)
Gus Workman is coming to the February meetup to talk about his Soleil project! Check out the event page and contact them if you want to present!

Got questions?
Troubleshooting is best done on the Nerves forum over at the Elixir Forum. But if you have big-picture questions you would like to ask around Nerves, feel free to send them in and we might just have ourselves a column here.

The Nestlet device
Steven Fuchs shares his build of a device to wrangle his Nest thermostat.

Nerves shirts can be bought at oswag.org. Stickers with every purchase! Elixir shirts on pre-order :)

Participating in the community
The Nerves community is found wherever Elixirists gather. Try any of the following:
Questions are best asked on the Elixir Forum.
Social conversation and banter:
How you can help Nerves
Contribute in the way that works for you:
Finally, if you have questions about the newsletter or want to suggest something, you can simply respond to this email.

- Lars