Inference on devices?
February 14, 2025 • Issue 88
NervesConf US: get the details at nervesconf.us
NervesConf EU: read more at nervesconf.eu

On-device ML inference & Nerves
This should be the topic of someone's talk at some point, but I wanted to give a quick overview of the things that are currently available, in use, or being explored for running machine learning inference on Nerves devices. In the Elixir machine learning space we have a bunch of projects. The quick run-down:
If you've looked at embedded boards recently, they all seem to ship an "NPU" of some sort now. This just means a specialized math co-processor. Some of them are mystery meat: a scary SDK and two very specific examples for using it. "Look, it works." I hear Rockchip is actually upstreaming a bunch of their acceleration work, which would make it a standard Linux compute accelerator device. Imagine if they were all standard. What a world. If you've watched the space for a bit you've probably seen the Coral TPU fly by. It has aged. The most talked-about chip right now seems to be what Raspberry Pi ships as their AI Kit, and that part is showing up elsewhere as well: the Hailo-8L and Hailo-8.

Bumblebee
I've run a bunch of Whisper transcriptions, and it'd be the same with other Bumblebee models: if they run okay on CPU, you can just use them. I hope Nx will eventually have a backend that works across more devices. The new MLIR work should enable general translation to other accelerator frameworks. I've seen projects that seem to take MLIR and produce the Vulkan that a compute accelerator under Linux DRM would need, for example. No idea how difficult that is to string together. But yeah, Bumblebee is really nice to set up for the models that fit in embedded use. (I've put a minimal sketch further down.)

OpenCV / evision
The evision library by Cocoa Xu is an awesome tool for running computer vision workloads, and it can do a ton of things, because OpenCV can do a ton of things. It also includes a bunch of models and support for many model formats for machine vision applications. This works. I believe we even have pre-compiled builds for a bunch of Nerves-friendly platforms. There are a ton of examples you can run in Livebook. (There's a small sketch of it further down too.)

TFLite
Also from Cocoa we get tflite_elixir. TensorFlow Lite, or TFLite, is a simpler variant of TensorFlow suited to embedded devices and constrained accelerators. Notably, the TFLite tooling supports the Coral TPU. As for what models are available: Qualcomm released 80 models, and a lot of those, if not all, are available in TFLite format. Some would work with the Coral; I don't know whether a model needs to fit into the accelerator's working memory or whether the runtime gets clever about it. Beyond that, TFLite is a target for other accelerators as well: someone can write an execution provider for TFLite and make a TFLite model run on their accelerator.

pythonx (experimental)
It's unclear how well this would support inference on embedded devices, but Cocoa built this wild prototype which might let you use the entire Python ML ecosystem if you like. In addition, with Explorer and Nx you can shift raw data sideways between Python and those tools inside their NIFs, without using Elixir as a slow go-between.

Hailo support (RPi AI Kit)
There was a concerted effort by Gus, Cocoa, Paulo, Vittoria and myself to get the fundamentals working. Someone still needs to wire up the libraries, maybe try an appropriate execution provider, to actually run inference on the thing. But the Hailo drivers and runtime are usable on Nerves right now. All of it is WIP and needs polish and packaging, but it is quite close. I saw some mention of TFLite models being convertible to Hailo's proprietary format.

CUDA
Nope. The tooling is an absolute hellscape. Just impressively painful. Even their Docker tooling requires a bunch of custom madness. I guess you're doing Ubuntu if you want a Jetson/Orin/etc. It sucks, but it is not just a problem for Nerves.
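To make the Bumblebee route concrete, here is a minimal sketch of a Whisper speech-to-text serving. The model repo, file path and options are illustrative assumptions, not a recipe; drop the EXLA option if EXLA isn't built for your target, at the cost of speed.

```elixir
# Assumed deps: {:bumblebee, "~> 0.6"}, {:nx, "~> 0.9"}, optionally {:exla, "~> 0.9"}
repo = {:hf, "openai/whisper-tiny"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

serving =
  Bumblebee.Audio.speech_to_text_whisper(model_info, featurizer, tokenizer, generation_config,
    chunk_num_seconds: 30,
    # Assumption: EXLA is available for your target; otherwise remove this option
    # and accept the (much slower) default Nx evaluator.
    defn_options: [compiler: EXLA]
  )

# {:file, path} relies on ffmpeg being present on the device; you can also pass
# an Nx tensor of mono samples at the featurizer's sampling rate instead.
Nx.Serving.run(serving, {:file, "/data/recording.wav"})
```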
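Here is also a tiny taste of evision, nothing more than basic OpenCV calls from Elixir with made-up paths; the bundled models and the DNN module go much further than this.

```elixir
# Assumed dep: {:evision, "~> 0.2"}
img = Evision.imread("/data/frame.jpg")

# Typical pre-processing before handing a frame to a detector or classifier:
# downscale, then convert BGR (OpenCV's default channel order) to grayscale.
small = Evision.resize(img, {320, 240})
gray = Evision.cvtColor(small, Evision.Constant.cv_COLOR_BGR2GRAY())

Evision.imwrite("/data/frame_gray.png", gray)
```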
Ortex / ONNX on Elixir
Abelino at Redwire Labs just solved the mysterious and painful build problem I had with Ortex, enabling ONNX-based inference on Nerves devices. This opens a lot of doors. ONNX is a fairly common format for models now, and many of those models are quite usable on embedded devices. I am currently building the Nerves system I want to use this with, and you can follow along in various online spaces with how that goes. Ortex also supports people developing additional execution providers (along with a bunch of built-in ones). Hailo has a fork of ONNX Runtime which seems to include a Hailo execution provider. Someone should wire that up.
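For reference, basic Ortex usage looks roughly like this; the model path is a placeholder and the dummy input just mirrors the shape an image classifier such as ResNet50 expects.

```elixir
# Assumed dep: {:ortex, "~> 0.1"}
model = Ortex.load("/data/models/resnet50.onnx")

# All-zeros input to prove the pipeline end to end; a real application
# would feed a preprocessed image tensor of the same {1, 3, 224, 224} shape.
{output} = Ortex.run(model, Nx.broadcast(0.0, {1, 3, 224, 224}))

output |> Nx.backend_transfer() |> Nx.argmax(axis: 1)
```

If I read the docs right, execution providers are chosen when the model is loaded, which is where a Hailo provider would eventually plug in.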
I think that's enough of an overview of the current stuff. The Hailo and Ortex progress is what I think will provide the fastest path to running inference easily on devices in Elixir-land. Let me know what you think or if you have questions.

Project Updates
Nerves Meetup (remote)
Gus Workman is coming to the February meetup to talk about his Soleil project! Check out the event page and contact them if you want to present!

Got questions?
Troubleshooting is best done on the Nerves forum over at the Elixir Forum. But if you have big-picture questions you would like to ask around Nerves, feel free to send them in and we might just have ourselves a column here.

The Nestlet device
Steven Fuchs shares his build of a device to wrangle his Nest thermostat.

Nerves shirts can be bought at oswag.org. Stickers with every purchase! Elixir shirts on pre-order :)

Participating in the community
The Nerves community is found wherever Elixirists gather. Try any of the following:
Questions are best asked on the Elixir Forum.
Social conversation and banter:
How you can help Nerves
Contribute in the way that works for you:
Finally, if you have questions about the newsletter or want to suggest something, you can simply respond to this email.

- Lars