vitoplantamura/OnnxStream
A lightweight C++ inference library for ONNX models, enabling low-memory execution of large AI models like Stable Diffusion XL and Mistral 7B on diverse hardware, from Raspberry Pi Zero 2 to servers.
Detailed Introduction
OnnxStream is a C++ inference library designed to address the high memory consumption of traditional machine learning frameworks, particularly for large models such as Stable Diffusion. It achieves significantly reduced RAM usage, up to 55x less than OnnxRuntime in some cases, by decoupling the inference engine from the component that supplies the model weights. This separation enables efficient execution of complex AI models on resource-constrained devices like the Raspberry Pi Zero 2, as well as on desktops and servers, across architectures including ARM, x86, WASM, and RISC-V, making advanced AI accessible on edge devices.