vitoplantamura/OnnxStream
A lightweight C++ inference library designed to run large ONNX-based AI models like Stable Diffusion XL and Mistral 7B on resource-constrained devices with minimal memory footprint.
Core Features
Detailed Introduction
OnnxStream is a C++ inference library built to run large ONNX models on devices with very little RAM, such as the Raspberry Pi Zero 2. Where most ML frameworks optimize for latency or throughput and accept high memory use, OnnxStream optimizes for memory consumption and can use up to 55x less memory than OnnxRuntime. It decouples the inference engine from the component that supplies the model weights, so the weight-loading strategy can be chosen independently of the engine. This makes it well suited to deploying models such as Stable Diffusion XL and Mistral 7B on edge and embedded systems.
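The decoupling described above can be illustrated with a minimal C++ sketch. This is not OnnxStream's actual API: `WeightSource`, `InMemorySource`, `RunResult`, and `run_layers` are hypothetical names invented for this example, and the "layer" is a toy computation. The sketch only shows the general idea that an engine which requests one layer's weights at a time, and releases them before moving on, keeps its peak weight footprint near the size of the largest layer rather than the whole model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (not part of OnnxStream): the engine only knows
// how to ask for the weights of one named tensor.
class WeightSource {
public:
    virtual ~WeightSource() = default;
    virtual std::vector<float> load(const std::string& name) = 0;
};

// Illustrative source backed by a map. A disk-backed or prefetching source
// could implement the same interface without any change to the engine.
class InMemorySource : public WeightSource {
public:
    explicit InMemorySource(std::map<std::string, std::vector<float>> data)
        : data_(std::move(data)) {}

    std::vector<float> load(const std::string& name) override {
        return data_.at(name);  // throws std::out_of_range if the name is unknown
    }

private:
    std::map<std::string, std::vector<float>> data_;
};

struct RunResult {
    float output;              // result of the toy computation
    std::size_t peak_weights;  // largest number of weights held at one time
};

// Toy "engine": each layer multiplies the running value by the sum of its
// weights. Weights are requested per layer and freed at the end of each
// loop iteration, so only one layer's weights are resident at a time.
RunResult run_layers(WeightSource& source,
                     const std::vector<std::string>& layer_names,
                     float input) {
    float x = input;
    std::size_t peak = 0;
    for (const auto& name : layer_names) {
        std::vector<float> w = source.load(name);
        assert(!w.empty());  // assumption of this sketch: no empty layers
        peak = std::max(peak, w.size());

        float sum = 0.0f;
        for (float v : w) sum += v;
        x *= sum;
    }  // w is destroyed here, releasing this layer's weights
    return {x, peak};
}
```

Because `run_layers` depends only on the `WeightSource` interface, the choice between keeping weights in memory, reading them from disk on demand, or prefetching them can be made without touching the engine code.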