vitoplantamura/OnnxStream

A lightweight C++ inference library designed to run large ONNX-based AI models like Stable Diffusion XL and Mistral 7B on resource-constrained devices with minimal memory footprint.

Core Features

Extremely low memory footprint for ONNX model inference (up to 55x less memory than OnnxRuntime).

Supports a wide range of AI models including Stable Diffusion, LLMs (Mistral 7B, TinyLlama), YOLOv8, and Whisper.

Broad platform and architecture compatibility (ARM, x86, WASM, RISC-V).

Provides Python, C#, and JavaScript (WASM) bindings for easy integration.

Optimized for performance on resource-constrained devices, accelerated by XNNPACK.

Detailed Introduction

OnnxStream is a C++ inference library built to run large ONNX-formatted AI models on devices with severely limited memory, such as the Raspberry Pi Zero 2. Traditional ML frameworks typically prioritize latency or throughput at the expense of RAM. OnnxStream instead focuses on minimizing memory consumption, using up to 55x less memory than OnnxRuntime. The library decouples the inference engine from the component that provides the model weights, which allows for flexible data loading strategies. This makes it well suited to deploying complex models like Stable Diffusion XL and Mistral 7B in edge computing and embedded systems.
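
The decoupling of the inference engine from weight provision can be illustrated with a small C++ sketch. This is not OnnxStream's actual API: `WeightsSource`, `InMemorySource`, `DiskSource`, `run_layers`, and the one-file-per-tensor `.bin` layout are all hypothetical names and assumptions chosen for illustration. The toy "engine" only multiplies activations by each layer's weights element-wise, which is enough to show that it never needs to know where the weights come from and that only one layer's weights are held at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the engine requests one tensor's weights by name.
struct WeightsSource {
    virtual ~WeightsSource() = default;
    virtual std::vector<float> load(const std::string& name) = 0;
};

// Strategy 1 (illustrative): keep every tensor resident in RAM.
class InMemorySource : public WeightsSource {
public:
    void add(const std::string& name, std::vector<float> data) {
        table_[name] = std::move(data);
    }
    std::vector<float> load(const std::string& name) override {
        return table_.at(name);  // throws std::out_of_range if missing
    }
private:
    std::map<std::string, std::vector<float>> table_;
};

// Strategy 2 (illustrative): read each tensor from disk on every request.
// Assumes one raw float32 file per tensor, named <dir>/<name>.bin.
class DiskSource : public WeightsSource {
public:
    explicit DiskSource(std::string dir) : dir_(std::move(dir)) {}
    std::vector<float> load(const std::string& name) override {
        std::ifstream in(dir_ + "/" + name + ".bin",
                         std::ios::binary | std::ios::ate);
        if (!in) throw std::runtime_error("missing weights: " + name);
        const std::streamsize bytes = in.tellg();
        in.seekg(0);
        std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
        in.read(reinterpret_cast<char*>(data.data()),
                static_cast<std::streamsize>(data.size() * sizeof(float)));
        return data;
    }
private:
    std::string dir_;
};

// Toy engine: scales the activations by each layer's weights in turn.
// Each layer's weights are loaded, used, and released before the next
// layer is requested, so peak memory is bounded by the largest layer.
std::vector<float> run_layers(WeightsSource& src,
                              const std::vector<std::string>& layers,
                              std::vector<float> act) {
    for (const std::string& name : layers) {
        const std::vector<float> w = src.load(name);
        assert(w.size() == act.size());
        for (std::size_t i = 0; i < act.size(); ++i) act[i] *= w[i];
    }  // w is destroyed at the end of each iteration
    return act;
}
```

Because `run_layers` depends only on the `WeightsSource` interface, swapping `InMemorySource` for `DiskSource` trades speed for a smaller resident set without touching the engine code.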