AI Inference Library
2.1k stars · 2026-04-18

vitoplantamura/OnnxStream

A lightweight C++ inference library for ONNX models, enabling low-memory execution of large AI models like Stable Diffusion XL and Mistral 7B on diverse hardware, from the Raspberry Pi Zero 2 to servers.

Core Features

Ultra-low memory footprint: runs models as large as SDXL on devices with as little as 298 MB of RAM.
Extensive platform compatibility, including ARM, x86, WASM, and RISC-V architectures.
Support for a wide range of AI models, including Stable Diffusion, Mistral 7B, YOLOv8, and Whisper.
Accelerated inference performance via XNNPACK integration.
Flexible model weight loading through a decoupled `WeightsProvider` interface.

Detailed Introduction

OnnxStream is a C++ inference library designed to address the high memory consumption of traditional machine learning frameworks, particularly for large models such as Stable Diffusion. By decoupling the inference engine from the component that supplies model weights, it reduces RAM usage dramatically—up to 55x less than OnnxRuntime in some cases. This makes it possible to run complex AI models on resource-constrained devices such as the Raspberry Pi Zero 2, as well as on desktops and servers, across architectures including ARM, x86, WASM, and RISC-V, bringing advanced AI to edge devices.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.