OSS Alternative - Discover Top Open Source Alternatives to Popular Software

GetStream/Vision-Agents

Build low-latency, multi-modal AI agents that process real-time video and audio using various LLMs and vision models.

Core Features

Real-time Multi-modal AI (Video & Voice)

Ultra-low Latency via Stream's Edge Network

Pluggable Video Processing Pipeline (YOLO, Roboflow)

Native LLM API Integrations (OpenAI, Gemini, Claude)

Tool Calling, RAG, and Persistent Memory

Quick Start

uv add vision-agents

Detailed Introduction

Vision Agents by Stream is an open-source framework for building low-latency, multi-modal AI agents that watch, listen to, and interpret real-time video and audio. It provides building blocks for connecting LLMs (OpenAI, Gemini, Claude) and vision models (YOLO, Roboflow) to Stream's low-latency edge network. Features include real-time WebRTC, a pluggable video processing pipeline, tool calling, RAG, and persistent memory. Example applications include sports coaching, drone monitoring, and physical therapy.
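To make the idea of a pluggable video processing pipeline concrete, here is a minimal, library-free sketch in Python. It does not use or reproduce the Vision Agents API: `Frame`, `BrightnessProcessor`, `MotionProcessor`, and `Pipeline` are hypothetical names invented for illustration, and the "frames" are tiny grayscale grids rather than real video. The point is only the pattern, in which independent processors each annotate a frame and can be added or removed without touching one another.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Frame:
    """Hypothetical stand-in for one decoded video frame."""
    index: int
    pixels: list[list[int]]  # grayscale rows, purely illustrative
    annotations: dict[str, dict] = field(default_factory=dict)


class Processor(Protocol):
    """Anything with a name and a process() method can be plugged in."""
    name: str

    def process(self, frame: Frame) -> dict: ...


class BrightnessProcessor:
    """Stateless processor: reports the mean pixel value of a frame."""
    name = "brightness"

    def process(self, frame: Frame) -> dict:
        values = [v for row in frame.pixels for v in row]
        mean = sum(values) / len(values) if values else 0.0
        return {"mean": mean}


class MotionProcessor:
    """Stateful processor: counts pixels that differ from the previous frame."""
    name = "motion"

    def __init__(self) -> None:
        self._previous: Optional[list[int]] = None

    def process(self, frame: Frame) -> dict:
        current = [v for row in frame.pixels for v in row]
        if self._previous is None:
            changed = 0  # nothing to compare the first frame against
        else:
            changed = sum(1 for a, b in zip(self._previous, current) if a != b)
        self._previous = current
        return {"changed_pixels": changed}


class Pipeline:
    """Runs each processor in order and stores its result under its name."""

    def __init__(self, processors: list[Processor]) -> None:
        self.processors = list(processors)

    def run(self, frame: Frame) -> Frame:
        for processor in self.processors:
            frame.annotations[processor.name] = processor.process(frame)
        return frame


# Usage: two synthetic 2x2 frames; one pixel changes between them.
pipeline = Pipeline([BrightnessProcessor(), MotionProcessor()])
frames = [
    Frame(index=0, pixels=[[0, 0], [0, 0]]),
    Frame(index=1, pixels=[[0, 255], [0, 0]]),
]
results = [pipeline.run(f) for f in frames]

for f in results:
    print(f.index, f.annotations)
```

In a real agent, a processor of this shape would wrap a detector such as YOLO, and the annotations would be passed to the LLM alongside the audio and video. How Vision Agents itself wires these pieces together is described in the project's own documentation.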