AI Agent Development Framework
7.7k 2026-05-01
GetStream/Vision-Agents
Build low-latency, multi-modal AI agents that process real-time video and audio using various LLMs and vision models.
Core Features
Real-time Multi-modal AI (Video & Voice)
Ultra-low Latency via Stream's Edge Network
Pluggable Video Processing Pipeline (YOLO, Roboflow)
Native LLM API Integrations (OpenAI, Gemini, Claude)
Tool Calling, RAG, and Persistent Memory
Quick Start
uv add vision-agents
Detailed Introduction
Vision Agents by Stream is an open-source framework for building low-latency, multi-modal AI agents that watch, listen to, and understand real-time video. It provides building blocks for connecting LLMs (OpenAI, Gemini, Claude) and vision models (YOLO, Roboflow) to Stream's ultra-low-latency edge network. Features include real-time WebRTC, pluggable video processing, tool calling, RAG, and persistent memory, so developers can build interactive AI experiences for applications such as sports coaching, drone monitoring, and physical therapy.
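To make the idea of a pluggable video processing pipeline more concrete, here is a minimal, library-free sketch in Python. It does not use the Vision Agents API: `Frame`, `Pipeline`, `fake_detector`, and `summarize` are hypothetical names invented for illustration, and the detector is a placeholder for a real model such as YOLO. It only shows the general pattern of chaining interchangeable processors over frames.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """Hypothetical stand-in for a decoded video frame."""
    index: int
    annotations: Dict[str, object] = field(default_factory=dict)


# A processor receives a frame and returns it, possibly with new annotations.
Processor = Callable[[Frame], Frame]


class Pipeline:
    """Runs the registered processors over a frame, in registration order."""

    def __init__(self) -> None:
        self._processors: List[Processor] = []

    def add(self, processor: Processor) -> "Pipeline":
        self._processors.append(processor)
        return self  # allow chained .add() calls

    def run(self, frame: Frame) -> Frame:
        for processor in self._processors:
            frame = processor(frame)
        return frame


def fake_detector(frame: Frame) -> Frame:
    # Placeholder for a real vision model call; "detects" a person
    # on even-numbered frames only.
    frame.annotations["objects"] = ["person"] if frame.index % 2 == 0 else []
    return frame


def summarize(frame: Frame) -> Frame:
    # Turns detections into text that could be handed to an LLM as context.
    objects = frame.annotations.get("objects", [])
    frame.annotations["summary"] = f"frame {frame.index}: {len(objects)} object(s)"
    return frame


pipeline = Pipeline().add(fake_detector).add(summarize)
results = [pipeline.run(Frame(i)).annotations["summary"] for i in range(3)]
print(results)
```

Because each processor shares the same signature, swapping the placeholder detector for a different model would only require replacing one function. The framework's actual interfaces may differ, so consult the project's documentation for the real processor API.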