AI Agent Development Framework
7.7k 2026-04-18
GetStream/Vision-Agents
A framework for building intelligent, low-latency multi-modal AI agents that can process real-time video and audio using various LLMs and vision models.
Core Features
Real-time Video AI integration (YOLO, Roboflow, LLMs)
Ultra-low latency audio/video processing via Stream's edge network
Pluggable video processing pipeline for custom models
Native API access to leading LLMs (OpenAI, Gemini, Claude)
Advanced agent capabilities: Tool Calling, RAG, Memory, Turn Detection
Quick Start
uv add vision-agents
Detailed Introduction
Vision Agents by Stream is a framework for building multi-modal AI agents. It uses Stream's ultra-low-latency edge network to process video and audio in real time, and it integrates with LLMs such as OpenAI and Gemini and with vision models such as YOLO. It targets applications that need immediate visual understanding and interactive voice, from sports coaching to drone surveillance, and is positioned as a production-ready option for deploying such agents.
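To make the "pluggable video processing pipeline" idea more concrete, here is a minimal, library-independent sketch in plain Python. It does not use the Vision Agents API: the names Frame, FrameProcessor, MeanBrightness, DarkSceneFlag, and Pipeline are all hypothetical and exist only for illustration. A real processor would wrap a model such as YOLO rather than the toy brightness check used here.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Frame:
    """Stand-in for a decoded video frame (hypothetical type)."""
    index: int
    pixels: list[list[int]]  # grayscale intensities, 0-255
    annotations: dict[str, Any] = field(default_factory=dict)


class FrameProcessor(Protocol):
    """Anything with a matching process() method can be plugged in."""
    def process(self, frame: Frame) -> Frame: ...


class MeanBrightness:
    """Toy 'model': records the average pixel intensity of the frame."""
    def process(self, frame: Frame) -> Frame:
        values = [p for row in frame.pixels for p in row]
        frame.annotations["mean_brightness"] = sum(values) / len(values)
        return frame


class DarkSceneFlag:
    """Downstream step that reads an earlier processor's annotation."""
    def __init__(self, threshold: float = 50.0) -> None:
        self.threshold = threshold

    def process(self, frame: Frame) -> Frame:
        mean = frame.annotations["mean_brightness"]
        frame.annotations["is_dark"] = mean < self.threshold
        return frame


class Pipeline:
    """Runs each registered processor over a frame, in order."""
    def __init__(self, processors: list[FrameProcessor]) -> None:
        self.processors = list(processors)

    def run(self, frame: Frame) -> Frame:
        for processor in self.processors:
            frame = processor.process(frame)
        return frame


pipeline = Pipeline([MeanBrightness(), DarkSceneFlag(threshold=50.0)])
result = pipeline.run(Frame(index=0, pixels=[[10, 20], [30, 40]]))
print(result.annotations)
```

Because each step only needs a process() method, a custom model can be added or swapped by changing the list passed to Pipeline, without touching the other steps. Order matters in this sketch: DarkSceneFlag depends on the annotation written by MeanBrightness.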