AI Agent Development Framework
7.7k 2026-04-18

GetStream/Vision-Agents

A framework for building intelligent, low-latency multi-modal AI agents that can process real-time video and audio using various LLMs and vision models.

Core Features

Real-time Video AI integration (YOLO, Roboflow, LLMs)
Ultra-low latency audio/video processing via Stream's edge network
Pluggable video processing pipeline for custom models
Native API access to leading LLMs (OpenAI, Gemini, Claude)
Advanced agent capabilities: Tool Calling, RAG, Memory, Turn Detection
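
To make the "pluggable video processing pipeline" idea concrete, here is a minimal sketch in plain Python. It is not the Vision Agents API: the names Frame, Processor, BrightnessProcessor, ThresholdDetector, and Pipeline are hypothetical and exist only for illustration. The assumption is that each custom model sits behind one small interface, and that frames flow through the registered processors in order.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Frame:
    """Hypothetical stand-in for a video frame: grayscale rows plus metadata."""
    index: int
    pixels: list[list[int]]
    annotations: dict[str, object] = field(default_factory=dict)


class Processor(Protocol):
    """Anything with a process(frame) method can be plugged into the pipeline."""
    def process(self, frame: Frame) -> Frame: ...


class BrightnessProcessor:
    """Annotates the frame with its mean pixel value."""
    def process(self, frame: Frame) -> Frame:
        total = sum(sum(row) for row in frame.pixels)
        count = sum(len(row) for row in frame.pixels)
        frame.annotations["mean_brightness"] = total / count if count else 0.0
        return frame


class ThresholdDetector:
    """Toy 'detector' standing in for a real model: counts bright pixels."""
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold

    def process(self, frame: Frame) -> Frame:
        frame.annotations["bright_pixels"] = sum(
            1 for row in frame.pixels for p in row if p >= self.threshold
        )
        return frame


class Pipeline:
    """Runs each registered processor in the order it was added."""
    def __init__(self) -> None:
        self._processors: list[Processor] = []

    def add(self, processor: Processor) -> "Pipeline":
        self._processors.append(processor)
        return self  # allow chained .add(...) calls

    def run(self, frame: Frame) -> Frame:
        for processor in self._processors:
            frame = processor.process(frame)
        return frame


if __name__ == "__main__":
    pipeline = Pipeline().add(BrightnessProcessor()).add(ThresholdDetector(128))
    result = pipeline.run(Frame(index=0, pixels=[[0, 255], [255, 255]]))
    print(result.annotations)
```

In a design like this, swapping in a different model means writing one more class with a process method. The pipeline itself does not change.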

Quick Start

uv add vision-agents

Detailed Introduction

Vision Agents by Stream is a framework for building multi-modal AI agents. It uses Stream's low-latency edge network to process video and audio in real time, and it integrates with LLMs such as OpenAI and Gemini and with vision models such as YOLO. It targets applications that need immediate visual understanding and interactive voice, from sports coaching to drone surveillance, and is presented as production-ready for complex agent deployments.
