Tags: #multimodal-ai

AI Framework
Python
11.2k

pipecat-ai/pipecat

An open-source Python framework for building real-time, voice-first, and multimodal conversational AI agents with ultra-low latency.

AI Inference Framework
Python
25.7k

sgl-project/sglang

A high-performance serving framework designed to accelerate inference for large language models and multimodal AI models.

AI Agent Development Platform
Docker
4.3k

ModelEngine-Group/nexent

Nexent is a zero-code platform for auto-generating production-grade AI agents, unifying tools, skills, memory, and orchestration with built-in controls.

Local AI Inference Platform
Docker
45.1k

mudler/LocalAI

An open-source AI engine that allows running various AI models (LLMs, vision, voice, image, video) locally on any hardware, including CPU-only, with drop-in API compatibility for commercial services.

Data Lakehouse Format
Python
6.3k

lance-format/lance

An open lakehouse format designed for multimodal AI, offering high-performance vector search, lightning-fast random access, and robust data versioning capabilities.

Multimodal Large Language Model
3.3k

OpenGVLab/Ask-Anything

An AI project extending Large Language Models with video understanding capabilities, enabling conversational AI to process and respond to queries about video content.

AI Data Platform
Python
10.0k

lancedb/lancedb

An open-source, embedded retrieval library and multimodal AI lakehouse designed for fast, scalable vector search and data management in AI/ML applications.

AI Data Processing Framework
Python
3.4k

towhee-io/towhee

Towhee is a framework for simplifying and accelerating neural data processing pipelines, particularly for unstructured multimodal data and LLM orchestration.

AI Data Curation Toolkit
NVIDIA NeMo
1.5k

NVIDIA-NeMo/Curator

A GPU-accelerated, scalable toolkit for multimodal data preprocessing and curation, designed to train better AI models faster.

Machine Learning Fine-tuning Framework
Python
2.7k

roboflow/maestro

A streamlined tool to accelerate the fine-tuning process for multimodal models like Florence-2, PaliGemma 2, and Qwen2.5-VL.

Open-source Large Language Model Framework
Hugging Face
8.3k

LianjiaTech/BELLE

BELLE is an open-source project dedicated to fostering the development of Chinese conversational large language models, aiming to make LLMs accessible to everyone.

Deep Learning Alignment Framework
Python
4.6k

PKU-Alignment/align-anything

A modular framework for aligning any-modality large models with human intentions and values using diverse fine-tuning and reinforcement learning methods.

AI Agent Toolset / Generative Media CLI
muapi-cli
3.0k

SamurAIGPT/Generative-Media-Skills

A multimodal toolset enabling AI agents to generate, edit, and display professional-grade images, videos, and audio using a CLI-powered architecture.

Multimodal AI Inference and Serving Framework
Python
4.4k

vllm-project/vllm-omni

vLLM-Omni is an efficient, flexible, and easy-to-use framework extending vLLM to serve omni-modality models (text, image, video, audio) with high throughput and an OpenAI-compatible API.

Multimodal AI Data Platform
Python
1.5k

pixeltable/pixeltable

A declarative, transactional Python library for building multimodal AI applications with incremental data storage, transformation, indexing, and orchestration.

AI/ML Evaluation Framework
Python
4.0k

EvolvingLMMs-Lab/lmms-eval

A unified, reproducible, and efficient multimodal evaluation toolkit for large language models across text, image, video, and audio tasks.

On-device Multimodal AI Application
Python
1.6k

fikrikarim/parlor

Parlor is an on-device, real-time multimodal AI that enables natural voice and vision conversations, running entirely on your local machine.

Machine Learning Research Framework
PyTorch
5.6k

facebookresearch/mmf

A modular PyTorch-based framework from Facebook AI Research for state-of-the-art vision and language multimodal AI research.

Desktop AI Agent Application
29.4k

bytedance/UI-TARS-desktop

UI-TARS Desktop is an open-source application that provides a native GUI Agent, enabling AI to control local and remote computers and browsers through the UI-TARS model.

AI Model Compilation
2.0k

chenking2020/FindTheChatGPTer

A curated directory of open-source alternatives to ChatGPT and GPT-4, encompassing text and multimodal large language models, designed to assist users in navigating the AI landscape.

AI Research Hub
22.1k

microsoft/unilm

A comprehensive research initiative by Microsoft focusing on large-scale self-supervised pre-training to develop advanced foundation models across diverse tasks, languages, and modalities.

Multimodal AI System
2.9k

InternLM/InternLM-XComposer

A comprehensive multimodal AI system specializing in long-term streaming video and audio interactions, offering advanced vision-language understanding and composition.

AI Service Framework
Docker
21.9k

jina-ai/serve

A cloud-native framework for building, deploying, and scaling multimodal AI applications and services with gRPC, HTTP, and WebSockets.

Multimodal AI Model
17.7k

deepseek-ai/Janus

Janus-Series is a family of unified autoregressive multimodal AI models designed for both understanding and generating content across various modalities, featuring a novel decoupled visual encoding strategy.

AI/ML Inference SDK
Android
8.0k

NexaAI/nexa-sdk

A high-performance local inference framework for running frontier multimodal AI models on various devices with minimal energy consumption.

Multimodal AI Model Suite
Python
10.0k

OpenGVLab/InternVL

A pioneering open-source multimodal AI model family designed to serve as a high-performance alternative to commercial models like GPT-4o and GPT-5.

OSS Alternative

Explore the best open source alternatives to commercial software.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.