Tags: #multimodal

AI Model Inference Serving Platform

9.3k

xorbitsai/inference

A unified, production-ready inference API for deploying and serving open-source language, speech, and multimodal AI models on various infrastructures.

llm inference model-serving

Replaces:

OpenAI API


AI-powered Research Content Automation Platform

Python

2.1k

OpenDCAI/Paper2Any

An AI-driven platform that transforms research papers, text, or topics into editable scientific figures, technical diagrams, and presentation slides with universal file support.

ai research-tool document-automation

Replaces:

Microsoft PowerPoint, Google Slides...


AI/ML Embedding and Retrieval Toolkit

Hugging Face

11.6k

FlagOpen/FlagEmbedding

FlagEmbedding (BGE) is a comprehensive toolkit offering state-of-the-art embedding models for efficient search, Retrieval-Augmented Generation (RAG), and various multimodal AI applications.

embedding rag llm


AI Agent Benchmarking Platform

Python

2.8k

xlang-ai/OSWorld

OSWorld is a benchmark and environment for evaluating multimodal AI agents on open-ended tasks within real computer operating systems.

ai agents benchmarking multimodal


Technical Guide

Python

6.6k

datawhalechina/all-in-rag

A comprehensive, full-stack guide to Retrieval Augmented Generation (RAG) technology for large language model application development, covering theory, practice, and engineering best practices.

rag llm ai development


LLM Memory Management System

Python

3.3k

aiming-lab/SimpleMem

SimpleMem offers an efficient, lifelong, and multimodal memory solution for LLM agents, featuring semantic lossless compression for diverse data types.

llm agents lifelong memory multimodal


Resource Collection

8.2k

WangRongsheng/awesome-LLM-resources

A comprehensive, continuously updated collection of the best resources for Large Language Models (LLMs), covering various aspects from data processing to advanced applications.

llm awesome-list resources


AI/ML Inference Serving Framework

Hugging Face

4.6k

vllm-project/vllm-omni

A framework for efficient, fast, and cheap serving of omni-modality (text, image, video, audio) AI models.

multimodal inference serving


High-Performance Data Engine

Python

5.4k

Eventual-Inc/Daft

A high-performance data engine for AI and multimodal workloads, processing diverse data types at scale with Python and Rust.

data engine ai multimodal


AI Model Fine-tuning Framework

Hugging Face

1.8k

2U1/Qwen-VL-Series-Finetune

An open-source implementation for efficiently fine-tuning Alibaba Cloud's Qwen-VL series of multimodal large language models using Hugging Face and Liger-Kernel.

finetuning qwen-vl llm


AI-native Multimodal Data Platform

Python

3.6k

morphik-org/morphik-core

A comprehensive AI-native toolset for accurate document search and storage, designed to integrate complex context from visually rich and multimodal data into AI applications.

ai rag multimodal


Multimodal AI Agent Desktop Application

29.6k

bytedance/UI-TARS-desktop

An open-source desktop application providing a native GUI Agent for human-like task completion through multimodal AI, enabling local and remote computer/browser automation.

ai agent multimodal desktop app

Replaces:

RPA Software


AI-powered Multimodal Data Extraction Library

Python

1.5k

emcf/thepipe

A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.

data extraction document processing vlm


AI Agent Development Platform

Python

5.7k

PySpur-Dev/pyspur

PySpur is a visual playground designed to accelerate the iteration, debugging, and deployment of AI agents, helping engineers overcome common challenges like prompt hell and workflow blind spots.

ai agents llm orchestration visual debugging


Research Resource Collection

3.6k

atfortes/Awesome-LLM-Reasoning

A meticulously curated collection of academic papers and resources focused on enhancing and understanding the reasoning abilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs).

llm reasoning research


Multimodal AI Model Suite

Hugging Face

10.0k

OpenGVLab/InternVL

A pioneering open-source multimodal large language model family aiming to match or exceed commercial models like GPT-4o/GPT-5 in performance.

multimodal llm open-source

Replaces:

GPT-4o, GPT-5


AI Audio Content Generation Platform

Python

6.3k

souzatharsis/podcastfy

An open-source Python package that transforms multimodal content into captivating multilingual audio conversations using GenAI, serving as a programmatic alternative to tools like NotebookLM.

genai podcast-generation multimodal

Replaces:

NotebookLM
