Tags: #multimodal
HKUDS/LightRAG
LightRAG is a simple and fast Retrieval-Augmented Generation (RAG) system designed for efficient and scalable knowledge retrieval and generation with Large Language Models.
xorbitsai/inference
A unified, production-ready inference API for effortlessly deploying and serving open-source language, speech, and multimodal AI models across various environments.
OpenDCAI/Paper2Any
An AI-driven platform that transforms research papers, text, or topics into editable scientific figures, technical diagrams, and presentation slides with universal file support.
FlagOpen/FlagEmbedding
A comprehensive toolkit providing state-of-the-art embedding and reranker models for efficient information retrieval and Retrieval-Augmented Generation (RAG) applications.
google-gemini/genai-processors
A lightweight Python library for building modular, asynchronous, and composable AI pipelines, enabling efficient, parallel, and multimodal content processing for Generative AI applications.
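The composable, asynchronous pipeline idea behind genai-processors can be illustrated with nothing but the standard library. This is a concept sketch, not the library's actual API: a "processor" here is any function mapping an async stream of parts to another async stream, and `chain` composes them.

```python
import asyncio
from typing import AsyncIterator, Callable

# A "processor" maps a stream of parts to a stream of parts;
# chaining processors yields a composable pipeline.
Processor = Callable[[AsyncIterator[str]], AsyncIterator[str]]

async def source(parts: list[str]) -> AsyncIterator[str]:
    # Emit the input parts as an async stream.
    for part in parts:
        yield part

def uppercase(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    async def run():
        async for part in stream:
            yield part.upper()
    return run()

def exclaim(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    async def run():
        async for part in stream:
            yield part + "!"
    return run()

def chain(*processors: Processor) -> Processor:
    # Feed each processor's output stream into the next.
    def run(stream: AsyncIterator[str]) -> AsyncIterator[str]:
        for p in processors:
            stream = p(stream)
        return stream
    return run

async def main() -> list[str]:
    pipeline = chain(uppercase, exclaim)
    return [part async for part in pipeline(source(["hello", "world"]))]

print(asyncio.run(main()))  # ['HELLO!', 'WORLD!']
```

Because each stage is an async generator, parts flow through the pipeline one at a time rather than in batches, which is the property that makes this style suit streaming multimodal content.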
TEN-framework/ten-framework
An open-source framework for building real-time multimodal conversational AI agents.
datawhalechina/all-in-rag
A comprehensive, full-stack guide to Retrieval-Augmented Generation (RAG) technology, covering theory, practice, and engineering best practices for building LLM applications.
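The core RAG loop that guides like this cover (embed the query, retrieve the closest documents, splice them into the prompt) can be sketched with no dependencies. The bag-of-words "embedding" below is a deliberate stand-in for a real neural encoder, and the corpus is made up for illustration.

```python
import math
from collections import Counter

# Toy corpus standing in for a real document store (illustrative only).
DOCS = [
    "RAG augments an LLM prompt with retrieved passages.",
    "Vector search finds documents whose embeddings are close to the query.",
    "Fine-tuning updates model weights on task-specific data.",
]

def embed(text: str) -> Counter:
    # Bag-of-words "embedding"; real systems use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Augment the prompt with retrieved context before calling the LLM.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does vector search work?"))
```

Swapping `embed` for a real encoder and `DOCS` for a vector database is essentially what the production frameworks listed here automate.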
activeloopai/deeplake
Deep Lake is an AI data runtime and database optimized for deep learning, offering multimodal data storage, querying, vector search, and streaming for LLM and deep learning applications.
aiming-lab/SimpleMem
SimpleMem provides an efficient, lifelong, multimodal memory solution for LLM agents, featuring lossless semantic compression and retrieval.
WangRongsheng/awesome-LLM-resources
A comprehensive and continuously updated collection of the world's best resources for Large Language Models (LLMs), covering various aspects from data to advanced applications.
Eventual-Inc/Daft
A high-performance data engine for AI and multimodal workloads, processing diverse data types at scale with Python and Rust.
2U1/Qwen-VL-Series-Finetune
An open-source implementation for efficiently fine-tuning Alibaba Cloud's Qwen-VL series of multimodal large language models using Hugging Face and Liger-Kernel.
morphik-org/morphik-core
Morphik Core is an AI-native platform for accurate document search and storage, designed to handle the complex, visually rich, multimodal data that traditional RAG pipelines struggle with.
emcf/thepipe
A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.
OpenGVLab/InternVideo
A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.
atfortes/Awesome-LLM-Reasoning
A comprehensive, curated collection of research papers and resources focused on enhancing and understanding the reasoning abilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs).