Tags: #vlm
hiyouga/LlamaFactory
A unified and efficient framework for fine-tuning over 100 large language models (LLMs) and vision-language models (VLMs) through both a CLI and a Web UI.
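As a sketch of how LlamaFactory-style training is driven, the framework is configured through a single YAML file passed to its CLI. The field names below follow the examples bundled with the repo but are assumptions here; the model, dataset, and template values are placeholders, so check the project's current docs before running.

```yaml
# Hypothetical minimal LoRA SFT config for LlamaFactory.
# Key names mirror the repo's example configs and may differ across versions.
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct   # placeholder model
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: mllm_demo                                # placeholder dataset name
template: qwen2_vl
output_dir: saves/qwen2_5vl-lora
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

A config like this would typically be launched with the project's CLI (e.g. `llamafactory-cli train config.yaml`), with the Web UI generating an equivalent file interactively.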
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance open-source framework for Reinforcement Learning from Human Feedback (RLHF) based on Ray and vLLM.
heshengtao/comfyui_LLM_party
A ComfyUI extension providing a comprehensive LLM agent framework for building custom AI assistants, integrating diverse AI models, and automating complex workflows.
stas00/ml-engineering
An open collection of methodologies, tools, and step-by-step instructions for successfully training, fine-tuning, and running inference with large language and multimodal models.
oumi-ai/oumi
An end-to-end platform for fine-tuning, evaluating, and deploying open-source Large Language Models (LLMs) and Vision Language Models (VLMs).
roboflow/maestro
A streamlined tool to accelerate the fine-tuning process for multimodal models like Florence-2, PaliGemma 2, and Qwen2.5-VL.
emcf/thepipe
A Python library for extracting clean markdown, multimodal media, and structured data from complex documents using vision-language models.
om-ai-lab/VLM-R1
A stable and generalizable R1-style Large Vision-Language Model (VLM) framework that enhances visual understanding tasks through reinforcement learning, outperforming SFT models in generalization.
om-ai-lab/OmAgent
A Python library that simplifies the development of multimodal language agents by abstracting away complex engineering details and providing native multimodal support.
NexaAI/nexa-sdk
A high-performance local inference framework for running frontier multimodal AI models on various devices with minimal energy consumption.