haotian-liu/LLaVA
An open-source Large Language and Vision Assistant (LLaVA) that uses visual instruction tuning to work toward GPT-4V-level multimodal capabilities.
Detailed Introduction
LLaVA is an open-source project for building large language-and-vision assistants. It relies on visual instruction tuning, which trains a model on image-grounded instruction-following data. The aim is multimodal capability comparable to state-of-the-art commercial models such as GPT-4V. Later iterations, including LLaVA-NeXT, LLaVA-Plus, and LLaVA-Interactive, extend the original model in three directions: processing visual information, following complex instructions, and handling more diverse tasks such as tool integration and interactive experiences. The project aims to advance both open-source multimodal AI research and its practical applications.
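The overview does not describe how visual instruction tuning is wired up. As a rough illustration, the original LLaVA paper maps features from a vision encoder into the language model's embedding space with a learned linear projection, inserts them into the prompt, and computes the training loss on the assistant's response. The stdlib-only sketch below assumes that setup. It is not LLaVA's actual code. The names `IGNORE_INDEX`, `IMAGE_PLACEHOLDER`, `project`, and `build_example` are hypothetical, and the toy dimensions and values are made up for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical constants for this sketch. -100 is the "ignore" label commonly
# used with cross-entropy losses (e.g. PyTorch); -200 marks where the image goes.
IGNORE_INDEX = -100
IMAGE_PLACEHOLDER = -200


def project(feature: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Linear projection y = W x + b from vision-feature space to text-embedding space."""
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]


def build_example(
    prompt_ids: List[int],
    response_ids: List[int],
    image_features: List[Vector],
    weight: List[Vector],
    bias: Vector,
    embed: Dict[int, Vector],
) -> Tuple[List[Vector], List[int]]:
    """Return (input embeddings, labels) for one image + instruction + response sample."""
    embeddings: List[Vector] = []
    labels: List[int] = []
    for tok in prompt_ids:
        if tok == IMAGE_PLACEHOLDER:
            # Replace the placeholder with one projected vector per image patch feature.
            for feat in image_features:
                embeddings.append(project(feat, weight, bias))
                labels.append(IGNORE_INDEX)  # image positions are not predicted
        else:
            embeddings.append(embed[tok])
            labels.append(IGNORE_INDEX)  # instruction tokens are not supervised
    for tok in response_ids:
        embeddings.append(embed[tok])
        labels.append(tok)  # loss is computed only on the response tokens
    return embeddings, labels


# Toy usage: 3-dim "vision" features projected into a 2-dim "text" space.
embed = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
emb, lab = build_example([1, IMAGE_PLACEHOLDER, 2], [3], feats, W, b, embed)
print(lab)  # [-100, -100, -100, -100, 3]
```

Masking the instruction and image positions keeps the model from being trained to reproduce the prompt, so gradient updates come only from learning to answer. A real implementation would use a tensor library, batch the samples, and let the language model shift the labels for next-token prediction.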