Tags: #vision-language-model
Multimodal AI Model
24.8k
haotian-liu/LLaVA
LLaVA (Large Language and Vision Assistant) is an open-source multimodal model, trained with visual instruction tuning, that targets GPT-4V-level capabilities.