OpenGVLab/InternVideo
A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.
Detailed Introduction
InternVideo is a project dedicated to advancing video foundation models and multimodal understanding. It encompasses a growing family of models (InternVideo, InternVideo2, InternVideo2.5, and InternVideo-Next), each advancing generative and discriminative learning for video. Complementing these models is InternVid, a large-scale video-text dataset for training multimodal AI systems. The project aims to support video analysis, generation, and interaction, laying the groundwork for applications in real-world understanding and video-centric AI.