OpenGVLab/InternVideo
A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.
Detailed Introduction
InternVideo is a series of video foundation models and associated large-scale datasets for multimodal video understanding. The project, which includes work presented at ECCV 2024, explores generative and discriminative learning to build robust models that can process and interpret complex video content. It provides a broad ecosystem of model iterations (InternVideo, InternVideo2, InternVideo2.5, InternVideo-Next) and large video-text datasets (InternVid), which researchers and developers can use for video analysis, video generation, and multimodal dialogue systems.