AI/ML Foundation Model
2.2k 2026-04-18

OpenGVLab/InternVideo

A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.

Core Features

Comprehensive Video Foundation Models: Includes InternVideo, InternVideo2, InternVideo2.5, and InternVideo-Next for various video understanding tasks.
Large-scale Video-Text Datasets: Provides InternVid and InternVid2, facilitating multimodal learning and generation.
Multimodal Understanding: Combines generative and discriminative learning approaches to support a range of video analysis tasks.
Scalable Architectures: Offers models ranging from small distilled versions to large 8B-parameter models to suit different compute budgets.
Continuous Development: Actively updated with new models, datasets, and technical reports.

Detailed Introduction

InternVideo is a series of video foundation models and associated large-scale datasets designed to advance multimodal video understanding. The project, associated with an ECCV 2024 publication, explores generative and discriminative learning approaches to build models that can process and interpret complex video content. It spans several model iterations (InternVideo, InternVideo2, InternVideo2.5, InternVideo-Next) and large video-text datasets (InternVid, InternVid2), which researchers and developers can use for video analysis, generation, and multimodal dialogue systems.
