Tags: #computer-vision

Deep Learning Library / Machine Learning Framework
159.3k

huggingface/transformers

A unified framework providing state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, optimized for both inference and training.

AI-powered Web Automation Platform
Python
21.1k

Skyvern-AI/skyvern

Automates complex browser-based workflows using LLMs and computer vision, providing a resilient and adaptive solution for web interaction.

Multimodal Large Language Model
3.3k

OpenGVLab/Ask-Anything

An AI project extending Large Language Models with video understanding capabilities, enabling conversational AI to process and respond to queries about video content.

Educational Resource & Code Examples
Python
22.9k

spmallick/learnopencv

A comprehensive repository offering C++ and Python code examples for computer vision, deep learning, and AI research, complementing articles on LearnOpenCV.com.

AI Agent / Computer Automation Model
Python
4.9k

microsoft/fara

A compact 7B-parameter AI agent from Microsoft that automates multi-step computer tasks through visual perception and direct interface interaction.

AI/ML Model Fine-tuning Tool
conda
2.0k

adobe-research/custom-diffusion

Enables fast and efficient multi-concept customization of text-to-image diffusion models like Stable Diffusion using a few images.

Deep Learning Adaptation Framework
Python
1.5k

tianrun-chen/SAM-Adapter-PyTorch

A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.

Data Labeling and Annotation Platform
Docker
1.2k

xtreme1-io/xtreme1

An all-in-one open-source platform for multimodal data labeling and annotation, supporting 3D LiDAR, image, and LLM training data with AI-fueled tools.

AI Utility / Prompt Engineering Tool
Python
3.7k

Hunyuan-PromptEnhancer/PromptEnhancer

A prompt rewriting tool that refines user prompts into clearer, structured versions to enhance the quality of text-to-image generation and image-to-image editing.

AI/ML Foundation Model
2.2k

OpenGVLab/InternVideo

A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.

Deep Learning Library / Computer Vision Library
PaddlePaddle
3.6k

ZhaoJ9014/face.evoLVe

A high-performance, comprehensive face recognition library built on PaddlePaddle and PyTorch.

Large Vision-Language Model Framework
xllm
5.9k

om-ai-lab/VLM-R1

A stable and generalizable R1-style Large Vision-Language Model (VLM) framework that improves visual understanding through reinforcement learning and generalizes better than models trained with supervised fine-tuning (SFT).

Resource Collection / Awesome List
2.4k

Yutong-Zhou-cv/Awesome-Text-to-Image

A comprehensive curated list of resources, papers, datasets, and projects focused on text-to-image generation and manipulation.

AI Research Hub
22.1k

microsoft/unilm

A comprehensive research initiative by Microsoft focusing on large-scale self-supervised pre-training to develop advanced foundation models across diverse tasks, languages, and modalities.

Machine Learning Automation Framework
Python
2.7k

autodistill/autodistill

Autodistill automates the process of training small, fast supervised models from unlabeled images by leveraging large foundation models, eliminating the need for manual data labeling.

AI Model / Research Project
2.5k

X-PLUG/mPLUG-Owl

A family of powerful multi-modal large language models (MLLMs) designed to advance AI's understanding and generation capabilities across various data types.

AI Inference Toolkit
mnn
4.4k

xlite-dev/lite.ai.toolkit

A lightweight C++ toolkit for deploying over 100 diverse AI models with multiple inference engines.

Curated Research Collection
2.2k

wangkai930418/awesome-diffusion-categorized

A meticulously categorized collection of research papers on diffusion models, organized by diverse subareas such as visual illusion, color in generation, image restoration, and text-guided editing.

AI-powered Image Restoration and Upscaling Tool
Python
5.5k

Fanghua-Yu/SUPIR

SUPIR develops practical algorithms for photo-realistic image restoration and upscaling in real-world scenarios.

Multimodal AI Model Suite
Python
10.0k

OpenGVLab/InternVL

A pioneering open-source multimodal AI model family designed to serve as a high-performance alternative to commercial models like GPT-4o and GPT-5.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.