Tags: #computer-vision
huggingface/transformers
A unified framework providing state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, optimized for both inference and training.
Skyvern-AI/skyvern
Automates complex browser-based workflows using LLMs and computer vision, providing a resilient and adaptive solution for web interaction.
OpenGVLab/Ask-Anything
An AI project extending Large Language Models with video understanding capabilities, enabling conversational AI to process and respond to queries about video content.
spmallick/learnopencv
A comprehensive repository offering C++ and Python code examples for computer vision, deep learning, and AI research, complementing articles on LearnOpenCV.com.
microsoft/fara
An ultra-compact 7B-parameter AI agent from Microsoft, designed to automate multi-step computer tasks through visual perception and direct interface interaction.
adobe-research/custom-diffusion
Enables fast and efficient multi-concept customization of text-to-image diffusion models like Stable Diffusion from only a few example images.
tianrun-chen/SAM-Adapter-PyTorch
A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.
xtreme1-io/xtreme1
An all-in-one open-source platform for multimodal data labeling and annotation, supporting 3D LiDAR, image, and LLM training data with AI-fueled tools.
Hunyuan-PromptEnhancer/PromptEnhancer
A prompt rewriting tool that refines user prompts into clearer, structured versions to enhance the quality of text-to-image generation and image-to-image editing.
OpenGVLab/InternVideo
A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.
ZhaoJ9014/face.evoLVe
A high-performance, comprehensive face recognition library built on PaddlePaddle and PyTorch.
om-ai-lab/VLM-R1
A stable and generalizable R1-style Large Vision-Language Model (VLM) framework that improves performance on visual understanding tasks through reinforcement learning, outperforming SFT models in generalization.
Yutong-Zhou-cv/Awesome-Text-to-Image
A comprehensive curated list of resources, papers, datasets, and projects focused on text-to-image generation and manipulation.
microsoft/unilm
A comprehensive research initiative by Microsoft focusing on large-scale self-supervised pre-training to develop advanced foundation models across diverse tasks, languages, and modalities.
autodistill/autodistill
Autodistill automates the process of training small, fast supervised models from unlabeled images by leveraging large foundation models, eliminating the need for manual data labeling.
X-PLUG/mPLUG-Owl
A family of powerful multi-modal large language models (MLLMs) designed to advance AI's understanding and generation capabilities across various data types.
xlite-dev/lite.ai.toolkit
A lightweight C++ toolkit for deploying over 100 diverse AI models with multiple inference engines.
wangkai930418/awesome-diffusion-categorized
A meticulously categorized collection of research papers on diffusion models, organized by diverse subareas such as visual illusion, color in generation, image restoration, and text-guided editing.
Fanghua-Yu/SUPIR
SUPIR is an AI-driven project focused on developing practical algorithms for photo-realistic image restoration and upscaling in real-world scenarios.
OpenGVLab/InternVL
A pioneering open-source multimodal AI model family designed to serve as a high-performance alternative to commercial models like GPT-4o and GPT-5.