Tags: #computer-vision

AI Framework

159.9k

huggingface/transformers

A comprehensive library providing state-of-the-art pre-trained models for various machine learning tasks across text, vision, audio, and multimodal domains, facilitating both inference and training.

machine-learning deep-learning nlp

AI-powered UI Automation Framework

Node.js

12.8k

web-infra-dev/midscene

An AI-powered, vision-driven UI automation framework that enables natural language control and scripting across web, mobile, and custom interfaces.

ai ui-automation cross-platform

Replaces:

UiPath, Blue Prism

AI-powered Web Automation Platform

Python

21.4k

Skyvern-AI/skyvern

Automates complex browser-based workflows using LLMs and computer vision, providing a resilient and adaptive solution for web interaction.

llm computer-vision web-automation

Multimodal AI Chatbot Framework

3.3k

OpenGVLab/Ask-Anything

A multimodal AI chatbot framework that integrates various large language models to support conversational interaction with, and understanding of, video and image content.

video-understanding multimodal-ai large-language-models

Replaces:

ChatGPT

Educational Code Repository

opencv

22.9k

spmallick/learnopencv

A comprehensive repository offering C++ and Python code examples for computer vision, deep learning, and AI research articles from LearnOpenCV.com.

computer-vision deep-learning opencv

Machine Learning Data Library

Python

21.5k

huggingface/datasets

A lightweight library providing one-line dataloaders and efficient pre-processing tools for a vast hub of AI datasets, supporting various ML frameworks.

ai machine-learning datasets

AI Agent / Computer Automation Model

Python

4.9k

microsoft/fara

A compact 7B-parameter AI agent model from Microsoft, designed to automate multi-step computer tasks through visual perception and direct interface interaction.

ai-agent slm automation

AI/ML Model Fine-tuning Tool

conda

2.0k

adobe-research/custom-diffusion

Enables fast and efficient multi-concept customization of text-to-image diffusion models like Stable Diffusion using a few images.

diffusion-models text-to-image fine-tuning

Deep Learning Adaptation Framework

Python

1.5k

tianrun-chen/SAM-Adapter-PyTorch

A PyTorch-based framework to adapt Meta AI's Segment Anything Model (SAM) for improved performance on challenging downstream computer vision tasks using adapters and prompts.

segmentation computer-vision pytorch

Data Labeling and Annotation Platform

Docker

1.2k

xtreme1-io/xtreme1

An all-in-one open-source platform for multimodal data labeling and annotation, supporting 3D LiDAR, image, and LLM training data with AI-fueled tools.

data-labeling data-annotation multimodal-data

AI Utility / Prompt Engineering Tool

Python

3.7k

Hunyuan-PromptEnhancer/PromptEnhancer

A prompt rewriting tool that refines user prompts into clearer, structured versions to enhance the quality of text-to-image generation and image-to-image editing.

prompt-engineering text-to-image image-generation

AI/ML Foundation Model

2.3k

OpenGVLab/InternVideo

A series of video foundation models and large-scale datasets designed for comprehensive multimodal video understanding and generation.

video-foundation-models multimodal-ai deep-learning

Deep Learning Library / Computer Vision Library

PaddlePaddle

3.6k

ZhaoJ9014/face.evoLVe

A high-performance, comprehensive face recognition library built on PaddlePaddle and PyTorch.

face-recognition deep-learning paddlepaddle

Resource Collection / Awesome List

2.4k

Yutong-Zhou-cv/Awesome-Text-to-Image

A comprehensive curated list of resources, papers, datasets, and projects related to text-to-image generation and manipulation.

text-to-image generative-ai computer-vision

Foundation Model Research Hub

22.1k

microsoft/unilm

A comprehensive research hub for large-scale self-supervised pre-training of foundation models across diverse tasks, languages, and modalities.

foundation-models self-supervised-learning multimodal-ai

Machine Learning Automation Framework

Python

2.7k

autodistill/autodistill

Autodistill automates the process of training small, fast supervised models from unlabeled images by leveraging large foundation models, eliminating the need for manual data labeling.

computer-vision auto-labeling model-distillation

AI Model / Research Project

2.5k

X-PLUG/mPLUG-Owl

A family of multimodal large language models (MLLMs) designed to advance AI understanding and generation across various data types.

multi-modal llm ai

AI Inference Toolkit

MNN

4.4k

xlite-dev/lite.ai.toolkit

A lightweight C++ toolkit for deploying over 100 AI models across various inference engines.

c++ ai-toolkit deep-learning

Curated Research List

2.2k

wangkai930418/awesome-diffusion-categorized

A meticulously categorized collection of research papers on diffusion models, spanning various subareas from visual illusions to image restoration and text-guided editing.

diffusion-models research-papers computer-vision

AI-powered Image Restoration and Upscaling Tool

Python

5.5k

Fanghua-Yu/SUPIR

SUPIR is an AI-driven project focused on developing practical algorithms for photo-realistic image restoration and upscaling in real-world scenarios.

image-restoration upscaling ai

Replaces:

Adobe Photoshop, Adobe Lightroom

Multimodal AI Model Suite

HuggingFace

10.0k

OpenGVLab/InternVL

An open-source family of multimodal large language models that aims to match or exceed the performance of commercial models such as GPT-4o and GPT-5.

multimodal llm open-source

Replaces:

GPT-4o, GPT-5

Machine Learning Framework

PyTorch

3.8k

open-mmlab/mmpretrain

MMPreTrain is an OpenMMLab project providing a comprehensive, open-source PyTorch-based toolbox for pre-training and benchmarking various computer vision and multi-modal models.

deep-learning pytorch computer-vision