Data Processing CLI Tool
4.4k 2026-05-01
rom1504/img2dataset
A highly efficient command-line tool that turns large lists of image URLs into machine learning datasets by downloading, resizing, and packaging the images.
Core Features
Rapidly downloads and processes millions of image URLs.
Automatically resizes images to specified dimensions.
Supports saving associated captions for image-text datasets.
Respects website opt-out directives such as the X-Robots-Tag: noai HTTP header.
Outputs datasets in structured folders or WebDataset .tar format.
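The WebDataset output listed above follows the convention that files sharing a basename inside a tar shard belong to the same sample. The Python sketch below uses only the standard library to illustrate that layout. The zero-padded keys, the `.jpg`/`.txt`/`.json` extensions, and the `write_shard`/`read_shard` helpers are assumptions for illustration; they are not img2dataset's own API or its exact file naming.

```python
import io
import json
import tarfile
from typing import BinaryIO, Dict, Iterable, Tuple

# Hypothetical sample layout: (key, image bytes, caption, metadata dict).
Sample = Tuple[str, bytes, str, dict]


def write_shard(fileobj: BinaryIO, samples: Iterable[Sample]) -> None:
    """Write samples into one WebDataset-style tar shard.

    Files sharing a key prefix (e.g. 000000000.jpg / .txt / .json) form one sample.
    """
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for key, image_bytes, caption, meta in samples:
            members = {
                f"{key}.jpg": image_bytes,
                f"{key}.txt": caption.encode("utf-8"),
                f"{key}.json": json.dumps(meta).encode("utf-8"),
            }
            for name, data in members.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)  # addfile reads exactly this many bytes
                tar.addfile(info, io.BytesIO(data))


def read_shard(fileobj: BinaryIO) -> Dict[str, Dict[str, bytes]]:
    """Group the members of a shard back into {key: {extension: bytes}}."""
    samples: Dict[str, Dict[str, bytes]] = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar.getmembers():
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


if __name__ == "__main__":
    buf = io.BytesIO()
    write_shard(buf, [("000000000", b"<jpeg bytes>", "a red bicycle",
                       {"url": "https://example.com/a.jpg"})])
    buf.seek(0)  # rewind before reading the shard back
    print(read_shard(buf))
```

Writing to a file object rather than a path keeps the helpers usable with an on-disk file opened in `"wb"` mode as well as with an in-memory buffer. Grouping by key on read is what lets a loader recover each image together with its caption and metadata.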
Quick Start
pip install img2dataset
Detailed Introduction
img2dataset is a command-line utility for machine learning practitioners and researchers who need to build large-scale image datasets from lists of URLs. It downloads, resizes, and packages large numbers of images, along with any associated captions, into ready-to-use formats. Because it can process hundreds of millions of URLs on a single machine, it substantially shortens the data-preparation phase of training large AI models. It also lets dataset builders respect site owners' wishes by honoring web opt-out directives.
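The opt-out handling described above amounts to inspecting each response's X-Robots-Tag header before keeping an image. The following is a minimal, simplified sketch of such a check, not img2dataset's implementation. The function name `is_opted_out`, the directive set (the source only names `noai`; `noimageai` is an assumption), and the crawler token `mybot` used in the usage note are all hypothetical.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical directive set: only "noai" comes from the source; "noimageai" is assumed.
DISALLOWED_DIRECTIVES: FrozenSet[str] = frozenset({"noai", "noimageai"})


def is_opted_out(
    header_values: Iterable[str],
    user_agent_token: Optional[str] = None,
    disallowed: FrozenSet[str] = DISALLOWED_DIRECTIVES,
) -> bool:
    """Return True if any X-Robots-Tag header value forbids using the image."""
    ua = (user_agent_token or "").strip().lower()
    for value in header_values:
        scope, sep, rest = value.partition(":")
        if sep and "," not in scope:
            # "somebot: noai" scopes the directives to one crawler; skip other crawlers.
            if scope.strip().lower() != ua:
                continue
            directives = rest
        else:
            # Unscoped value such as "noai, noindex" applies to every crawler.
            directives = value
        tokens = {d.strip().lower() for d in directives.split(",")}
        if tokens & disallowed:
            return True
    return False
```

With `urllib.request`, the raw header values can be collected as `response.headers.get_all("X-Robots-Tag") or []` and passed in, e.g. `is_opted_out(values, user_agent_token="mybot")`. The parsing is deliberately simplified: a value whose text before the first colon contains no comma is treated as crawler-scoped, which is enough to illustrate the idea but does not cover every edge case of the header's syntax.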