Data Processing CLI Tool
4.4k 2026-04-18

rom1504/img2dataset

An efficient command-line tool to download, resize, and package vast collections of image URLs into ready-to-use datasets for machine learning.
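One packaging format named in the feature list, WebDataset, stores samples in plain tar shards. Files that share a key (for example `000000000.jpg`, `000000000.txt`, `000000000.json`) belong to the same sample. The minimal sketch below builds that layout with Python's standard `tarfile` module. The `build_shard` helper, the sample dict fields, and the key naming are hypothetical and only loosely modeled on this kind of output; this is not img2dataset's own writer.

```python
import io
import json
import tarfile


def add_member(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Add an in-memory byte string to the archive under `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def build_shard(samples: list[dict]) -> bytes:
    """Return the bytes of one tar shard in which files sharing a key form a sample.

    Each sample is a hypothetical dict with 'key', 'jpg' (image bytes),
    'caption' (str) and 'url' (str) entries.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for s in samples:
            key = s["key"]
            add_member(tar, f"{key}.jpg", s["jpg"])                        # image
            add_member(tar, f"{key}.txt", s["caption"].encode("utf-8"))    # caption
            meta = {"url": s["url"], "caption": s["caption"]}
            add_member(tar, f"{key}.json", json.dumps(meta).encode("utf-8"))  # metadata
    return buf.getvalue()
```

Writing the result to disk is then a single call, e.g. `open("00000.tar", "wb").write(build_shard(samples))`. Because every sample is just consecutive tar members, a loader can stream a shard sequentially without an index.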

Core Features

High-speed download and processing of millions of image URLs.
Automated image resizing and packaging into various formats (e.g., WebDataset).
Supports saving associated captions for image-text datasets.
Respects web opt-out directives (X-Robots-Tag) by default.
Scalable to billions of image-text pairs with distributed processing.
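The opt-out behavior can be pictured as a header test: if any X-Robots-Tag value on an image's HTTP response carries a disallowed directive, the image is skipped. A minimal sketch follows. The `is_opted_out` helper, the directive set (`noai`, `noimageai`, `noindex`, `noimageindex`), and the `img2dataset` agent token are assumptions for illustration; consult the project's documentation for its actual defaults and how to configure them.

```python
from collections.abc import Iterable

# Assumed set of directives treated as an opt-out (illustrative, not authoritative).
DISALLOWED = frozenset({"noai", "noimageai", "noindex", "noimageindex"})


def is_opted_out(
    header_values: Iterable[str],
    user_agent_token: str = "img2dataset",  # hypothetical agent token
    disallowed: frozenset = DISALLOWED,
) -> bool:
    """Return True if any X-Robots-Tag value forbids use by this crawler.

    A value is either an unscoped directive list ("noai, noindex") that applies
    to every crawler, or is scoped to one agent ("somebot: noai").
    """
    for value in header_values:
        scope, sep, rest = value.partition(":")
        if sep and "," not in scope:
            # Treat the text before the first colon as an agent name.
            agent, directives = scope.strip().lower(), rest
        else:
            agent, directives = None, value
        if agent is not None and agent != user_agent_token.lower():
            continue  # scoped to a different crawler; ignore it
        tokens = {d.strip().lower() for d in directives.split(",")}
        if tokens & disallowed:
            return True
    return False
```

A value scoped to another crawler (`otherbot: noai`) is ignored, while an unscoped value such as `noindex, nofollow` applies to everyone and triggers the skip.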

Quick Start

pip install img2dataset && img2dataset --url_list=myimglist.txt --output_folder=output_folder --thread_count=64 --image_size=256
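The command expects `myimglist.txt` to be a plain text file with one image URL per line. A small sketch for producing such a file from Python follows. The `write_url_list` helper is hypothetical, and its choice to drop blank lines and duplicate URLs is an assumption about what a clean input list should look like, not a requirement stated by the tool.

```python
from pathlib import Path


def write_url_list(urls, path="myimglist.txt"):
    """Write one URL per line, dropping blanks and duplicates but keeping order.

    Returns the number of URLs written.
    """
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            kept.append(url)
    Path(path).write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return len(kept)
```

For example, `write_url_list(["https://example.com/a.jpg", "https://example.com/b.png"])` writes a two-line `myimglist.txt` that can be passed to `--url_list` as in the command above.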

Detailed Introduction

img2dataset is a highly optimized command-line tool for building large-scale image datasets from lists of URLs. It targets a common bottleneck in machine-learning data preparation: downloading, resizing, and organizing millions or even billions of images together with their captions. On a single machine it can process about 100 million URLs in 20 hours, which makes it practical for researchers and developers building data-intensive AI models.
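The 100-million-URLs-in-20-hours figure corresponds to a sustained rate of roughly 1,400 URLs per second. The sketch below does that arithmetic and then, assuming the rate stays constant (an assumption; real throughput depends on bandwidth, CPU, and the hosts being fetched), estimates how long one billion URLs would take on one machine. The helper names are illustrative only.

```python
SECONDS_PER_HOUR = 3600


def urls_per_second(n_urls: int, hours: float) -> float:
    """Average throughput implied by processing n_urls in the given hours."""
    return n_urls / (hours * SECONDS_PER_HOUR)


def hours_needed(n_urls: int, rate_per_second: float) -> float:
    """Wall-clock hours to process n_urls at a constant rate."""
    return n_urls / rate_per_second / SECONDS_PER_HOUR


rate = urls_per_second(100_000_000, 20)
print(f"{rate:.0f} URLs/s")                                          # 1389 URLs/s
print(f"{hours_needed(1_000_000_000, rate):.0f} h for 1B URLs")      # 200 h for 1B URLs
```

At about 200 hours for a billion URLs on a single machine, spreading the work across several machines is what makes the billion-scale use case listed under Core Features practical.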
