rom1504/img2dataset - OSS Alternative - Discover Top Open Source Alternatives to Popular Software
Data Processing CLI Tool
4.4k 2026-05-01

rom1504/img2dataset

A highly efficient command-line tool that downloads images from large lists of URLs, resizes them, and packages them into machine learning datasets.

Core Features

Rapidly downloads and processes millions of image URLs.
Automatically resizes images to specified dimensions.
Supports saving associated captions for image-text datasets.
Respects web opt-out directives such as the X-Robots-Tag: noai HTTP header.
Outputs datasets in structured folders or WebDataset .tar format.
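The opt-out check can be illustrated with a small Python sketch. It assumes a directive set of noai, noimageai, noindex and noimageindex (only noai is named above), and it uses a simplified reading of the header in which a leading "agent:" prefix scopes the whole header value. is_opted_out is a hypothetical helper, not img2dataset's API.

```python
from typing import Iterable

# Assumed directive set; the feature list above only names "noai".
DISALLOWED_DIRECTIVES = frozenset({"noai", "noimageai", "noindex", "noimageindex"})


def is_opted_out(header_values: Iterable[str], user_agent: str = "img2dataset",
                 disallowed: frozenset = DISALLOWED_DIRECTIVES) -> bool:
    """Return True if any X-Robots-Tag header value forbids using the image.

    Simplification: a value is either "directive, directive" (applies to every
    crawler) or "agent: directive, directive" (applies only to that agent).
    """
    ua = user_agent.lower()
    for value in header_values:
        value = value.strip().lower()
        head, sep, rest = value.partition(":")
        if sep and "," not in head:
            # Scoped value such as "img2dataset: noai"; ignore other agents.
            if head.strip() != ua:
                continue
            value = rest
        directives = {d.strip() for d in value.split(",")}
        if directives & disallowed:
            return True
    return False
```

For example, a response carrying "X-Robots-Tag: otherbot: noai" would not exclude the image under this sketch, while "X-Robots-Tag: noindex, nofollow" would.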
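Resizing to "specified dimensions" can mean several things, and the difference between strategies is mostly arithmetic. The sketch below is a hypothetical plan_resize helper, loosely modeled on keep-ratio, center-crop and border strategies; it computes the scaled size and the final canvas size without touching any pixels, and its mode names are assumptions for illustration.

```python
def plan_resize(width, height, image_size, mode="keep_ratio"):
    """Return ((scaled_w, scaled_h), (out_w, out_h)) for a hypothetical resize mode.

    keep_ratio:  scale so the smaller side equals image_size; no crop or padding.
    center_crop: scale as in keep_ratio, then crop the center to a square.
    border:      scale so the larger side equals image_size, then pad to a square.
    """
    if width <= 0 or height <= 0 or image_size <= 0:
        raise ValueError("dimensions must be positive")
    if mode in ("keep_ratio", "center_crop"):
        scale = image_size / min(width, height)
    elif mode == "border":
        scale = image_size / max(width, height)
    else:
        raise ValueError(f"unknown resize mode: {mode!r}")
    # Round to whole pixels, never letting a side collapse to zero.
    scaled = (max(1, round(width * scale)), max(1, round(height * scale)))
    if mode == "keep_ratio":
        return scaled, scaled
    # center_crop and border both end on an image_size x image_size canvas.
    return scaled, (image_size, image_size)
```

For example, plan_resize(1000, 500, 250, "border") returns ((250, 125), (250, 250)): the image is shrunk to fit inside the square and then padded.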

Quick Start

pip install img2dataset

Detailed Introduction

img2dataset is a command-line utility for machine learning practitioners and researchers who need to build large-scale image datasets from lists of URLs. It downloads and resizes images efficiently and packages them, together with any associated captions, into ready-to-use formats. Because it can process hundreds of millions of URLs on a single machine, it substantially shortens the data preparation phase of training advanced AI models. It also respects websites' opt-out directives, giving users control over the ethics of their web scraping.
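Packaging images and captions into WebDataset .tar shards relies on a simple convention: files that share a basename (a "key") form one sample. The Python sketch below writes image bytes, a caption and JSON metadata under a shared key using only the standard library; write_shard and the nine-digit key scheme are illustrative assumptions, not img2dataset's exact output layout.

```python
import io
import json
import tarfile


def write_shard(fileobj, samples):
    """Write (image_bytes, caption, metadata) samples to a WebDataset-style tar.

    Each sample becomes <key>.jpg, <key>.txt and <key>.json, so a WebDataset
    reader groups the three files back into one record.
    """
    def add(tar, name, data):
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for i, (image_bytes, caption, meta) in enumerate(samples):
            key = f"{i:09d}"  # hypothetical zero-padded key scheme
            add(tar, f"{key}.jpg", image_bytes)
            add(tar, f"{key}.txt", caption.encode("utf-8"))
            add(tar, f"{key}.json", json.dumps(meta).encode("utf-8"))
```

To write a shard to disk, pass an open binary file, for example: with open("00000.tar", "wb") as f: write_shard(f, samples).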
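Throughput at the scale of hundreds of millions of URLs comes largely from running many downloads concurrently. The following sketch splits a URL list into fixed-size shards and fetches each shard's URLs with a thread pool. download_all, shard_urls and the caller-supplied fetch function are hypothetical; the real tool, which also spreads work across processes, is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_urls(urls, shard_size):
    """Split a URL list into consecutive shards of at most shard_size URLs."""
    return [urls[i:i + shard_size] for i in range(0, len(urls), shard_size)]


def download_all(urls, fetch, shard_size=10_000, threads=64):
    """Fetch every URL concurrently and return the successes grouped by shard.

    fetch(url) is supplied by the caller and returns bytes, or None on failure.
    """
    shards = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in shard_urls(urls, shard_size):
            # pool.map yields results in input order, so zip pairs each URL
            # with its own payload even though downloads finish out of order.
            payloads = pool.map(fetch, batch)
            shards.append([(u, p) for u, p in zip(batch, payloads) if p is not None])
    return shards
```

Passing fetch in as a parameter keeps the sketch testable offline; a real fetch would issue an HTTP request with a timeout and return None on errors or opt-outs.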

© 2026 OSS Alternative. hotgithub.com - All rights reserved.