Self-hosted Web Archiving Tool
27.2k 2026-04-19

ArchiveBox/ArchiveBox

An open-source, self-hosted web archiving tool designed to preserve web content from various sources in multiple durable formats for long-term access.

Core Features

Preserves web content in several redundant formats, including HTML, PDF, PNG, and WARC.
Supports diverse input sources including URLs, browser history, bookmarks, and social media feeds.
Offers multiple interaction methods: CLI, REST API, Python API, and a self-hosted web interface.
Automatically extracts embedded media, articles, and source code from archived pages.
Keeps archived data on storage you control, in formats intended to stay readable over the long term.

Quick Start

pip install archivebox
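
After installation, a typical first run is to initialize a data directory and add a URL to it. The Python sketch below drives the `archivebox` command-line tool through `subprocess`, assuming the `init` and `add` subcommands and the `--depth` option behave as in current ArchiveBox releases. The helper names (`build_add_command`, `archive`) and the `~/archivebox-data` location are hypothetical, chosen only for illustration; check `archivebox --help` for your installed version.

```python
import shutil
import subprocess
from pathlib import Path


def build_add_command(urls, depth=0):
    """Build the argument list for `archivebox add` (hypothetical helper)."""
    # Assumption: the CLI accepts --depth=0 (only the URL itself) or
    # --depth=1 (the URL plus the pages it links to).
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    return ["archivebox", "add", f"--depth={depth}", *urls]


def archive(urls, data_dir):
    """Initialize data_dir if it is empty, then archive the given URLs."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # ArchiveBox commands operate on the current working directory,
    # so each call runs with cwd set to the data directory.
    if not any(data_dir.iterdir()):
        subprocess.run(["archivebox", "init"], cwd=data_dir, check=True)
    subprocess.run(build_add_command(urls), cwd=data_dir, check=True)


if __name__ == "__main__":
    if shutil.which("archivebox") is None:
        raise SystemExit("archivebox not found on PATH; run `pip install archivebox` first")
    # Hypothetical data directory; any empty directory you own will do.
    archive(["https://example.com"], Path.home() / "archivebox-data")
```

Separating command construction from execution keeps the sketch easy to inspect before anything is run against a real archive.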

Detailed Introduction

ArchiveBox is an open-source, self-hosted tool for digital preservation that addresses the ephemerality of web content. Individuals and organizations can use it to capture and store snapshots of websites, social media, and other online resources. Because it saves data in widely readable formats such as HTML, PDF, and WARC, archived information is meant to remain accessible and usable for the long term. Running it on your own infrastructure keeps the data under your control and guards against link rot and content degradation.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.