CLI Tool, Utility Library, Document Converter
106.6k 2026-04-13
microsoft/markitdown
A Python utility for converting various file formats and office documents into structured Markdown, optimized for LLM consumption and text analysis.
Core Features
Converts diverse file types (PDF, Word, Excel, PowerPoint, Images, Audio, HTML, etc.) to Markdown.
Preserves important document structure (headings, lists, tables, links) in Markdown output.
Optimized for Large Language Model (LLM) applications and text analysis pipelines.
Groups format-specific dependencies as optional extras, so you can install only the converters you need.
Offers both a command-line interface (CLI) and a Python API for integration.
Quick Start
pip install 'markitdown[all]'
Detailed Introduction
MarkItDown is a lightweight Python utility that bridges diverse document formats and large language models (LLMs). It converts office documents, PDFs, images, audio, and other files into structured Markdown. Unlike general-purpose converters, it focuses on preserving essential document structure (headings, lists, tables, links) in a token-efficient Markdown format, which suits it to LLM training data preparation, prompt engineering, and text analysis pipelines. Conversion is designed to run without creating temporary files.
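As a rough illustration of the Python API mentioned under Core Features, the sketch below wraps a conversion in a small helper. The helper name `to_markdown` and its `converter` parameter are hypothetical and exist only for this example; the sketch assumes the package's `MarkItDown` class, its `convert()` method, and the `text_content` attribute on the returned result. The import is deferred so the helper can also be exercised with a stand-in converter when the package is not installed.

```python
from pathlib import Path
from typing import Any, Optional, Union


def to_markdown(path: Union[str, Path], converter: Optional[Any] = None) -> str:
    """Convert one file to Markdown text (hypothetical helper, not part of the library)."""
    if converter is None:
        # Imported lazily: assumes `pip install 'markitdown[all]'` has been run.
        from markitdown import MarkItDown

        converter = MarkItDown()

    # Assumption: convert() takes a file path and returns an object
    # whose `text_content` attribute holds the Markdown string.
    result = converter.convert(str(path))
    return result.text_content
```

Calling `to_markdown("report.docx")` (a placeholder file name) would return the document as a Markdown string that can be passed to a prompt or written to disk. Accepting an injected converter is a design choice for this sketch only: it lets one configured instance be reused across many files and makes the helper easy to test.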