doc-bon/markitdown — reverse-engineered prompt
Reverse engineered prompt
Build me a lightweight Python tool called MarkItDown that turns common documents and media into clean Markdown so the text can be used by AI tools and search pipelines.
I want it to work from the command line and also as a small Python library. A user should be able to give it a PDF, Word file, PowerPoint, Excel sheet, image, audio file, HTML page, CSV, JSON, XML, ZIP, YouTube URL, or EPUB and get back Markdown that keeps useful structure like headings, lists, tables, links, and readable text. It doesn’t need to produce perfectly formatted documents; the goal is Markdown that is useful for analysis.
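As a rough illustration of the library side, here is a minimal sketch of one way to route inputs: a registry that maps file extensions to converter functions, with a CSV-to-table converter as its only entry. The names `convert_csv`, `CONVERTERS`, and `convert` are hypothetical and chosen for this example; they are not taken from any existing package.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict


def convert_csv(data: bytes) -> str:
    """Render CSV bytes as a Markdown pipe table (first row is the header)."""
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes and pad short rows so every line has `width` cells.
        cells = [cell.replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)


# Hypothetical registry: file extension -> converter function.
CONVERTERS: Dict[str, Callable[[bytes], str]] = {".csv": convert_csv}


def convert(path: str) -> str:
    """Pick a converter by file extension and return Markdown text."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter registered for {suffix!r}")
    return CONVERTERS[suffix](Path(path).read_bytes())
```

Supporting another format under this design would mean adding one more entry to the registry, which keeps each converter small and independent.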
Please include sensible optional support for heavier features like OCR, audio transcription, YouTube transcripts, and cloud document understanding without forcing every dependency on everyone. Also include a simple plugin system that is off by default but can be enabled. Make installation and basic usage clear, with examples for converting a file, piping input, and saving output.
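The two requirements above, optional heavy features and an opt-in plugin system, could be sketched roughly as follows. This is an assumption-laden illustration: `require`, `MissingDependencyError`, `Converter`, the `enable_plugins` flag, and the extra name passed to `require` are all hypothetical names chosen for this example.

```python
import importlib
from typing import Callable, Dict, Iterable, Optional


class MissingDependencyError(RuntimeError):
    """Raised when an optional feature is used without its extra installed."""


def require(module_name: str, extra: str):
    """Import an optional module lazily, or say which extra would provide it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The extra name is hypothetical; it only illustrates the message shape.
        raise MissingDependencyError(
            f"{module_name!r} is needed for this feature; "
            f"install the optional extra {extra!r} to enable it"
        ) from exc


class Converter:
    """Hypothetical core object; plugins are ignored unless explicitly enabled."""

    def __init__(
        self,
        enable_plugins: bool = False,
        plugins: Optional[Iterable[Callable[["Converter"], None]]] = None,
    ):
        self.handlers: Dict[str, Callable[[bytes], str]] = {}
        if enable_plugins:
            for register in plugins or []:
                register(self)  # each plugin registers its own handlers
```

Here plugins are passed in directly to keep the sketch self-contained; a packaged tool might instead discover them at startup, but only when the flag is set.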