MinerU Open API Ecosystem Toolkit

Reverse-engineered prompt

Build me an ecosystem toolkit around the MinerU Open API so people can turn PDFs, Word files, PowerPoints, images, and web pages into clean Markdown or JSON with as little setup as possible.

I want a simple command-line tool for quick parsing, including a fast no-login mode for previews and a fuller token-based mode for better quality, OCR, tables, formulas, multi-language support, batch jobs, and web page crawling. I also want SDKs people can drop into Python, Go, and TypeScript projects, plus ready-to-use integrations for LangChain and LlamaIndex so this works well for RAG and document pipelines. Please include an MCP server and a few agent-friendly skills so tools like Cursor or Claude-style agents can parse documents on demand.

Make it feel polished and practical, with clear examples, sensible defaults, and output that follows reading order and handles scanned docs well. If anything is unclear, check the current MinerU API docs online and wire it up the modern way.

opendatalab/MinerU-Ecosystem — reverse-engineered prompt