baidu/Unlimited-OCR — reverse-engineered prompt


Build me a simple Python tool around Baidu Unlimited OCR that I can run on an NVIDIA GPU to extract and parse text from a single image, a folder of images, or a multi-page PDF. I want it to feel like one-shot long-document OCR, not just page-by-page text grabbing, and it should save the results into an output folder automatically.

Please make the common path easy. I should be able to point it at an image or PDF and get parsed output without manual setup beyond installing the requirements. It should convert PDF pages to images when needed, support batch processing with some concurrency, and work well for long documents. If there are two useful image modes, keep both and make them easy to switch with a simple option.
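As a rough illustration of the batch path described above, the sketch below gathers inputs from a file or folder and processes them with a small thread pool, writing one text file per input. The names `collect_inputs`, `process_all`, and `ocr_fn` are hypothetical; `ocr_fn` stands in for whatever call actually runs the model, and PDF-to-image conversion is assumed to happen inside that hook rather than being shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Suffixes treated as OCR inputs (an assumption, adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}
INPUT_SUFFIXES = IMAGE_SUFFIXES | {".pdf"}


def collect_inputs(path):
    """Return the image/PDF files for a single file or a folder, sorted."""
    path = Path(path)
    if path.is_file():
        candidates = [path]
    else:
        candidates = sorted(p for p in path.iterdir() if p.is_file())
    return [p for p in candidates if p.suffix.lower() in INPUT_SUFFIXES]


def process_all(inputs, output_dir, ocr_fn, workers=4):
    """Run ocr_fn on every input concurrently; write <name>.txt per input."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    def handle(src):
        text = ocr_fn(src)  # hypothetical hook: the real model call goes here
        dest = output_dir / (src.name + ".txt")
        dest.write_text(text, encoding="utf-8")
        return dest

    # pool.map preserves input order, so results line up with `inputs`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, inputs))
```

Threads are used here only because they are the simplest way to overlap file I/O with a single GPU-bound worker; a real tool might batch pages on the GPU instead.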

If possible, also include an optional local server mode with a simple API for streaming results, but the main thing is a working command line experience. Use the model from Hugging Face by default, and look up the current docs online if you need to fill in any missing details.
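To make the command line shape concrete, here is a minimal `argparse` sketch. Every flag name is an assumption for illustration, not something taken from the model's documentation; the mode names `mode-a` and `mode-b` are placeholders for whatever the two image modes turn out to be, and `baidu/Unlimited-OCR` is assumed to be the Hugging Face repo id because it appears in the title.

```python
import argparse

# Assumed default repo id, taken from the page title.
DEFAULT_MODEL = "baidu/Unlimited-OCR"


def build_parser():
    """Build a hypothetical CLI: one positional input plus a few options."""
    parser = argparse.ArgumentParser(description="Long-document OCR (sketch)")
    parser.add_argument("input", help="image file, folder of images, or PDF")
    parser.add_argument("-o", "--output", default="output",
                        help="folder where results are saved")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="Hugging Face repo id to load")
    # Placeholder names: the real image modes are not specified here.
    parser.add_argument("--mode", choices=["mode-a", "mode-b"],
                        default="mode-a", help="image mode to use")
    parser.add_argument("--workers", type=int, default=4,
                        help="number of concurrent workers for batches")
    parser.add_argument("--serve", action="store_true",
                        help="start the optional local server instead")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout the common path stays a single argument, for example `python tool.py scan.pdf`, and everything else has a default.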
