Asprise/java-ocr-api — reverse-engineered prompt

Reverse engineered prompt

Build me a Java OCR library that can take common image and document formats such as JPG, PNG, TIFF, and PDF, then read the text from them accurately even when the scan quality is poor.

I want it to support plain text output, XML output with coordinates for the detected words or layout, and the ability to turn images into searchable PDF or PDF/A files. It should try to preserve the original document layout as much as possible.
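To make the "XML output with coordinates" requirement concrete, here is a minimal sketch of how a caller might consume such output. The XML shape is purely an assumption for illustration: the `word` element and its `x`, `y`, `width`, and `height` attributes are hypothetical names, not a format defined by any existing library. The parsing itself uses only the standard JDK XML API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch only: reads word bounding boxes from a HYPOTHETICAL OCR XML format.
final class WordBoxes {
    private WordBoxes() {}

    /** One recognized word and its bounding box in page pixels. */
    static final class WordBox {
        final String text;
        final int x, y, width, height;

        WordBox(String text, int x, int y, int width, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Parses every <word> element (assumed name) into a WordBox. */
    static List<WordBox> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("word");
            List<WordBox> boxes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                boxes.add(new WordBox(
                        e.getTextContent(),
                        Integer.parseInt(e.getAttribute("x")),
                        Integer.parseInt(e.getAttribute("y")),
                        Integer.parseInt(e.getAttribute("width")),
                        Integer.parseInt(e.getAttribute("height"))));
            }
            return boxes;
        } catch (Exception ex) {
            // Wrap checked parser exceptions so callers get one failure type.
            throw new IllegalArgumentException("Could not parse OCR XML", ex);
        }
    }
}
```

With coordinates exposed this way, a data-capture tool could, for example, select all words whose boxes fall inside a known form field region.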

Please include barcode recognition too, including QR codes, UPC, EAN, Code 128, Code 39, and Interleaved 2 of 5. It should also expose table and cell information when it can detect tables, since I want to use it for data capture.
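As one small, self-contained piece of what barcode support involves, here is a sketch of check-digit validation for EAN-13 and UPC-A, which a recognizer could use to reject misread codes. The class and method names (`Ean13`, `checkDigit`, `isValid`, `isValidUpcA`) are hypothetical; the arithmetic is the standard scheme of alternating weights 1 and 3 over the first twelve digits.

```java
// Sketch only: EAN-13 / UPC-A check-digit validation with hypothetical names.
final class Ean13 {
    private Ean13() {}

    /** Computes the check digit for the first 12 digits of an EAN-13 code. */
    static int checkDigit(String first12) {
        if (first12 == null || !first12.matches("\\d{12}")) {
            throw new IllegalArgumentException("expected exactly 12 digits");
        }
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int digit = first12.charAt(i) - '0';
            // Counting from the left, odd positions weigh 1 and even positions weigh 3.
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return (10 - sum % 10) % 10;
    }

    /** Returns true if the 13-digit code ends in the correct check digit. */
    static boolean isValid(String code) {
        if (code == null || !code.matches("\\d{13}")) {
            return false;
        }
        return checkDigit(code.substring(0, 12)) == code.charAt(12) - '0';
    }

    /** A 12-digit UPC-A code is an EAN-13 code with a leading zero. */
    static boolean isValidUpcA(String code) {
        return code != null && code.matches("\\d{12}") && isValid("0" + code);
    }
}
```

For instance, `Ean13.isValid("4006381333931")` returns `true`, while changing the last digit to `2` makes it return `false`.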

Make it usable from Java code with a simple API and include a small example showing how to OCR a file, read a barcode, and create a searchable PDF.
