dswang2011/DocLLM — reverse-engineered prompt

Build me a simple working prototype of DocLLM from this paper repo. I want to upload a document like an invoice, receipt, form, report, or contract, have the app read the text and understand where things are on the page, then let me ask questions about it or extract important fields.

Since this repo only has the paper and overview image, please create a practical version that uses OCR text plus bounding box layout information, then sends that structured document context to a language model. It should support asking questions, pulling key values like totals and dates, basic document classification, and showing the extracted text boxes on top of the page so I can see what it used.
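As a rough illustration of the "OCR text plus bounding box" context, here is a minimal Python sketch that serializes OCR boxes into a plain-text prompt. The names `OcrBox`, `normalize_box`, `build_context`, and `build_prompt`, the 0-1000 coordinate grid, and the line format are all assumptions made for illustration. None of them come from the repo, and this is simple text serialization of layout, not necessarily how the paper's model consumes it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrBox:
    """One OCR result: recognized text plus its pixel bounding box (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_box(box: OcrBox, page_w: float, page_h: float,
                  scale: int = 1000) -> Tuple[int, int, int, int]:
    """Map pixel coordinates onto a 0..scale grid so prompts do not depend on page size."""
    return (
        round(box.x0 * scale / page_w),
        round(box.y0 * scale / page_h),
        round(box.x1 * scale / page_w),
        round(box.y1 * scale / page_h),
    )


def build_context(boxes: List[OcrBox], page_w: float, page_h: float) -> str:
    """Serialize boxes in a naive reading order (top to bottom, then left to right)."""
    ordered = sorted(boxes, key=lambda b: (b.y0, b.x0))
    lines = []
    for i, box in enumerate(ordered):
        x0, y0, x1, y1 = normalize_box(box, page_w, page_h)
        # The index lets the model cite a box, which the UI can then highlight.
        lines.append(f"[{i}] ({x0},{y0},{x1},{y1}) {box.text}")
    return "\n".join(lines)


def build_prompt(boxes: List[OcrBox], page_w: float, page_h: float,
                 question: str) -> str:
    """Combine the layout-aware context with a user question (assumed prompt wording)."""
    context = build_context(boxes, page_w, page_h)
    return (
        "Each line is: [box id] (x0,y0,x1,y1 on a 0-1000 grid) text.\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer using the document and list the box ids you relied on."
    )


if __name__ == "__main__":
    sample = [
        OcrBox("Total: $42.00", 400, 1400, 560, 1440),
        OcrBox("Invoice #123", 50, 40, 250, 70),
    ]
    print(build_prompt(sample, page_w=1000, page_h=2000,
                       question="What is the total?"))
```

The resulting string can be sent to whichever language model the prototype uses. Asking the model to return box ids is one way to drive the overlay that shows which text boxes an answer relied on.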

Please keep it easy to run locally, include a small demo page, clear setup instructions, and a sample document workflow. Don’t try to train a huge model from scratch. Make a clean usable implementation inspired by the DocLLM idea. Look up current docs online if you need to.
