LegalBench RAG: Legal Document Retrieval Benchmark Tool

Reverse engineered prompt

Build me a Python project called LegalBench RAG that lets someone test a legal document retrieval system against contract understanding questions.

I want it to work like a benchmark tool, not a chatbot. It should use a corpus of legal text files plus benchmark JSON files in which each question lists its exact answer snippets as file paths and character ranges. The tool should score retrieval results against those ground-truth snippets and report precision and recall clearly, computed at the exact character level.
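To make the scoring requirement concrete, here is a minimal sketch of character-level precision and recall over snippet ranges. It is an illustration under stated assumptions, not the project's actual implementation: the `Snippet` class, its field names, the half-open range convention, and every function name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    # Hypothetical shape: a file path plus a half-open character range.
    file_path: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans so no character is counted twice."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def spans_by_file(snippets):
    """Group snippets by file path and merge the spans within each file."""
    grouped = {}
    for snippet in snippets:
        grouped.setdefault(snippet.file_path, []).append((snippet.start, snippet.end))
    return {path: merge_spans(spans) for path, spans in grouped.items()}


def overlap_length(spans_a, spans_b):
    """Count characters shared by two lists of already-merged spans."""
    total = 0
    for a_start, a_end in spans_a:
        for b_start, b_end in spans_b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


def char_precision_recall(retrieved, ground_truth):
    """Return (precision, recall) measured in characters, not in whole snippets."""
    retrieved_spans = spans_by_file(retrieved)
    truth_spans = spans_by_file(ground_truth)

    retrieved_chars = sum(e - s for spans in retrieved_spans.values() for s, e in spans)
    truth_chars = sum(e - s for spans in truth_spans.values() for s, e in spans)

    # Only files present on both sides can contribute shared characters.
    shared_files = retrieved_spans.keys() & truth_spans.keys()
    shared_chars = sum(
        overlap_length(retrieved_spans[path], truth_spans[path]) for path in shared_files
    )

    precision = shared_chars / retrieved_chars if retrieved_chars else 0.0
    recall = shared_chars / truth_chars if truth_chars else 0.0
    return precision, recall


# Made-up example: one ground-truth snippet, two retrieved chunks.
truth = [Snippet("contracts/nda.txt", 100, 200)]
retrieved = [
    Snippet("contracts/nda.txt", 150, 250),   # half of it overlaps the truth
    Snippet("contracts/other.txt", 0, 100),   # wrong file, no overlap
]
precision, recall = char_precision_recall(retrieved, truth)
print(precision, recall)  # 0.25 0.5
```

Merging spans before counting means overlapping retrieved chunks are not rewarded twice for the same characters, which keeps both metrics between 0 and 1.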

Please include an easy setup flow with a virtual environment, dependency install, and a credentials file copied from an example for any needed API keys. It should support using a downloaded dataset, and also include a script to regenerate the benchmark from source datasets like ContractNLI, CUAD, MAUD, and PrivacyQA, with a note that generated data may differ because LLMs are involved.

Make the main benchmark runnable from the command line, keep the README practical, and look up current docs online if you need to.

zeroentropy-ai/legalbenchrag — reverse-engineered prompt
