ahmeabd/italia-corpus — reverse-engineered prompt
Build me a public GitHub project that creates and publishes a clean corpus of Italian legislation from Normattiva.
I want a Python-based pipeline that can download the public collections, convert every legal act into a separate Markdown file, keep the text in UTF-8, and organize the output into clear folders by collection. It should be useful for people doing legal search, RAG, model training, research, or citation analysis.
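As a rough illustration of the "one Markdown file per act, one folder per collection" layout, a minimal Python sketch might look like the following. The helper names (`slugify`, `act_path`, `write_act`) and the file naming scheme are assumptions made for this example only; they are not taken from the actual repository or from Normattiva.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Turn an arbitrary label into a lowercase, filesystem-safe slug (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def act_path(root: Path, collection: str, act_id: str) -> Path:
    """Assumed layout: <root>/<collection-slug>/<act-slug>.md"""
    return root / slugify(collection) / f"{slugify(act_id)}.md"


def write_act(root: Path, collection: str, act_id: str, title: str, body: str) -> Path:
    """Write a single act as a UTF-8 Markdown file with LF line endings."""
    path = act_path(root, collection, act_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" keeps line endings identical across platforms.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(f"# {title}\n\n{body.strip()}\n")
    return path
```

Keeping the path a pure function of the collection and act identifier means the same act always lands in the same file, which is what lets git show changes to a law as ordinary diffs.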
Please make the updater safe to run every day. If nothing changed, it should not create noise. If the law text changed, it should commit only the real differences so people can track legal updates with normal git history.
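One way to get a quiet daily update is to normalize each act's text and rewrite the file only when the resulting bytes differ from what is already on disk. The sketch below assumes that approach; `normalize` and `write_if_changed` are hypothetical names, and the normalization rules (line endings, trailing whitespace) are only one plausible choice.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Unify line endings and strip trailing whitespace so cosmetic changes do not show up as diffs."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in unified.split("\n")]
    return "\n".join(lines).strip("\n") + "\n"


def write_if_changed(path: Path, text: str) -> bool:
    """Write the normalized text only if it differs from the file on disk.

    Returns True when the file was created or modified, False otherwise.
    """
    new_bytes = normalize(text).encode("utf-8")
    if path.exists() and path.read_bytes() == new_bytes:
        return False  # Nothing changed: leave the file and its mtime alone.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(new_bytes)
    return True
```

A daily job could collect the return values and create a commit only when at least one call returned True, so an unchanged corpus produces no commit at all.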
Include a clear README in Italian and English that explains what the dataset is, where the data comes from, how to clone it, how to index the Markdown files, how updates work, and the legal notice that this is public data but not official legal advice. Look up current Normattiva docs online if needed.