brendanhogan/loophole — reverse-engineered prompt

Build me a Python command line app called Loophole. I want it to help people test rules, values, legal text, and chatbot prompts by having AI agents try to break them.

The main flow should let me write moral principles in normal language, turn them into a more formal rule set, then have one agent find things that are allowed but feel wrong, and another find things that are banned but should be okay. A judge agent should try to patch the rules, keep past decisions as precedent, and ask the user when there is a real conflict.
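To make the main flow concrete, here is a minimal sketch of one adversarial round. Everything in it is an assumption for illustration: the names `Case`, `Verdict`, `Session`, and `run_round` are hypothetical, and the agents are passed in as plain callables so that no model call is needed to run it. In the real app each callable would wrap an LLM request.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Case:
    scenario: str
    kind: str  # "loophole" (allowed but feels wrong) or "overreach" (banned but should be okay)


@dataclass
class Verdict:
    patched_rules: Optional[str]  # None means the judge found a real conflict
    reasoning: str


@dataclass
class Session:
    rules: str
    precedents: List[Tuple[Case, str]] = field(default_factory=list)
    escalated: List[Case] = field(default_factory=list)


# Hypothetical agent signatures; each would wrap a model call in practice.
Finder = Callable[[str], List[Case]]
Judge = Callable[[str, Case, List[Tuple[Case, str]]], Verdict]


def run_round(session: Session, loophole_finder: Finder,
              overreach_finder: Finder, judge: Judge) -> Session:
    """Run one round: both finders attack the rules, then the judge rules on each case."""
    cases = loophole_finder(session.rules) + overreach_finder(session.rules)
    for case in cases:
        verdict = judge(session.rules, case, session.precedents)
        if verdict.patched_rules is None:
            # Real conflict: set it aside for the user to decide.
            session.escalated.append(case)
            continue
        # Accept the patch and record the decision as precedent.
        session.rules = verdict.patched_rules
        session.precedents.append((case, verdict.reasoning))
    return session
```

Passing the precedent list to the judge on every call is one simple way to keep past decisions in view. Cases in `escalated` are the ones the app would put to the user.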

Also add a chatbot prompt testing mode where it generates or improves a system prompt, runs jailbreak and false refusal attacks, and shows the actual conversations. Add a reverse mode where I paste a legal document or policy and it extracts the hidden principles, then finds contradictions and gaps.
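The chatbot testing mode could be sketched roughly as follows. This is an illustrative assumption, not a specification: `Attack`, `Transcript`, `run_attacks`, and the keyword-based `looks_like_refusal` check are all hypothetical, and `chat` stands in for whatever function sends a system prompt and user message to the model. A real version would probably use a model-based grader instead of keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attack:
    kind: str          # "jailbreak" or "false_refusal"
    user_message: str


@dataclass
class Transcript:
    attack: Attack
    reply: str
    failed: bool       # True when the system prompt did not hold up


def looks_like_refusal(reply: str) -> bool:
    """Crude keyword heuristic (an assumption made for this sketch)."""
    markers = ("i can't", "i cannot", "i won't")
    return any(marker in reply.lower() for marker in markers)


def run_attacks(system_prompt: str, attacks: List[Attack],
                chat: Callable[[str, str], str]) -> List[Transcript]:
    """Run every attack and keep the actual conversation for the report."""
    results = []
    for attack in attacks:
        reply = chat(system_prompt, attack.user_message)
        refused = looks_like_refusal(reply)
        # A jailbreak succeeds when the bot complies;
        # a false-refusal probe fails the prompt when the bot refuses.
        failed = (not refused) if attack.kind == "jailbreak" else refused
        results.append(Transcript(attack, reply, failed))
    return results
```

Keeping the raw `reply` on each `Transcript` is what lets the app show the actual conversations and not only a pass or fail count.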

Include saved sessions, with commands to resume and list them, plus an HTML report. Use Anthropic by default, with config for other models if practical.
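Saved sessions, resume, and list could be backed by something as simple as one JSON file per session. The sketch below assumes that layout; the function names and the idea of a single sessions directory are hypothetical choices, and only the standard library is used.

```python
import json
from pathlib import Path
from typing import List


def save_session(directory: Path, session_id: str, state: dict) -> Path:
    """Write the session state to <directory>/<session_id>.json."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session_id}.json"
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def load_session(directory: Path, session_id: str) -> dict:
    """Read a saved session back so a run can be resumed."""
    path = directory / f"{session_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))


def list_sessions(directory: Path) -> List[str]:
    """Return saved session ids in sorted order (empty if nothing is saved yet)."""
    if not directory.exists():
        return []
    return sorted(p.stem for p in directory.glob("*.json"))
```

Plain JSON keeps sessions readable and easy to feed into an HTML report, at the cost of requiring the state to be made of JSON-serializable values.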
