LLM Jailbreak and Safety Research Repository Template

Reverse engineered prompt

Build me a GitHub repo that works like an awesome list for LLM jailbreak and safety research. I want it to be a clean, well-organized README that collects papers, code links, datasets, evaluations, and analysis links in one place for researchers.

Organize it into clear sections like jailbreak attacks, attacks on reasoning models, black-box attacks, white-box attacks, multi-turn attacks, RAG attacks, multimodal attacks, jailbreak defenses, guard models, moderation APIs, evaluation, analysis, and applications. Each paper should sit in a simple table with date, title, venue, paper link, and code link if there is one.
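To make the per-section table layout concrete, here is a minimal Python sketch of how one section could be rendered as a Markdown table. It is an illustration under stated assumptions, not the repository's actual tooling: the `Paper` class, the `render_row` and `render_section` helpers, the "YYYY.MM" date format, the newest-first ordering, the `##` heading level, and the `example.org` URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    """One table row. Fields mirror the columns requested in the prompt; the class is hypothetical."""
    date: str                       # assumed "YYYY.MM" so plain string sorting is chronological
    title: str
    venue: str
    paper_url: str
    code_url: Optional[str] = None  # "code link if there is one"


HEADER = "| Date | Title | Venue | Paper | Code |"
DIVIDER = "| --- | --- | --- | --- | --- |"


def render_row(p: Paper) -> str:
    """Render a single paper as a Markdown table row, with '-' when no code link exists."""
    code = f"[link]({p.code_url})" if p.code_url else "-"
    return f"| {p.date} | {p.title} | {p.venue} | [link]({p.paper_url}) | {code} |"


def render_section(name: str, papers: List[Paper]) -> str:
    """Render a section heading followed by its table, newest papers first (an assumed ordering)."""
    ordered = sorted(papers, key=lambda p: p.date, reverse=True)
    rows = [render_row(p) for p in ordered]
    return "\n".join([f"## {name}", "", HEADER, DIVIDER, *rows])


if __name__ == "__main__":
    # Placeholder entries only; these are not real papers.
    demo = [
        Paper("2024.05", "Example Paper A", "arXiv", "https://example.org/a", "https://example.org/a-code"),
        Paper("2025.01", "Example Paper B", "arXiv", "https://example.org/b"),
    ]
    print(render_section("Jailbreak Attacks", demo))
```

Keeping each entry as structured data and generating the table from it is one way to keep the README easy to update: adding a paper means appending one record rather than hand-editing table rows. Titles containing a `|` character would need escaping, which this sketch does not handle.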

Please include a short intro, bookmarks at the top, a citation section for the related GuardReasoner and FlipAttack papers, a friendly note asking people to star the repo, and a contribution note inviting PRs and issues. Keep the tone academic but readable, and make the README easy to update over time.

yueliu1999/Awesome-Jailbreak-on-LLMs — reverse-engineered prompt
