Automated Prompt Improvement Library for LLMs

Reverse engineered prompt

Build me a Python library that can automatically improve prompts and other text-based settings. I want to give it a starting prompt, code snippet, config, or agent description, plus a way to score the result, and have it try better versions over several rounds.

The key idea should be that an LLM reads the full results from each run, including mistakes, logs, error messages, and reasoning traces, then explains what went wrong and creates a better candidate. Keep track of multiple good candidates instead of only one winner, so different versions can be strong on different examples.
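To make the loop above concrete, here is a minimal sketch, not a definitive implementation. Every name in it (`optimize_sketch`, `keep_per_example_winners`, `evaluate`, `reflect`) is hypothetical and chosen only for illustration. The `reflect` callable stands in for the LLM that reads the traces and proposes a revision; the toy version below just patches the first reported problem so the example runs without a model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not the API of any real library.
Evaluate = Callable[[str, str], Tuple[float, str]]  # (candidate, example) -> (score, trace)
Reflect = Callable[[str, List[str]], str]           # (candidate, traces) -> revised candidate


def keep_per_example_winners(pool: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep every candidate that ties for the best score on at least one example."""
    n = len(next(iter(pool.values())))
    best = [max(scores[i] for scores in pool.values()) for i in range(n)]
    return {
        cand: scores
        for cand, scores in pool.items()
        if any(scores[i] == best[i] for i in range(n))
    }


def optimize_sketch(
    seed: str,
    examples: List[str],
    evaluate: Evaluate,
    reflect: Reflect,
    rounds: int = 3,
) -> Dict[str, List[float]]:
    """Run a few reflect-and-revise rounds and return the surviving candidates."""

    def run(candidate: str) -> Tuple[List[float], List[str]]:
        results = [evaluate(candidate, ex) for ex in examples]
        return [s for s, _ in results], [t for _, t in results]

    seed_scores, _ = run(seed)
    pool: Dict[str, List[float]] = {seed: seed_scores}
    for r in range(rounds):
        # Rotate through the surviving candidates so each gets a turn as parent.
        parents = list(pool)
        parent = parents[r % len(parents)]
        _, traces = run(parent)
        child = reflect(parent, traces)  # stand-in for the LLM reflection step
        if child not in pool:
            child_scores, _ = run(child)
            pool[child] = child_scores
            pool = keep_per_example_winners(pool)
    return pool


# Toy stand-ins so the sketch runs without a model.
def toy_evaluate(candidate: str, example: str) -> Tuple[float, str]:
    hit = example in candidate
    return (1.0 if hit else 0.0), ("ok" if hit else f"missing: {example}")


def toy_reflect(candidate: str, traces: List[str]) -> str:
    for trace in traces:
        if trace.startswith("missing: "):
            return candidate + " " + trace[len("missing: "):]
    return candidate


pool = optimize_sketch("Answer.", ["polite", "brief"], toy_evaluate, toy_reflect)
```

The pool is keyed by candidate text and stores one score per example, so a candidate survives as long as nothing beats it on at least one example. A real version would replace `toy_reflect` with an LLM call that receives the full traces.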

Please include a simple optimize function for normal prompt optimization, and a more general "optimize anything" style function for arbitrary text artifacts. Make it easy to plug into other systems with adapters, including examples for basic prompts and AI pipelines like DSPy. Add clear docs, runnable examples, and tests so someone can install it and quickly see an optimized prompt improve on a small benchmark.
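One possible shape for the adapter idea, sketched under assumptions: the `Adapter` protocol, `KeywordPromptAdapter`, and `mean_score` are hypothetical names for illustration and are not taken from any existing library. The point is only that the optimizer talks to the host system through one small interface that returns a score and a trace.

```python
from typing import List, Protocol, Tuple


class Adapter(Protocol):
    """Hypothetical interface: how a host system runs and scores one candidate."""

    def evaluate(self, candidate: str, example: str) -> Tuple[float, str]:
        """Return (score, trace) for one candidate on one example."""
        ...


class KeywordPromptAdapter:
    """Toy adapter: a prompt scores 1.0 when it mentions the example's keyword."""

    def evaluate(self, candidate: str, example: str) -> Tuple[float, str]:
        hit = example.lower() in candidate.lower()
        return (1.0 if hit else 0.0), ("ok" if hit else f"missing: {example}")


def mean_score(adapter: Adapter, candidate: str, examples: List[str]) -> float:
    """Average the per-example scores an adapter reports for one candidate."""
    return sum(adapter.evaluate(candidate, ex)[0] for ex in examples) / len(examples)


score = mean_score(KeywordPromptAdapter(), "Be polite", ["polite", "brief"])
```

An adapter for a pipeline framework would implement the same `evaluate` method by running the pipeline with the candidate text and collecting its logs as the trace.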


gepa-ai/gepa — reverse-engineered prompt
