china-qijizhifeng/agentic-harness-engineering — reverse-engineered prompt
Reverse-engineered prompt
Build me a Python project that automatically improves the harness around a coding agent over multiple rounds, without changing the underlying model itself. I want it to run an evaluate-analyze-improve loop and evolve things like system prompts, tool descriptions, tool implementations, middleware, skills, sub-agents, and long-term memory based on what actually happened in prior runs.
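As a rough sketch of the loop I have in mind, assuming the three stages are passed in as plain callables: every name here (`Harness`, `RoundResult`, `evolve`) is hypothetical and only illustrates the shape, not a required design.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Hypothetical container for the evolvable parts around a fixed model."""
    system_prompt: str
    tool_descriptions: dict[str, str] = field(default_factory=dict)
    version: int = 0


@dataclass
class RoundResult:
    """What one round recorded: which harness version ran, its score, its report."""
    version: int
    score: float
    report: str


def evolve(
    harness: Harness,
    evaluate: Callable[[Harness], tuple[list[str], float]],
    analyze: Callable[[list[str]], str],
    improve: Callable[[Harness, str], Harness],
    rounds: int,
) -> tuple[Harness, list[RoundResult]]:
    """Run evaluate -> analyze -> improve for a fixed number of rounds."""
    history: list[RoundResult] = []
    for _ in range(rounds):
        traces, score = evaluate(harness)    # roll out tasks with the current harness
        report = analyze(traces)             # condense raw traces into a debug report
        history.append(RoundResult(harness.version, score, report))
        harness = improve(harness, report)   # propose the next harness from the report
    return harness, history
```

The model never appears as something the loop mutates; only the `Harness` value changes between rounds, and `history` keeps one auditable record per round.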
It should keep good observability, so that every harness change is easy to track and audit, raw execution traces can be condensed into shorter debug reports, and the system can use those reports to propose evidence-based edits for the next iteration. Please make it runnable from experiment configs, support background launches, and use sandboxed task rollouts with prebuilt templates for a dataset of coding tasks.
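To illustrate the trace-to-report step, here is a minimal sketch. It assumes traces are JSON-lines events with `type`, `tool`, and `message` fields; that schema and the `summarize_trace` name are my own placeholders, so adapt them to whatever the agent actually logs.

```python
import json
from collections import Counter


def summarize_trace(trace_lines, max_errors=3):
    """Condense JSON-lines trace events into a short debug report.

    The event schema (type/tool/message) is an assumption for illustration.
    """
    # Skip blank lines, parse each remaining line as one JSON event.
    events = [json.loads(line) for line in trace_lines if line.strip()]
    # Count how often each tool was called.
    tool_counts = Counter(e["tool"] for e in events if e.get("type") == "tool_call")
    # Collect error messages in the order they occurred.
    errors = [e["message"] for e in events if e.get("type") == "error"]
    return {
        "num_events": len(events),
        "tool_calls": dict(tool_counts),
        "num_errors": len(errors),
        "first_errors": errors[:max_errors],
    }
```

A report like this is small enough to hand to the analysis step as evidence, while the raw trace stays on disk for auditing.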
Set up the repo so I can install it, copy an env file, add my LLM, sandbox, and search API keys, then launch a single experiment or a batch. Optional observability hooks are nice too. Look up current docs online if you need to.
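For the env-file step, something like the following pre-launch check would be helpful. The key names (`LLM_API_KEY`, `SANDBOX_API_KEY`, `SEARCH_API_KEY`) are placeholders I made up, so use whatever names the chosen providers actually expect.

```python
import os

# Hypothetical key names; replace with the ones the real providers require.
REQUIRED_KEYS = ("LLM_API_KEY", "SANDBOX_API_KEY", "SEARCH_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the environment."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]
```

Calling `missing_keys()` before launching an experiment or a batch lets the run fail fast with a clear message instead of partway through a sandboxed rollout.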