Python Tool for Optimizing Natural Language Skill Documents

Reverse-engineered prompt

Build me a Python tool called SkillOpt that can train and improve reusable natural language “skill” documents for frozen LLM agents without changing the model weights.

I want to give it a benchmark config, a data split folder, API credentials, an optimizer model, and a target model. It should run training over epochs, collect scored attempts, propose small edits to one skill document, keep only the edits that improve validation results, and save the best version as best_skill.md. It should also support resuming runs and keep a training history, step artifacts, skill snapshots, and a runtime state folder.
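To make the "only keep edits that improve validation results" rule concrete, here is a minimal sketch of the accept-or-reject loop. It is not SkillOpt's actual implementation: `TrainState`, `train`, `save_best`, and the two callbacks are hypothetical names. `propose_edit` stands in for the optimizer model reading scored attempts and returning a lightly edited skill, and `validate` stands in for running the target model on the validation split.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class TrainState:
    """Hypothetical container for the best skill so far and the step history."""
    best_skill: str
    best_score: float
    history: List[Dict] = field(default_factory=list)


def train(initial_skill: str,
          propose_edit: Callable[[str], str],
          validate: Callable[[str], float],
          epochs: int,
          steps_per_epoch: int) -> TrainState:
    # Score the starting skill so every later edit has a baseline to beat.
    state = TrainState(initial_skill, validate(initial_skill))
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            candidate = propose_edit(state.best_skill)
            score = validate(candidate)
            # Keep the edit only if validation strictly improves.
            accepted = score > state.best_score
            if accepted:
                state.best_skill, state.best_score = candidate, score
            state.history.append(
                {"epoch": epoch, "step": step, "score": score, "accepted": accepted}
            )
    return state


def save_best(state: TrainState, out_dir: str) -> Path:
    # Write the best skill seen so far as best_skill.md in the output folder.
    path = Path(out_dir) / "best_skill.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(state.best_skill, encoding="utf-8")
    return path
```

Because rejected edits never replace `best_skill`, the validation score of the saved document cannot decrease from one step to the next. The `history` list is one possible basis for the training history and resume support described above.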

Include an eval-only mode where I can point to an existing skill markdown file and test it on the train, validation, or test split, or on all three. Support Azure OpenAI and OpenAI-compatible endpoints, plus Claude-style API credentials if needed.
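One way the eval-only mode could look on the command line is sketched below with the standard library's `argparse`. The program name `skillopt`, the `train` and `eval` subcommands, and the `--config`, `--skill`, `--split`, and `--resume` flags are all assumptions made for illustration, not the tool's documented interface.

```python
import argparse
from typing import List

# Split names taken from the request above; "all" expands to every split.
SPLITS = ["train", "validation", "test"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skillopt")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)

    # Training: needs a benchmark config and can optionally resume a run.
    train = sub.add_parser("train", help="optimize a skill document")
    train.add_argument("--config", required=True, help="path to a benchmark config")
    train.add_argument("--resume", action="store_true", help="continue a previous run")

    # Eval only: score an existing skill file without proposing any edits.
    ev = sub.add_parser("eval", help="evaluate an existing skill document")
    ev.add_argument("--config", required=True, help="path to a benchmark config")
    ev.add_argument("--skill", required=True, help="path to a skill markdown file")
    ev.add_argument("--split", choices=SPLITS + ["all"], default="test")
    return parser


def resolve_splits(split: str) -> List[str]:
    # Turn the --split value into the concrete list of splits to evaluate.
    return list(SPLITS) if split == "all" else [split]
```

Under these assumptions, an invocation such as `skillopt eval --config c.json --skill best_skill.md --split all` would evaluate the skill on all three splits in turn.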

Please make the command line experience clear, include example configs for benchmarks like SearchQA, LiveMathematicianBench, and ALFWorld, and add simple docs so a researcher can install it, train a skill, evaluate a skill, and understand the output files.
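As a rough idea of what an example benchmark config might contain, the sketch below loads a small JSON document into a typed object. Every field name here (`name`, `data_dir`, `optimizer_model`, `target_model`, `epochs`) and every value is a hypothetical placeholder chosen to mirror the inputs listed above. The real config format may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical config schema; field names are illustrative only."""
    name: str
    data_dir: str
    optimizer_model: str
    target_model: str
    epochs: int = 3  # assumed default, not taken from the source


def load_config(text: str) -> BenchmarkConfig:
    # Unknown or missing required keys raise TypeError, surfacing typos early.
    return BenchmarkConfig(**json.loads(text))


# Placeholder values; model names and paths are not real identifiers.
EXAMPLE_CONFIG = """
{
  "name": "searchqa",
  "data_dir": "data/searchqa",
  "optimizer_model": "optimizer-model-name",
  "target_model": "target-model-name"
}
"""
```

Keeping credentials out of this file and reading them from the environment instead is a common choice, so that example configs can be shared safely.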

microsoft/SkillOpt — reverse-engineered prompt