Open Source Toolkit for Verified Training Data Generation

Reverse-engineered prompt

Build me an open-source toolkit called CUA Gym for generating verified training data for computer-use agents. I want a Python pipeline that takes a topic, such as LibreOffice Calc formatting tasks, and produces task instructions, initial environment states, golden (target) states, and a reward function that automatically checks whether the task was solved.
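A minimal sketch of what one task bundle might contain, assuming environment states are JSON-like snapshots (plain dicts) and the reward is a Python callable that maps a state to 1.0 or 0.0. The names `TaskBundle` and `bold_a1_reward`, and the Calc state layout, are hypothetical illustrations rather than CUA Gym's actual format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical: a state is a JSON-like snapshot of the environment.
State = Dict[str, Any]


@dataclass
class TaskBundle:
    """One generated task: instruction, start state, golden state, and checker."""
    topic: str
    instruction: str
    initial_state: State
    golden_state: State
    reward_fn: Callable[[State], float]


def bold_a1_reward(state: State) -> float:
    """Hypothetical checker: 1.0 if cell A1 is bold, otherwise 0.0."""
    cell = state.get("cells", {}).get("A1", {})
    return 1.0 if cell.get("bold") is True else 0.0


bundle = TaskBundle(
    topic="LibreOffice Calc formatting",
    instruction="Make the header cell A1 bold.",
    initial_state={"cells": {"A1": {"value": "Total", "bold": False}}},
    golden_state={"cells": {"A1": {"value": "Total", "bold": True}}},
    reward_fn=bold_a1_reward,
)

print(bundle.reward_fn(bundle.golden_state))   # 1.0
print(bundle.reward_fn(bundle.initial_state))  # 0.0
```

Keeping the golden state in the bundle, not just the reward, is what makes the automatic check in the orchestrator possible: the reward can be tested against a known-solved state before any agent runs.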

Please include an orchestrator that runs the task generator and the reward writer, checks that the reward function returns 1 for the golden state and 0 for the initial state, and then saves the verified task bundle. Add an LLM-based majority-vote filter so that weak or ambiguous tasks are rejected.
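The orchestration flow described above can be sketched as follows, under several assumptions: the generator returns a dict with hypothetical `task_id`, `instruction`, `initial_state`, and `golden_state` keys; the reward writer returns Python source that defines `reward(state)`; and `judge` is a plain callable standing in for one LLM accept/reject vote. All function names here (`compile_reward`, `reward_is_valid`, `majority_accepts`, `run_pipeline`) are illustrative, not CUA Gym's API.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
Task = Dict[str, Any]


def compile_reward(source: str) -> Callable[[State], float]:
    """Turn reward-writer output (Python source defining `reward`) into a callable.

    exec() runs untrusted model output; a real pipeline should sandbox this step.
    """
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace["reward"]


def reward_is_valid(reward: Callable[[State], float], initial: State, golden: State) -> bool:
    """The golden state must score exactly 1 and the initial state exactly 0."""
    try:
        return reward(golden) == 1 and reward(initial) == 0
    except Exception:
        return False  # a crashing reward function is treated as invalid


def majority_accepts(judge: Callable[[Task], bool], task: Task, votes: int = 5) -> bool:
    """Query the judge `votes` times; keep the task only on a strict majority of accepts."""
    accepts = sum(1 for _ in range(votes) if judge(task))
    return accepts > votes // 2


def run_pipeline(topic: str, generate: Callable[[str], Task], write_reward: Callable[[Task], str],
                 judge: Callable[[Task], bool], out_dir: Path, votes: int = 5) -> Optional[Path]:
    """Generate, write reward, verify, vote, save. Returns the saved path or None."""
    task = generate(topic)
    source = write_reward(task)
    try:
        reward = compile_reward(source)
    except Exception:
        return None  # source failed to run or did not define `reward`
    if not reward_is_valid(reward, task["initial_state"], task["golden_state"]):
        return None
    if not majority_accepts(judge, task, votes):
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{task['task_id']}.json"
    # Store the reward as source code so the bundle stays JSON-serializable.
    path.write_text(json.dumps({**task, "topic": topic, "reward_source": source}, indent=2))
    return path


# Usage with stub callables standing in for the LLM-backed components.
def fake_generate(topic: str) -> Task:
    return {
        "task_id": "calc_bold_a1",
        "instruction": "Make cell A1 bold.",
        "initial_state": {"A1": {"bold": False}},
        "golden_state": {"A1": {"bold": True}},
    }


def fake_write_reward(task: Task) -> str:
    return "def reward(state):\n    return 1.0 if state['A1']['bold'] else 0.0\n"


out_dir = Path(tempfile.mkdtemp())
saved = run_pipeline("LibreOffice Calc formatting", fake_generate, fake_write_reward,
                     judge=lambda task: True, out_dir=out_dir)
print(saved)
```

The two gates are ordered cheapest first: the golden/initial reward check needs no model calls, so tasks with broken rewards are discarded before spending several judge votes on them. A strict majority (for example 3 of 5) means ties count as rejections.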

Also include a hub of local mock web apps that feel like real products, each exposing a simple session-based state API with reset, inspect, update, upload, and diff operations. Make it easy to run any one of them locally and use it in generated tasks.
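A minimal in-memory sketch of the session-based state API, assuming each app's state is a JSON-like dict and that `diff` reports top-level changes since the last `reset`. The HTTP layer is omitted; in a real app each method would presumably sit behind its own endpoint. The class name `MockAppSession`, the `files` key, and the diff format are hypothetical choices.

```python
import copy
import uuid
from typing import Any, Dict, Optional, Tuple

State = Dict[str, Any]


class MockAppSession:
    """Per-session, in-memory state store for one mock web app (hypothetical design)."""

    def __init__(self, seed_state: State):
        self._seed = copy.deepcopy(seed_state)
        self._live: Dict[str, State] = {}      # current state per session
        self._baseline: Dict[str, State] = {}  # state at the last reset, used by diff()

    def reset(self, session_id: Optional[str] = None, state: Optional[State] = None) -> str:
        """Start or restart a session from the seed state, or from a task's initial state."""
        sid = session_id or uuid.uuid4().hex
        start = copy.deepcopy(self._seed if state is None else state)
        self._live[sid] = start
        self._baseline[sid] = copy.deepcopy(start)
        return sid

    def inspect(self, sid: str) -> State:
        """Return a copy of the current state, e.g. for a reward function to check."""
        return copy.deepcopy(self._live[sid])

    def update(self, sid: str, changes: State) -> None:
        """Overwrite top-level keys in the session state."""
        self._live[sid].update(copy.deepcopy(changes))

    def upload(self, sid: str, filename: str, content: bytes) -> None:
        """Attach a file to the session under a top-level 'files' key."""
        self._live[sid].setdefault("files", {})[filename] = content

    def diff(self, sid: str) -> Dict[str, Tuple[Any, Any]]:
        """Map each top-level key that changed since reset to (before, after)."""
        before, after = self._baseline[sid], self._live[sid]
        return {
            key: (before.get(key), after.get(key))
            for key in sorted(set(before) | set(after))
            if before.get(key) != after.get(key)
        }


app = MockAppSession({"cart": [], "user": "demo"})
sid = app.reset()
app.update(sid, {"cart": ["book"]})
app.upload(sid, "receipt.txt", b"paid")
print(app.diff(sid))
# {'cart': ([], ['book']), 'files': (None, {'receipt.txt': b'paid'})}
```

Because `reset` accepts an explicit starting state, a generated task's initial state can be loaded directly, and `inspect` gives the reward function the same snapshot format the golden state uses.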

Add clear setup docs, example commands, a .env template for API keys, instructions for downloading the dataset from Hugging Face, and a few scripts that run the whole flow end to end.

xlang-ai/CUA-Gym — reverse-engineered prompt