Qwen AgentWorld Local Inference and Evaluation Setup

Reverse-engineered prompt

Build me a usable version of this Qwen AgentWorld repo in Python. I want something I can run locally that downloads the released Qwen AgentWorld 35B model from Hugging Face (or ModelScope, if needed), starts a local inference server, and gives me a simple way to test prompts against it.

I also want the evaluation side wired up so I can run AgentWorldBench across all seven domains and save readable results and logs. Please make the setup feel smooth: a clear README, one command to start the model, one command to run evals, sensible defaults, and environment checks that tell me what is missing. If a simple browser demo or terminal chat is easy, add that too, but keep the focus on serving the model and benchmarking it. Feel free to look up current docs online for the serving options if you need to.

QwenLM/Qwen-AgentWorld — reverse-engineered prompt