Python Toolkit for Standardizing Agent Training Datasets

Reverse engineered prompt

Build me a Python toolkit for turning messy agent training datasets into one clean standard format, then exporting them into formats that different AI agents can use for supervised fine tuning.

I want it to handle a simple three step flow, extract raw dataset records, convert them into a shared trajectory format with actions and observations, then convert that into agent specific training JSONL for OpenHands, SWE agent, and AgentLab. Include a clear schema layer with validation so bad data gets caught early, plus sample datasets and sample outputs so I can see what the expected files look like.

Please include command line scripts for running a full conversion or a small sample conversion, tests that check the standardized schema and final exports, and short docs that explain how to add a new dataset or a new agent format. Keep it practical and research friendly. If you need exact current conventions for these agents, look up the docs online.

Want more depth? Deep Reverse

gunwanth/agent-data-protocol — reverse-engineered prompt

Reverse engineered prompt