agenta-ai/agenta — reverse-engineered prompt

Build me an open-source LLMOps platform that I can run locally and use through a browser.

I want one place where a team can test prompts in a playground, compare prompt versions side by side on saved test cases, and manage prompts and settings across environments. It should support trying different language models and also let people bring their own model setup if needed. I also want an evaluation area where we can create test sets from past runs or uploaded CSV files, run automated judges or custom evaluators, and collect human feedback and annotations.

Please also include production monitoring so we can see traces, latency, usage, and cost for LLM calls, and make it easy to debug workflows. A clean web UI is important, but there should also be API access for engineers.

Make it feel like a real self-hosted product, with sensible defaults and a straightforward local setup, ideally something I can start and then open at localhost. If anything is unclear, check the current docs online and fill in the gaps with practical choices.