Helicone/ai-gateway — reverse-engineered prompt

Reverse engineered prompt

Build me a fast, lightweight AI gateway that sits between my app and multiple LLM providers. I want one simple endpoint that works with the usual OpenAI style client code, so I can send the same kind of chat completion request and choose models like OpenAI, Anthropic, Google, Bedrock, and other providers without rewriting my app each time.

It should be production friendly and feel easy to self host. Please include smart routing so requests can go to the fastest, cheapest, or most reliable provider, plus fallbacks if one provider has issues. I also want response caching to cut cost and latency, rate limits so usage does not run away, and built in observability so I can inspect logs, metrics, traces, and request performance.

Make it easy to run locally and in Docker, with environment based provider keys and a simple config flow for custom routing rules. Include a small demo or example app showing how a normal OpenAI client would point to the gateway instead of a single provider. If you need details, look up the current docs online.

Want more depth? Deep Reverse