CoreLLM Server Architecture for Scalable AI Chatbot

Reverse-engineered prompt

Build me a production-ready AI chatbot backend called CoreLLM Server.

I want it to expose a simple streaming chat endpoint where a client sends a prompt, a user ID, and a session ID, then gets the answer back token by token in real time. It should remember short session history so follow-up questions work.
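
As a minimal sketch of what "short session history" plus token-by-token streaming could look like, the snippet below keeps a bounded list of turns per (user id, session id) pair and formats tokens as server-sent events. Everything named here is an assumption made for illustration: the `SessionMemory` class, the `MAX_TURNS` cap, the `{"token": ...}` payload shape, and the `[DONE]` end marker are not taken from the CoreLLM code, and a real deployment would likely keep the history in Redis or MongoDB rather than in process memory.

```python
import json
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

MAX_TURNS = 6  # assumed cap on remembered turns per session (hypothetical value)

Turn = Tuple[str, str]  # (role, text)


class SessionMemory:
    """In-memory short-term history keyed by (user_id, session_id)."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        self._max_turns = max_turns
        self._turns: Dict[Tuple[str, str], Deque[Turn]] = {}

    def append(self, user_id: str, session_id: str, role: str, text: str) -> None:
        key = (user_id, session_id)
        if key not in self._turns:
            # deque(maxlen=...) silently drops the oldest turn when full.
            self._turns[key] = deque(maxlen=self._max_turns)
        self._turns[key].append((role, text))

    def history(self, user_id: str, session_id: str) -> List[Turn]:
        return list(self._turns.get((user_id, session_id), ()))


def sse_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a server-sent-events frame, then emit an end marker."""
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"  # assumed sentinel so the client knows when to stop
```

A handler would call `history()` to build the model input, stream `sse_stream(...)` back to the client, and `append()` both the prompt and the finished answer so the next follow-up question sees them.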

Please make the backend scalable, not just a toy demo. The API should stay responsive while workers handle the model calls in the background. Use a message-queue-style flow, persistent logging for requests and sessions, and caching or idempotency protection so that repeated requests are handled safely.
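
The queue-plus-idempotency flow described above can be sketched with standard-library stand-ins. This is an illustration under stated assumptions, not the project's implementation: `InMemoryBroker` stands in for Kafka (the job queue) and Redis (the seen-key set), the key is assumed to be a hash of user id, session id, and prompt, and `handle` is a hypothetical placeholder for the model call. In a real setup the duplicate check would more likely be an atomic set-if-absent with an expiry in Redis.

```python
import hashlib
import queue
import threading
from typing import Callable, Dict, Optional


def idempotency_key(user_id: str, session_id: str, prompt: str) -> str:
    """Derive a stable key so the same request always maps to the same job."""
    raw = "\x1f".join((user_id, session_id, prompt)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class InMemoryBroker:
    """Stand-in for a message queue plus a shared seen-key store (assumption)."""

    def __init__(self) -> None:
        self.jobs: "queue.Queue[Optional[dict]]" = queue.Queue()
        self._seen = set()
        self._lock = threading.Lock()

    def submit(self, user_id: str, session_id: str, prompt: str) -> bool:
        """Enqueue a job; return False if an identical request was already seen."""
        key = idempotency_key(user_id, session_id, prompt)
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
        self.jobs.put({"key": key, "user_id": user_id,
                       "session_id": session_id, "prompt": prompt})
        return True


def worker(broker: InMemoryBroker, results: Dict[str, str],
           handle: Callable[[str], str]) -> None:
    """Consume jobs until a None sentinel arrives; the API thread never blocks."""
    while True:
        job = broker.jobs.get()
        try:
            if job is None:
                return
            results[job["key"]] = handle(job["prompt"])
        finally:
            broker.jobs.task_done()
```

The API side only calls `submit()` and returns immediately, which is what keeps it responsive; the slow model call happens in `worker()`, and a repeated request is rejected before it ever reaches the queue.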

It should support both OpenAI models and local vLLM models, with an easy way to switch providers through environment settings. Package everything with Docker Compose so I can run the API, worker, Kafka, MongoDB, Redis, and model provider setup locally. Include an example env file and a curl command to test streaming. Look up current docs online if needed.
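
One way to switch providers through environment settings is a small loader that maps variables to a connection config. The sketch below is hedged accordingly: the variable names (`LLM_PROVIDER`, `LLM_MODEL`, `OPENAI_BASE_URL`, `VLLM_BASE_URL`, `VLLM_API_KEY`) and the `ProviderConfig` type are hypothetical, chosen for illustration rather than read from the repository. It relies on vLLM being able to serve an OpenAI-compatible API, so both providers can share one client and differ only in base URL, key, and model name; the default vLLM address assumes a local server on port 8000.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    base_url: str
    api_key: str
    model: str


def load_provider(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Build a provider config from environment-style settings."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "openai").strip().lower()
    model = env.get("LLM_MODEL", "")
    if not model:
        raise ValueError("LLM_MODEL must be set")

    if name == "openai":
        api_key = env.get("OPENAI_API_KEY", "")
        if not api_key:
            raise ValueError("OPENAI_API_KEY is required for the openai provider")
        base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
        return ProviderConfig(name, base_url, api_key, model)

    if name == "vllm":
        # A local vLLM server usually needs no real key; a placeholder is assumed.
        base_url = env.get("VLLM_BASE_URL", "http://localhost:8000/v1")
        return ProviderConfig(name, base_url, env.get("VLLM_API_KEY", "EMPTY"), model)

    raise ValueError(f"Unsupported LLM_PROVIDER: {name!r}")
```

With this shape, the example env file only needs to change `LLM_PROVIDER` and `LLM_MODEL` (plus the matching key or URL) to move between a hosted model and a local one, and the same values can be passed to the API and worker containers in Docker Compose.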

ozgurkaplanturgut/CoreLLM — reverse-engineered prompt
