UltimateRAG Backend Playground Architecture Overview

Reverse engineered prompt

Build me a production-style RAG backend playground called UltimateRAG.

I want to upload large documents from a URL, have the system process them in the background, split them into chunks, create searchable embeddings, and then let users ask questions against those documents. The answers should stream back live token by token instead of waiting for the whole response.
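As a rough illustration of two steps in that paragraph, the sketch below splits text into overlapping chunks and frames generated tokens as Server-Sent Events. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the character-based window, the default sizes, the JSON payload shape, and the `[DONE]` end marker are all hypothetical choices made for this example.

```python
import json
from typing import Iterable, Iterator, List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as one Server-Sent Events message, then send an end marker."""
    for token in tokens:
        # Payload shape {"token": ...} is an assumption for this sketch.
        yield f"data: {json.dumps({'token': token})}\n\n"
    # "[DONE]" is a hypothetical end-of-stream marker.
    yield "data: [DONE]\n\n"
```

In a FastAPI service, a generator like `sse_events` could be passed to a `StreamingResponse` with the media type `text/event-stream`, so the client sees each token as soon as it is produced.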

Please set it up as a Python FastAPI service with background workers so the API doesn’t block while heavy AI work is running. Use Kafka as the job queue, Qdrant for vector search, Redis for upload state and caching, MongoDB for request logs, and Docker Compose so I can run everything locally. It should support document upload, streaming question answering, conversation history, retrying when context is weak, and deleting a document.
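One way to read "retrying when context is weak" is a loop that re-runs retrieval with a rewritten query whenever the best similarity score falls below a threshold. The sketch below shows that idea under stated assumptions: `Hit`, `retrieve_with_retry`, the `search` and `rewrite` callables, and the 0.5 threshold are hypothetical names and values, and a real service would back `search` with a Qdrant query and `rewrite` with an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    text: str
    score: float  # similarity score reported by the vector store


def _top_score(hits: List[Hit]) -> float:
    """Best score in a result list, or -inf when the list is empty."""
    return max((h.score for h in hits), default=float("-inf"))


def retrieve_with_retry(
    question: str,
    search: Callable[[str], List[Hit]],
    rewrite: Callable[[str, int], str],
    min_score: float = 0.5,   # hypothetical "weak context" threshold
    max_attempts: int = 3,
) -> List[Hit]:
    """Search, and retry with a rewritten query while the best hit is weak."""
    query = question
    best: List[Hit] = []
    for attempt in range(1, max_attempts + 1):
        hits = search(query)
        if _top_score(hits) > _top_score(best):
            best = hits
        if _top_score(best) >= min_score:
            return best
        if attempt < max_attempts:
            # Ask for a different phrasing of the original question.
            query = rewrite(question, attempt)
    # Every attempt was weak: return the best context seen so far.
    return best
```

Returning the best weak result instead of raising lets the caller decide whether to answer with a caveat or to tell the user that the document does not cover the question.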

Use OpenAI for generation, include a simple example env file, and make sure it can run with or without a GPU, accepting that reranking will be slower on CPU. Include clear curl examples so I can test uploading a document, streaming a question, continuing a conversation, and deleting a document.
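The env file and the GPU-or-CPU requirement could be tied together by a small settings loader like the one below. This is a sketch under stated assumptions: apart from `OPENAI_API_KEY`, every variable name here (`KAFKA_BOOTSTRAP_SERVERS`, `QDRANT_URL`, `REDIS_URL`, `MONGO_URI`, `RERANK_DEVICE`) is hypothetical, and the defaults simply point at each service's standard local port.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    kafka_bootstrap_servers: str
    qdrant_url: str
    redis_url: str
    mongo_uri: str
    rerank_device: str  # "cpu" or "cuda"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (names are assumptions)."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    # Default to CPU so the stack still runs on machines without a GPU.
    device = env.get("RERANK_DEVICE", "cpu").lower()
    if device not in {"cpu", "cuda"}:
        raise ValueError("RERANK_DEVICE must be 'cpu' or 'cuda'")
    return Settings(
        openai_api_key=api_key,
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        rerank_device=device,
    )
```

Passing the environment in as a mapping keeps the loader easy to test, for example `load_settings({"OPENAI_API_KEY": "sk-test"})` without touching the real process environment.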


ozgurkaplanturgut/UltimateRAG — reverse-engineered prompt
