Local Python RAG Assistant with Web Search Integration

Reverse-engineered prompt

Build me a local Python RAG assistant that runs on a normal laptop CPU with no cloud services and no GPU. I want to ask it questions in a chat style, and it should decide when it can answer from the local model and when it needs to search the web for fresh information.
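The prompt does not say how the assistant should decide between answering locally and searching the web. One common option is to ask the local model for a one-word verdict and parse its reply defensively. The sketch below assumes that approach. `ROUTER_PROMPT`, `parse_route`, `route_question`, and the `ask_model` callable (for example, a thin wrapper around a local Ollama chat call) are hypothetical names. Falling back to a web search when the reply is unclear is a design choice, not a requirement from the prompt.

```python
from typing import Callable

# Hypothetical routing instruction; the wording and the two labels are assumptions.
ROUTER_PROMPT = (
    "Decide whether answering the question below needs fresh information "
    "from the web. Reply with exactly one word: SEARCH or LOCAL.\n\n"
    "Question: {question}"
)


def parse_route(reply: str) -> str:
    """Map a free-form model reply to "search" or "local".

    Small local models often add extra words, so the first recognized
    label wins. Anything unrecognized falls back to "search", on the
    assumption that an extra lookup is cheaper than a stale answer.
    """
    for token in reply.upper().replace(".", " ").replace(",", " ").split():
        if token == "SEARCH":
            return "search"
        if token == "LOCAL":
            return "local"
    return "search"


def route_question(question: str, ask_model: Callable[[str], str]) -> str:
    """Ask the model for a verdict; `ask_model` is a hypothetical prompt -> reply callable."""
    return parse_route(ask_model(ROUTER_PROMPT.format(question=question)))


if __name__ == "__main__":
    # A stub replaces the local LLM so the sketch runs without Ollama installed.
    def stub_model(prompt: str) -> str:
        return "LOCAL. This is general knowledge."

    print(route_question("What is a Python list comprehension?", stub_model))
```

Keeping the parser separate from the model call makes the routing decision easy to unit-test and to log.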

Use Ollama to run Gemma locally, DuckDuckGo for web results, and a simple LangGraph flow with steps for conversation summarization, smarter search query generation, web search routing, retrieval, and final answering. For the search results, split each page into chunks, rank the chunks with both embedding similarity and keyword search, remove duplicate or near-duplicate chunks, and then combine the two rankings before passing the resulting context to the model.
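A minimal, dependency-free sketch of the retrieval steps described above, under several assumptions: chunks are fixed-size overlapping word windows, the keyword ranker is Okapi BM25, near-duplicates are detected with word-set Jaccard similarity, and the two rankings are merged with reciprocal rank fusion (RRF, k = 60). The prompt only says "keyword search" and "combine the rankings", so these specific techniques are swapped in for illustration. The `embed` argument stands in for a real embedding model (such as one served through Ollama); `toy_embed`, a hashed bag of words, is a hypothetical placeholder so the sketch runs offline. Unlike the order in the prompt, this sketch removes near-duplicates before scoring, which is a simplification.

```python
import math
import re
import zlib
from collections import Counter
from typing import Callable, Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, size: int = 120, overlap: int = 30) -> List[str]:
    """Split text into windows of `size` words overlapping by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = max(1, size - overlap)
    # The last window starts at or after len(words) - size, so the tail is covered.
    starts = range(0, max(len(words) - size, 0) + step, step)
    return [" ".join(words[s:s + size]) for s in starts]


def dedupe(chunks: List[str], threshold: float = 0.8) -> List[str]:
    """Drop a chunk if its word-set Jaccard similarity to a kept chunk reaches threshold."""
    kept, kept_sets = [], []
    for chunk in chunks:
        words = set(tokenize(chunk))
        if all(len(words & s) / max(len(words | s), 1) < threshold for s in kept_sets):
            kept.append(chunk)
            kept_sets.append(words)
    return kept


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every doc for the query."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avgdl = sum(len(t) for t in doc_tokens) / max(n, 1)
    df = Counter(term for toks in doc_tokens for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms & set(tf):
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def rrf(rankings: List[List[int]], k: int = 60) -> Dict[int, float]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to every item in it."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (k + rank)
    return fused


def toy_embed(text: str, dims: int = 256) -> List[float]:
    # Hypothetical stand-in for a real embedding model: a hashed bag of words.
    vec = [0.0] * dims
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % dims] += 1.0
    return vec


def hybrid_rank(query: str, chunks: List[str],
                embed: Callable[[str], Sequence[float]], top_k: int = 4) -> List[Dict]:
    """Dedupe, rank by BM25 and by cosine similarity, fuse with RRF, keep every score."""
    chunks = dedupe(chunks)
    kw = bm25_scores(query, chunks)
    q_vec = embed(query)
    sem = [cosine(q_vec, embed(c)) for c in chunks]
    by_kw = sorted(range(len(chunks)), key=lambda i: kw[i], reverse=True)
    by_sem = sorted(range(len(chunks)), key=lambda i: sem[i], reverse=True)
    fused = rrf([by_kw, by_sem])
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [{"chunk": chunks[i], "bm25": kw[i], "cosine": sem[i], "rrf": fused[i]} for i in best]


if __name__ == "__main__":
    pages = [
        "Ollama runs Gemma models locally on a laptop CPU.",
        "DuckDuckGo returns web search results.",
        "Ollama runs Gemma models locally on a laptop CPU.",  # exact duplicate
        "LangGraph wires nodes into a graph.",
    ]
    chunks = [c for page in pages for c in chunk_words(page)]
    for hit in hybrid_rank("run gemma locally with ollama", chunks, toy_embed):
        print(f"rrf={hit['rrf']:.4f} bm25={hit['bm25']:.2f} cos={hit['cosine']:.2f} | {hit['chunk']}")
```

Each returned hit keeps its BM25, cosine, and fused scores, which is one way to expose the chunk scoring details the prompt asks for. RRF combines ranks rather than raw scores, so BM25 values and cosine similarities, which live on different scales, need no normalization before fusion.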

Please keep the code understandable and debuggable. I want to see where each answer came from, including source URLs and chunk scoring details when possible. Add logging, retries, and a small demo query so I can run it locally and see the full flow working. Look up current docs online if you need to.
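For the logging, retry, and source-attribution requests, a small standard-library-only sketch could look like the following. `with_retries` and `format_sources` are hypothetical helpers. The exponential backoff schedule and the hit fields (`url`, `bm25`, `cosine`, `rrf`) are assumptions chosen for illustration, and the `example.com` URLs are placeholders.

```python
import logging
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("rag")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(), retrying failures with exponential backoff (base_delay, 2x, 4x, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: network and model errors vary
            if attempt == attempts:
                log.error("giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                        attempt, attempts, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")


def format_sources(hits: List[Dict]) -> str:
    """One line per retrieved chunk: rank, source URL, and its scores."""
    return "\n".join(
        f"[{rank}] {hit['url']} (bm25={hit['bm25']:.3f}, "
        f"cosine={hit['cosine']:.3f}, rrf={hit['rrf']:.4f})"
        for rank, hit in enumerate(hits, start=1)
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    failures = {"left": 1}

    def flaky_search() -> List[Dict]:
        # Fails once to exercise the retry path, then returns placeholder hits.
        if failures["left"]:
            failures["left"] -= 1
            raise ConnectionError("simulated network error")
        return [{"url": "https://example.com/a", "bm25": 2.1, "cosine": 0.61, "rrf": 0.0325}]

    hits = with_retries(flaky_search, base_delay=0.1)
    log.info("sources:\n%s", format_sources(hits))
```

Wrapping both the web search and the model call in the same retry helper keeps failure handling in one place, and the log lines show exactly which attempt failed and why.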

ranji-t/Gemma4-langgraph-Local — reverse-engineered prompt