sov6f/Object-Detection-for-Visually-Impaired — reverse-engineered prompt
Reverse engineered prompt
Build me a simple Python web app that helps a visually impaired person understand what’s around them.
The app should open in the browser and let the user either take a picture with the device camera or upload one. It should detect objects in the image with a YOLO model, then send the detected objects and their positions to Google Gemini to create a clear natural language description of the scene. After that, convert the description to speech and play it back so the user can hear it.
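As a rough illustration of the "detected objects and their positions" step, here is a minimal sketch of how bounding boxes could be turned into coarse position words and folded into a text prompt for Gemini. Everything in it is an assumption for illustration: the `Detection` class, the `position_of` and `build_scene_prompt` helpers, the 3x3 grid, and the prompt wording are hypothetical and are not taken from the repository. It only prepares the text; running the YOLO model and calling Gemini are left out.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detected object (pixel coordinates)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def position_of(det: Detection, width: float, height: float) -> str:
    """Map the box center onto a 3x3 grid, e.g. 'bottom left'.

    The thirds-based grid is an assumption; any coarse scheme would do.
    """
    cx = (det.x1 + det.x2) / 2
    cy = (det.y1 + det.y2) / 2
    if cx < width / 3:
        horizontal = "left"
    elif cx > 2 * width / 3:
        horizontal = "right"
    else:
        horizontal = "center"
    if cy < height / 3:
        vertical = "top"
    elif cy > 2 * height / 3:
        vertical = "bottom"
    else:
        vertical = "middle"
    return f"{vertical} {horizontal}"


def build_scene_prompt(detections: list[Detection], width: float, height: float) -> str:
    """Build the text that would be sent to the language model."""
    if not detections:
        return "No objects were detected. Tell the user that nothing recognizable is in view."
    lines = [f"- {d.label} ({position_of(d, width, height)})" for d in detections]
    header = (
        "Describe this scene in one or two short sentences for a visually "
        "impaired person, using the objects and positions listed below."
    )
    return header + "\n" + "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Detection("chair", 0, 300, 100, 480),
        Detection("person", 250, 100, 390, 400),
    ]
    print(build_scene_prompt(sample, width=640, height=480))
```

Keeping this step as plain string building means it can be tested without a camera, a model file, or an API key.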
Please make the interface very simple and accessible, with a clear capture button, a preview of the image with its detections if useful, the written description, and an audio player. It should read the Gemini API key from a .env file. Use Python and Streamlit since this is meant to run locally with streamlit run app.py.
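One possible way to read the key is sketched below. The variable name `GEMINI_API_KEY` and the helper `load_gemini_api_key` are assumptions, since the prompt does not name them. The sketch uses `load_dotenv` from the python-dotenv package when it is installed and otherwise falls back to whatever is already in the process environment.

```python
import os


def load_gemini_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key, failing early with a readable message.

    `var_name` is a hypothetical default; use whatever name the .env file defines.
    """
    try:
        # python-dotenv is optional here; without it, only real
        # environment variables are consulted.
        from dotenv import load_dotenv

        # Searches for a .env file and loads it without overriding
        # variables that are already set in the environment.
        load_dotenv()
    except ImportError:
        pass

    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to a .env file or export it "
            "before starting the app."
        )
    return key
```

Failing at startup with a clear message tends to be friendlier than letting the first Gemini request fail with an authentication error.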
Keep the code clean and split helper logic into sensible files for detection, Gemini description, and text-to-speech. Include a requirements file and basic run instructions.