Local Web Demo for Qwen3 VL Multimodal Chatbot

Reverse-engineered prompt

Build me a simple local web demo for Qwen3 VL so I can chat with a multimodal model from my browser.

I want to upload an image or video, ask normal questions about it, and get clear answers. It should also work for OCR-style tasks, like reading text from screenshots or documents, and for visual reasoning, like explaining what is happening in a scene. Let me choose a Qwen3 VL model from Hugging Face, with a smaller model as the default so it can run on a normal GPU if possible.

Please set up the install steps, a clean chat interface, file upload, example prompts, and basic error messages if the model or GPU setup is missing. If there is already demo code in this repo, use it and make it easy to run. Add a short README section that explains how to start it locally and how to change models. Look up current docs online if you need to.
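
As a rough illustration of the "basic error messages" and "smaller model as the default" requests, here is a minimal sketch of a startup check. It is not taken from the repo. The package list, the `QWEN_VL_MODEL` environment variable, and the default model ID are all assumptions made for this example, so check the exact model names on Hugging Face before relying on them.

```python
import importlib.util
import os

# Assumed dependency set; the real demo may need different packages.
REQUIRED_PACKAGES = ("torch", "transformers", "gradio")

# Assumed default model ID; verify the exact name on Hugging Face.
DEFAULT_MODEL_ID = "Qwen/Qwen3-VL-2B-Instruct"


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def resolve_model_id(env=None):
    """Use QWEN_VL_MODEL (a hypothetical variable) if set, else the default."""
    env = os.environ if env is None else env
    return env.get("QWEN_VL_MODEL", "").strip() or DEFAULT_MODEL_ID


def startup_problems(names=REQUIRED_PACKAGES):
    """Collect human-readable setup problems; an empty list means no problem was found."""
    missing = missing_packages(names)
    if missing:
        joined = " ".join(missing)
        return [f"Missing packages: {joined}. Try: pip install {joined}"]

    problems = []
    if "torch" in names:
        import torch  # imported late so the check above can report it as missing

        if not torch.cuda.is_available():
            problems.append(
                "No CUDA GPU detected. The model may run very slowly on CPU "
                "or fail to load."
            )
    return problems


if __name__ == "__main__":
    print(f"Selected model: {resolve_model_id()}")
    for problem in startup_problems():
        print(f"Setup problem: {problem}")
```

Running the script before launching the demo prints the selected model and any setup problems it finds. Switching models would then be a matter of setting the environment variable, for example `QWEN_VL_MODEL=<another model ID> python check_setup.py`, where the script name is also hypothetical.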


qwenlm/qwen3-vl — reverse-engineered prompt