Local Demo Setup for Qwen3 VL Model

Reverse-engineered prompt

I want to turn this repo into a simple working local demo for Qwen3 VL. Please set it up so I can open a web page, choose a Qwen3 VL model, upload an image or video, type a question, and get a clear answer back.

Keep it beginner-friendly. If a smaller model is best for testing on normal hardware, use that by default and make it easy to change later. The demo should handle the common tasks the model is meant for, like reading text in images, describing scenes, answering questions about videos, and explaining visual details. Add clear setup steps so I know how to install everything, how to start the app, and where to put any model keys or settings.

Please reuse what already exists in the repo where possible, clean up any rough edges, and make sure it runs end to end. Look up current docs online if you need to.


QwenLM/Qwen3-VL — reverse-engineered prompt
