open-mmlab/Multimodal-GPT — reverse-engineered prompt
I want a local multimodal chatbot that I can run from this repo. It should let me upload an image and chat about it in plain language: describing what is in the picture, reading text from images, doing simple visual reasoning, and also handling normal text-only chat. Please set up the demo app so it launches cleanly in the browser and uses the provided model flow with LLaMA, OpenFlamingo, and the released LoRA weights if they are available.
Also make the project easy to use for later training, so there is a clear path to fine-tune it on mixed image and text instruction data such as captioning, VQA, OCR, dialogue, and language-only instruction sets. If anything is missing, add sensible setup steps, validation, and brief run instructions so I can actually get it working without digging through the repo. You can look up current docs online if needed.