NVlabs/eagle — reverse-engineered prompt
I want this repo turned into a simple working demo I can run locally.
Please make it easy to try the Eagle models on real-world vision tasks. I should be able to upload an image or a video, ask a question in plain English, and get a useful answer back. For images, I want general scene understanding plus the option to locate or point to objects in the scene with bounding boxes. For video, I want long-context behavior such as describing what happens over time, splitting it into sections, and producing timestamped captions where the model supports it.
If there are multiple model versions here, give me a clean way to switch between them, especially between Eagle 2.5 and LocateAnything. Use the existing pretrained models rather than training anything. Add a small web UI, a few example prompts, and clear setup instructions so a non-expert can get it running on a GPU machine. If something is unclear, check the current docs online and wire up the most practical defaults.