bytedance/ui-tars — reverse-engineered prompt

Reverse engineered prompt

GitHub

Build me a working Python project around this UI TARS repo so I can give it a screenshot and a simple instruction like click the search box or open settings, and have it return structured actions and optional real mouse and keyboard automation code.

I want an easy way to run inference with the open model or a hosted endpoint, then post process the response correctly, especially the coordinate scaling, so clicks land in the right place on my actual screen. Please include the desktop, mobile, and action only prompt modes, a simple command line demo, and clear setup notes so I can test it quickly. If possible, add a small example loop that takes a screenshot, sends the task, parses the answer, and prints or executes the next step safely.

Keep it close to what this repo already supports, and look up current docs online if you need to.

Want more depth? Deep Reverse