avaturn-live/avtr-1 — reverse-engineered prompt



Build me a working AVTR 1 demo from this repo. I want to give it a portrait avatar and an audio file, then get back a video where the avatar speaks with matching mouth movement. It should also handle a two-person conversation, where one audio track is the avatar talking and the other is the person they’re listening to, so the avatar shows natural listening motion while the other person speaks.

Please make the setup smooth on a Linux machine with an NVIDIA GPU, including downloading the model weights, building the local GPU engines, and using a clear storage folder for artifacts. Add a simple interactive streaming demo launcher and an offline video generator with options for avatar, background, duration, output file, speech audio, and listen audio.
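
To make the list of generator options concrete, here is a minimal sketch of what the offline generator's command-line interface could look like. The flag names, defaults, and the `build_parser` helper are all assumptions made for illustration; they are not taken from the repo.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for a hypothetical offline video generator."""
    parser = argparse.ArgumentParser(
        description="Offline talking-avatar video generator (illustrative sketch)"
    )
    # Flag names below are assumptions, chosen to mirror the options in the prompt.
    parser.add_argument("--avatar", required=True,
                        help="Path to the portrait avatar image")
    parser.add_argument("--speech-audio", required=True,
                        help="Audio track the avatar speaks")
    parser.add_argument("--listen-audio", default=None,
                        help="Optional audio track the avatar listens to")
    parser.add_argument("--background", default=None,
                        help="Optional background image")
    parser.add_argument("--duration", type=float, default=None,
                        help="Optional clip length in seconds")
    parser.add_argument("--output", default="output.mp4",
                        help="Path of the video file to write")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
example = build_parser().parse_args(
    ["--avatar", "avatar.png", "--speech-audio", "speech.wav"]
)
print(example.output, example.listen_audio)
```

Keeping `--listen-audio` optional lets one entry point cover both the single-speaker case and the two-person conversation case.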

Include friendly setup notes, plus checks for a missing CUDA install, TensorRT, Hugging Face login, or ffmpeg, and for GPU problems. Look up current docs online if you need to.
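
The checks listed above could be sketched as a small preflight script. This is an illustrative sketch under stated assumptions, not the repo's actual setup code: the function names are hypothetical, the CUDA toolkit is assumed to expose `nvcc` on `PATH`, TensorRT is assumed to be installed as the `tensorrt` Python package, and the Hugging Face login is assumed to be visible either as the `HF_TOKEN` environment variable or as a `token` file under `HF_HOME` (defaulting to `~/.cache/huggingface`).

```python
import importlib.util
import os
import shutil
from pathlib import Path

# Executable name -> friendly hint. Which tools matter is an assumption.
REQUIRED_TOOLS = {
    "ffmpeg": "install ffmpeg with your package manager",
    "nvidia-smi": "install the NVIDIA driver and confirm the GPU is visible",
    "nvcc": "install the CUDA toolkit and add it to PATH",
}


def missing_tools(which=shutil.which):
    """Return a friendly message for each required executable not on PATH."""
    return [
        f"{name} not found: {hint}"
        for name, hint in REQUIRED_TOOLS.items()
        if which(name) is None
    ]


def has_tensorrt(find_spec=importlib.util.find_spec):
    """Check whether the TensorRT Python package can be imported."""
    return find_spec("tensorrt") is not None


def has_hf_token(env=os.environ):
    """Check for a Hugging Face token in the environment or the token file."""
    if env.get("HF_TOKEN"):
        return True
    hf_home = Path(env.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    return (hf_home / "token").is_file()


def run_checks():
    """Collect every problem so the user sees them all in one pass."""
    problems = missing_tools()
    if not has_tensorrt():
        problems.append("TensorRT Python package not found")
    if not has_hf_token():
        problems.append("No Hugging Face token found: log in before downloading weights")
    return problems


for problem in run_checks():
    print("WARNING:", problem)
```

Collecting all problems before printing, rather than stopping at the first one, means a fresh machine gets a single complete to-do list.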
