Local Lip-Sync Video App

Reverse-engineered prompt

Build me a local app from this repo that takes a face video and a separate audio file and generates a lip-synced output video with good quality and low latency. I want it to feel easy to use, ideally with a simple browser UI where I can upload a video, upload or record audio, run inference, and download the result. Please wire up the model weights download, environment setup, ffmpeg setup, and any needed preprocessing so I can actually run it without hunting through docs.

It should support normal talking-head videos and work with different languages if the model already supports that. If there is a way to adjust the face crop or mouth area for better results, expose that in the UI too. Please use the newer 1.5 model and keep the training scripts and configs usable in case I want to fine-tune later, but the main goal is working inference. Add a clear README with exact run steps and a simple test example. If anything is unclear, check the latest project docs online.
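As an illustration of the ffmpeg setup and preprocessing the prompt asks for, here is a minimal Python sketch. It is not taken from the repo. The helper names (`ffmpeg_available`, `audio_to_wav_cmd`, `video_to_fps_cmd`) are hypothetical, and the 16 kHz mono audio and 25 fps video defaults are assumptions that should be checked against the project's own docs. The helpers only build standard ffmpeg command lines and do not run them.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def audio_to_wav_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts an audio file to mono WAV.

    The 16 kHz default is an assumption, not a value confirmed by the repo.
    """
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", src,                 # input file (any format ffmpeg can read)
        "-vn",                     # ignore any video stream
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample to the target rate
        dst,
    ]


def video_to_fps_cmd(src: str, dst: str, fps: int = 25) -> list[str]:
    """Build an ffmpeg command that re-encodes a video at a fixed frame rate.

    The 25 fps default is an assumption, not a value confirmed by the repo.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-an",                     # drop the original audio track
        "-r", str(fps),            # set the output frame rate
        dst,
    ]
```

To execute a command, pass the returned list to `subprocess.run(cmd, check=True)` after confirming that `ffmpeg_available()` returns `True`. Passing a list rather than a shell string avoids quoting problems with uploaded file names.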

TMElyralab/MuseTalk — reverse-engineered prompt
