Local Web App for EchoMimicV2 Video Generation

Reverse-engineered prompt

Build me a local web app for EchoMimicV2 that can turn a single photo of a person plus an audio clip into a realistic talking video with face, head, and upper-body motion.

I want it to feel simple for a normal user: open the app, upload a reference image, upload English or Chinese speech audio, choose basic settings like video length or quality if the existing scripts already expose them, then click generate and get an MP4 preview plus a saved file. If there is an accelerated version available, include an easy way to use it, but keep the regular option too.

Please set up the Python project cleanly, wire it to the existing inference scripts and configs, and include clear instructions for installing requirements and downloading the model weights from the official links. If the pose alignment demo is usable, add that as an optional advanced flow. Look up the current docs online if needed.
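
As a rough illustration of the wiring described above, here is a minimal sketch of how the app might validate uploads and choose between the regular and accelerated paths. Everything named here is an assumption rather than something confirmed by the project: the script names, config paths, the `--config` flag, the accepted file extensions, and the helpers `check_upload` and `build_command` are all hypothetical and should be checked against the actual repository. How the uploaded image and audio reach the script (for example, through the config file) also depends on what the existing code supports.

```python
import sys
from pathlib import Path

# Assumed script and config names; replace with whatever the checked-out
# repository actually provides.
MODES = {
    "regular": ("infer.py", "configs/prompts/infer.yaml"),
    "accelerated": ("infer_acc.py", "configs/prompts/infer_acc.yaml"),
}

# Assumed upload types for the reference image and the speech audio.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}


def check_upload(path, allowed_exts):
    """Return the path as a Path if its extension is allowed, else raise ValueError."""
    p = Path(path)
    if p.suffix.lower() not in allowed_exts:
        raise ValueError(f"unsupported file type: {p.name}")
    return p


def build_command(mode, repo_dir="."):
    """Build the argument list the web app could pass to subprocess.run.

    The --config flag is an assumption about the script's interface.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    script, config = MODES[mode]
    repo = Path(repo_dir)
    return [sys.executable, str(repo / script), f"--config={repo / config}"]
```

The web app's generate button could call `check_upload` on both files, then run `build_command("regular")` or `build_command("accelerated")` through `subprocess.run` depending on the user's choice, which keeps the regular option available next to the faster one.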


antgroup/echomimic_v2 — reverse-engineered prompt
