GalaxyCong/EmoDubber — reverse-engineered prompt
Reverse-engineered prompt
Build me a Python 3.10 research project for EmoDubber, a movie dubbing tool that generates a new speech waveform from four inputs: silent lip motion, face features, a text script, and a reference speaker audio feature.
I want it to support the basic inference flow using the provided checkpoints and a 16 kHz vocoder, with clear commands for the Chem and GRID datasets and for each of the two main inference settings. Make the setup easy to follow: create the environment, install the requirements, download the checkpoints, vocoder, and processed features, then run a single command to produce the generated audio.
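As a rough sketch of what I mean by "a single command", the inference entry point could expose the dataset and the setting as required choices. Everything named here is an assumption for illustration only: the flag names (`--dataset`, `--setting`, `--checkpoint`, `--vocoder`, `--output-dir`), the placeholder setting labels `setting1` and `setting2`, and the default output folder are not taken from the actual repository.

```python
import argparse

# Hypothetical names throughout: the real project may use different
# flags, dataset identifiers, and setting labels.
DATASETS = ("chem", "grid")
SETTINGS = ("setting1", "setting2")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the dataset x setting combinations."""
    parser = argparse.ArgumentParser(
        description="Sketch of an inference entry point (illustrative only)"
    )
    parser.add_argument("--dataset", choices=DATASETS, required=True,
                        help="which dataset's processed features to use")
    parser.add_argument("--setting", choices=SETTINGS, required=True,
                        help="which of the two inference settings to run")
    parser.add_argument("--checkpoint", required=True,
                        help="path to a downloaded model checkpoint")
    parser.add_argument("--vocoder", required=True,
                        help="path to the 16 kHz vocoder checkpoint")
    parser.add_argument("--output-dir", default="generated_audio",
                        help="folder where generated waveforms are written")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None) into a namespace."""
    return build_parser().parse_args(argv)
```

With a parser like this, each of the four dataset and setting combinations becomes one copy-pasteable README line, and an unsupported dataset name fails immediately with a usage message instead of partway through loading a checkpoint.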
Also include training scripts for Chem and GRID that use the provided pretrained TTS checkpoint and the dataset config files. Keep the emotion control section in place but label it as unfinished, since only the classifier checkpoints are available right now. Add a simple README that explains everything in plain language, and look up current documentation online if needed.