FWG-Network/VoxCPM — reverse-engineered prompt

Build me a local VoxCPM2 speech generator that feels easy to use.

I want a simple web demo where I can type text in any supported language and get natural speech back as playable, downloadable 48 kHz audio. Let me design a voice just by describing traits such as age, gender, emotion, tone, and speed. Also let me upload a short voice clip to clone someone’s voice, with an optional transcript of the clip so the result matches the reference more closely. If I add style instructions to a cloned voice, it should keep the voice but change the delivery.

Please also make a basic Python API example and a command-line version so I can generate audio without the web page. Handle model loading from Hugging Face or a local folder, show useful errors if the CUDA setup or Python version is unsupported, and include a few example scripts and tests. Look up the current VoxCPM docs online if you need to.
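
As a rough illustration of the "useful errors" and "Hugging Face or a local folder" requests, here is a minimal sketch of preflight checks that could run before any model is loaded. It does not call VoxCPM itself. The function names (`check_python`, `check_cuda`, `resolve_model_source`) are hypothetical, and the default minimum Python version is an assumed placeholder, so take the real requirement from the current VoxCPM docs.

```python
import importlib.util
import sys
from pathlib import Path


def check_python(min_version=(3, 10), current=None):
    """Return an error message if the interpreter is too old, else None.

    min_version is an ASSUMED placeholder, not VoxCPM's documented requirement.
    """
    current = tuple(current or sys.version_info[:2])
    if current < tuple(min_version):
        need = ".".join(map(str, min_version))
        have = ".".join(map(str, current))
        return f"Python {need}+ is required, but this is Python {have}."
    return None


def check_cuda():
    """Return a message if CUDA cannot be used, else None."""
    # Look for PyTorch without importing it, so a missing install
    # produces a readable message instead of an ImportError.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed, so CUDA availability cannot be checked."
    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no usable CUDA device was found."
    return None


def resolve_model_source(model):
    """Classify `model` as a local folder or a Hugging Face repo ID.

    Returns ("local", absolute_path) when `model` is an existing directory,
    otherwise ("hub", model) so the caller can treat it as a repo ID.
    """
    path = Path(model).expanduser()
    if path.is_dir():
        return ("local", str(path.resolve()))
    return ("hub", model)
```

A CLI or web demo could run these checks at startup and print any returned messages before it tries to load weights. Returning messages instead of raising lets the caller decide whether a missing GPU is fatal or only a warning.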