FWG-Network/VoxCPM — reverse-engineered prompt
Build me a local VoxCPM2 speech generator that feels easy to use.
I want a simple web demo where I can type text in any supported language and get natural speech back as playable, downloadable 48 kHz audio. Let me create a voice just by writing a description of traits such as age, gender, emotion, tone, and speed. Also let me upload a short voice clip to clone someone’s voice, with an optional transcript so the clone can match the reference more closely. If I add style instructions, it should keep the same voice and change only the delivery.
Please also make a basic Python API example and a command-line version so I can generate audio without the web page. Handle model loading from Hugging Face or a local folder, show useful errors if the CUDA or Python version is wrong, and include a few example scripts and tests. Look up the current VoxCPM docs online if you need to.
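To make the error-handling and model-loading requests concrete, here is a rough sketch of the kind of startup checks I have in mind. It does not call VoxCPM itself. The helper names (`check_python`, `check_cuda`, `resolve_model_source`) and the minimum Python version are placeholders I made up for illustration, so take the real requirements from the VoxCPM docs.

```python
import sys
from pathlib import Path

# Assumed placeholder, not taken from the VoxCPM docs.
MIN_PYTHON = (3, 10)


def check_python(version=None, minimum=MIN_PYTHON):
    """Return a readable error string if Python is too old, else None."""
    version = tuple(version or sys.version_info[:2])
    if version < minimum:
        return (
            f"Python {minimum[0]}.{minimum[1]}+ is required, "
            f"found {version[0]}.{version[1]}."
        )
    return None


def check_cuda():
    """Return a readable error string if CUDA cannot be used, else None."""
    try:
        import torch  # imported lazily so the other checks work without it
    except ImportError:
        return "PyTorch is not installed, so CUDA cannot be checked."
    if not torch.cuda.is_available():
        return "CUDA is not available on this machine."
    return None


def resolve_model_source(spec):
    """Classify a model spec as a local folder or a Hugging Face repo id.

    Anything that is not an existing directory is treated as a hub id.
    """
    path = Path(spec).expanduser()
    if path.is_dir():
        return ("local", str(path.resolve()))
    return ("hub", spec)


if __name__ == "__main__":
    for problem in (check_python(), check_cuda()):
        if problem:
            print(f"Environment problem: {problem}")
    # Hypothetical repo id, used only to show the call.
    print(resolve_model_source("hypothetical-org/hypothetical-model"))
```

The web demo, the Python API example, and the command-line version could all run these checks once at startup and print the message instead of a raw traceback.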