Zyphra/Zonos

Reverse-engineered prompt

Build me an easy local text-to-speech app using Zonos. I want a simple web page where I can type text, upload a short voice sample, pick the language, and generate a natural-sounding audio file in that voice.

Please include controls for the things the model supports, like speaking speed, pitch, audio quality, and emotions such as happy, sad, angry, or fearful. It should let me play the result in the browser and download the generated WAV file. If possible, also support an audio prefix so I can guide the style, like whispering.

Set it up so it runs locally with the recommended Gradio interface and can also be started with Docker. Make the setup friendly for someone who is not a Python expert, with clear install and run commands for Linux or macOS. Use the open-weight Zonos model from the repo, and look up the current docs online if you need to.
