OpenMOSS/MOSS-TTS — reverse-engineered prompt



Build me an open-source Python tool for high-quality speech and sound generation, like the MOSS-TTS family. I want to type text and get natural-sounding audio, with support for long-form narration, expressive voices, multiple languages, and optional pause timing written inline in the text.
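The inline pause requirement means splitting the input into speech chunks and silences before synthesis. Below is a minimal sketch that assumes a hypothetical `[pause 500ms]` / `[pause 1.2s]` marker syntax. The MOSS-TTS models may define their own pause format, so check the current model docs. The 24 kHz sample rate is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Hypothetical inline syntax: "[pause 0.5s]" or "[pause 500ms]" (case-insensitive).
# This is NOT a confirmed MOSS-TTS format; adapt it to whatever the model docs specify.
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)\s*(ms|s)\]", re.IGNORECASE)


@dataclass
class Speech:
    text: str


@dataclass
class Pause:
    seconds: float


Segment = Union[Speech, Pause]


def split_pauses(text: str) -> List[Segment]:
    """Split text into ordered speech chunks and pause durations (in seconds)."""
    segments: List[Segment] = []
    pos = 0
    for m in PAUSE_RE.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append(Speech(chunk))
        value, unit = float(m.group(1)), m.group(2).lower()
        segments.append(Pause(value / 1000 if unit == "ms" else value))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Speech(tail))
    return segments


def silence_samples(seconds: float, sample_rate: int = 24000) -> int:
    """Number of zero-valued samples to insert for a pause (24 kHz is an assumed rate)."""
    return round(seconds * sample_rate)
```

Each `Speech` chunk would go to the TTS model, and each `Pause` becomes a run of silent samples when the audio is concatenated, so pause timing stays exact and independent of how the model paces speech.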

It should also let me upload a short voice sample so I can clone that voice for a new line, and it should support simple multi-speaker dialogue where each speaker can have a different voice. Include a sound-effect mode where I can describe an effect in text and export the generated audio.
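For multi-speaker dialogue, the tool needs to map each line of a script to a speaker and that speaker's reference clip. Here is a minimal sketch, assuming a hypothetical one-turn-per-line `Speaker: text` script format and a user-supplied name-to-audio-path mapping. Neither is a documented MOSS-TTS input format.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical script format: one turn per line, "SpeakerName: what they say".
# The speaker name stops at the first colon, so the spoken text may contain colons.
TURN_RE = re.compile(r"^\s*([^:\s][^:]*?)\s*:\s*(\S.*?)\s*$")


@dataclass
class Turn:
    speaker: str
    text: str
    voice_ref: str  # path to this speaker's reference clip (hypothetical)


def parse_dialogue(script: str, voices: Dict[str, str]) -> List[Turn]:
    """Turn a dialogue script into ordered turns, each tied to a voice reference."""
    turns: List[Turn] = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines separate nothing; skip them
        m = TURN_RE.match(line)
        if not m:
            raise ValueError(f"line {lineno}: expected 'Speaker: text'")
        speaker, text = m.group(1), m.group(2)
        if speaker not in voices:
            raise KeyError(f"line {lineno}: no reference audio for {speaker!r}")
        turns.append(Turn(speaker, text, voices[speaker]))
    return turns
```

Validating every speaker up front means a missing voice sample fails fast, before any audio is generated for the earlier turns.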

Please make it usable from the command line, and also add a simple local demo interface where I can enter text, choose the mode, upload reference audio, listen to the result, and download the file. If the models support real-time streaming, include a basic streaming option too. Check the current model documentation online if needed, and make the setup steps clear.
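The command-line requirement could be covered by a single entry point with one mode per feature. This is a minimal `argparse` sketch; the program name `moss-gen`, the mode names, and the `--ref`, `--voice NAME=PATH`, `--out`, and `--stream` flags are all hypothetical choices, not an existing MOSS-TTS CLI. Model loading and synthesis are left out.

```python
import argparse
from typing import List, Optional, Tuple

MODES = ("tts", "clone", "dialogue", "sfx")  # hypothetical mode names


def parse_voice(spec: str) -> Tuple[str, str]:
    """Parse a hypothetical NAME=PATH voice mapping used in dialogue mode."""
    name, sep, path = spec.partition("=")
    if not sep or not name or not path:
        raise argparse.ArgumentTypeError(f"expected NAME=PATH, got {spec!r}")
    return name, path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="moss-gen", description="Speech and sound generation (sketch)")
    p.add_argument("mode", choices=MODES, help="what to generate")
    p.add_argument("text", help="text to speak, dialogue script, or effect description")
    p.add_argument("--ref", help="reference clip for voice cloning")
    p.add_argument("--voice", type=parse_voice, action="append", default=[],
                   metavar="NAME=PATH", help="per-speaker reference clip (repeatable)")
    p.add_argument("--out", default="output.wav", help="output audio file")
    p.add_argument("--stream", action="store_true", help="stream audio if the model supports it")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Cross-argument checks that argparse cannot express on its own.
    if args.mode == "clone" and not args.ref:
        parser.error("clone mode needs --ref")
    if args.mode == "dialogue" and not args.voice:
        parser.error("dialogue mode needs at least one --voice NAME=PATH")
    return args
```

Usage might look like `moss-gen dialogue "Alice: Hi" --voice Alice=alice.wav --out chat.wav`. Keeping the flags in one parser lets the local demo interface reuse the same validation by building an argument list from its form fields.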
