Local Voice Generation App with F5 TTS Integration

Reverse-engineered prompt

Build me a simple local voice generation app using F5 TTS. I want to open it in my browser, upload a short reference audio clip, optionally paste the transcript of that clip, type the text I want spoken, and get a downloadable audio file that sounds like the reference voice.

Please make it easy for a normal person to run. Handle model download, audio loading, and FFmpeg checks automatically wherever possible. If a GPU is available, use it, but don’t make the app crash if it has to fall back to the CPU and run more slowly. Include the basic text-to-speech flow, chunked generation for longer text, and a way to create a story with multiple speakers or styles if I provide separate reference voices.
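To make those requirements concrete, here is a minimal Python sketch of the startup checks and the chunking step. It assumes PyTorch is the backend and treats it as an optional import so that a missing install degrades to CPU. The helper names `ffmpeg_available`, `pick_device`, and `chunk_text`, and the 200-character default, are hypothetical choices for illustration and not part of F5 TTS.

```python
import re
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def pick_device() -> str:
    """Prefer CUDA when PyTorch reports it; otherwise fall back to CPU."""
    try:
        import torch  # assumption: PyTorch is installed alongside the TTS model
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the resulting audio segments concatenated. Splitting at sentence boundaries is one simple way to keep the joins from landing mid-phrase.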

Also include a simple command line option for the same workflow, plus clear setup and run instructions. Look up the current F5 TTS docs online if you need to.
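As a rough illustration of what that command line option might look like, the sketch below defines an argument parser that mirrors the browser workflow. The flag names (`--ref-audio`, `--ref-text`, `--text`, `--output`) are hypothetical and are not taken from the F5 TTS command line tool.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose arguments mirror the browser form fields."""
    parser = argparse.ArgumentParser(
        description="Generate speech in a reference voice (illustrative sketch)."
    )
    parser.add_argument("--ref-audio", required=True,
                        help="path to the short reference audio clip")
    parser.add_argument("--ref-text", default="",
                        help="optional transcript of the reference clip")
    parser.add_argument("--text", required=True,
                        help="text to be spoken")
    parser.add_argument("--output", default="output.wav",
                        help="path for the generated audio file")
    return parser
```

A real entry point would call `build_parser().parse_args()` and pass the parsed values to the same generation function that the browser app uses, so both front ends share one code path.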

swivid/f5-tts — reverse-engineered prompt