Python Text-to-Speech Web Application Design

Reverse-engineered prompt

Build me a Python text-to-speech app based on F5-TTS. I want a simple web page where I can upload a short reference voice clip, paste or type the matching transcript if I have it, then enter new text and generate natural speech in that same voice. If I leave the transcript blank, try to auto-detect it if the model supports that.

Please include a clean interface for basic generation, automatic splitting of longer text into chunks, and an option for multiple voices or speakers in one script. Let me preview the audio, download the result as a WAV file, and see clear progress and error messages while it runs. Also include a simple command-line version of the same workflow.
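
As an illustration of the chunking request above, here is a minimal sketch of one way to split long input at sentence boundaries before synthesis. The function name `split_into_chunks` and the `max_chars` default are hypothetical choices for this example, not part of F5-TTS; a real app would likely tune the limit to the length of the reference clip.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the resulting audio concatenated, which also gives a natural unit for reporting progress.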

Set it up so the app downloads the model files it needs, uses the GPU when available, and still gives helpful instructions if dependencies like FFmpeg or PyTorch are missing. Add a short README with install and run steps. Look up current docs online if you need to.
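
The dependency and GPU handling described above might look something like the following sketch. The function names `check_environment` and `pick_device` and the message wording are assumptions made for this example; only `shutil.which`, `importlib.util.find_spec`, and `torch.cuda.is_available` are existing library calls.

```python
import importlib.util
import shutil


def check_environment() -> list[str]:
    """Return human-readable messages for missing dependencies.

    Hypothetical helper: an empty list means nothing was found missing.
    """
    problems: list[str] = []
    if shutil.which("ffmpeg") is None:
        problems.append(
            "FFmpeg was not found on PATH. Install it with your system "
            "package manager, then restart the app."
        )
    if importlib.util.find_spec("torch") is None:
        problems.append(
            "PyTorch is not installed. See the install selector on "
            "pytorch.org for the command matching your platform."
        )
    return problems


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # Imported lazily so the check above can run without it.

    return "cuda" if torch.cuda.is_available() else "cpu"
```

Running `check_environment()` at startup and showing its messages in both the web page and the command-line tool would keep the two front ends consistent.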

SWivid/F5-TTS — reverse-engineered prompt
