Windows Desktop App for Local Text-to-Speech Studio

Reverse-engineered prompt

Build me a Windows desktop app for local text-to-speech using the Kokoro 82M voice model. I want it to feel like a simple studio, not a command-line tool.

It should run offline after the first model download, using an NVIDIA GPU when one is available and falling back to CPU when not. Include a clean PySide-style interface with two main modes: a quick scratchpad where I can type text, pick a voice, and synthesize audio; and an audiobook mode where I can load TXT or EPUB files, split them into manageable lines, preview individual segments, assign a different voice to each line, and render everything into one audio file.
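
To make the GPU fallback and the line splitting concrete, here is a rough Python sketch of what I have in mind. The names `pick_device` and `split_into_lines` and the 300-character default are my own placeholders, and the sketch assumes PyTorch is the backend; treat it as an illustration, not a required design.

```python
import re


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, else "cpu"."""
    try:
        import torch  # assumption: PyTorch backend; imported lazily so the sketch runs without it
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def split_into_lines(text: str, max_chars: int = 300) -> list[str]:
    """Group sentences into segments of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

For example, `split_into_lines("One. Two. Three.", max_chars=9)` groups the first two sentences and leaves the third on its own line. Each resulting line could then carry its own voice assignment in audiobook mode.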

Please include voice mixing so I can blend two voices, plus controls for speed, pitch, sample rate, and seed. I also want drag-and-drop loading, a built-in audio player with waveform preview, project saving and loading with a custom project file, and a simple Windows run script that sets up the environment and installs the right GPU support if possible.
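
As a rough illustration of voice mixing and the project file, here is a minimal Python sketch. It assumes each voice can be treated as a plain list of numbers that is blended by weighted average, and that the project file is JSON behind a custom extension. The function names, the `.kproj` extension, and the field names are all hypothetical placeholders of mine, not anything the model or an existing app defines.

```python
import json
from pathlib import Path


def blend_voices(voice_a, voice_b, weight_b=0.5):
    """Linearly blend two equal-length voice vectors.

    weight_b=0.0 returns voice_a unchanged, weight_b=1.0 returns voice_b.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must be between 0 and 1")
    return [(1.0 - weight_b) * a + weight_b * b for a, b in zip(voice_a, voice_b)]


def save_project(path, project):
    """Write the project dictionary to disk as UTF-8 JSON."""
    Path(path).write_text(json.dumps(project, indent=2), encoding="utf-8")


def load_project(path):
    """Read a project dictionary previously written by save_project."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical project layout: global settings plus one entry per line.
example_project = {
    "settings": {"speed": 1.0, "pitch": 0.0, "sample_rate": 24000, "seed": 42},
    "lines": [
        {"text": "Chapter one.", "voice": "voice_a"},
        {"text": "It was a quiet morning.", "voice": "voice_b"},
    ],
}
```

Calling `save_project("book.kproj", example_project)` and then `load_project("book.kproj")` should return the same dictionary, which is the behavior I want from project saving and loading.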

AcTePuKc/Kokoro-Local-Gui — reverse-engineered prompt
