Self-Hosted Kokoro Text-to-Speech Service

Reverse-engineered prompt

I want a self-hosted text-to-speech service based on Kokoro that I can run locally, ideally with Docker, though it should also run directly on my machine if needed. Make it expose an OpenAI-compatible speech API so other tools can point at it easily, plus a simple web page where I can type text, pick voices, and hear or download the result.
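A minimal client-side sketch of what "OpenAI-compatible" could mean in practice, assuming the service mirrors OpenAI's `POST /v1/audio/speech` route and JSON body (`model`, `input`, `voice`, `response_format`, `speed`). The base URL `http://localhost:8880`, the model name `kokoro` and the voice ID `af_bella` are assumptions, not requirements stated above. Building the request is kept separate from sending it, so it can be inspected without a running server.

```python
import json
import urllib.request

# Assumed base URL; adjust to wherever the service actually listens.
BASE_URL = "http://localhost:8880"


def build_speech_request(text: str, voice: str = "af_bella",
                         response_format: str = "mp3",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request.

    The body fields mirror OpenAI's /v1/audio/speech schema; the model
    name "kokoro" and the default voice are assumptions for illustration.
    """
    body = {
        "model": "kokoro",  # assumed model identifier
        "input": text,
        "voice": voice,     # assumed voice ID
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    """Send the request to a running service and save the audio bytes."""
    req = build_speech_request(text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Because the body follows OpenAI's schema, tools that already speak that API should, in principle, only need their base URL changed to point at the local service.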

Please support the languages mentioned in the project, let me mix voices together, and include per-word timestamps or caption output for generated audio. It would also be great to have phoneme tools, both for turning text into phonemes and for generating speech from phonemes.
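One way the voice-mixing and caption requirements could be met, sketched under stated assumptions: a mix is written as a hypothetical `name(weight)+name(weight)` string, blending is a normalized weighted average of per-voice style vectors, and word timings arrive as hypothetical `WordStamp` records in seconds. SRT is used here as one common caption format; WebVTT would differ mainly in its `WEBVTT` header and in using a dot instead of a comma before the milliseconds.

```python
import re
from dataclasses import dataclass

# Hypothetical mix syntax: "voice_a(2)+voice_b(1)"; a missing weight means 1.
_MIX_TERM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$")


def parse_voice_mix(spec: str) -> dict[str, float]:
    """Turn a mix string into voice -> weight, normalized to sum to 1."""
    weights: dict[str, float] = {}
    for term in spec.split("+"):
        match = _MIX_TERM.match(term)
        if match is None:
            raise ValueError(f"bad voice term: {term!r}")
        name, weight = match.group(1), float(match.group(2) or 1.0)
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


def blend_voices(vectors: dict[str, list[float]],
                 weights: dict[str, float]) -> list[float]:
    """Weighted average of equally sized voice-style vectors (assumed model)."""
    sizes = {len(vectors[name]) for name in weights}
    if len(sizes) != 1:
        raise ValueError("voice vectors must share one length")
    size = sizes.pop()
    return [sum(weights[n] * vectors[n][i] for n in weights) for i in range(size)]


@dataclass
class WordStamp:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def _srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[WordStamp], max_words: int = 7) -> str:
    """Group per-word timestamps into numbered SRT cues of max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.word for w in chunk)
        cues.append(f"{len(cues) + 1}\n{_srt_time(chunk[0].start)} --> "
                    f"{_srt_time(chunk[-1].end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

For example, `parse_voice_mix("af_bella(3)+af_sky(1)")` gives weights of 0.75 and 0.25, so the blended voice leans three to one toward the first voice.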

Make it practical for different hardware: CPU-only, NVIDIA GPU, AMD GPU if possible, and Apple Silicon when run natively. Include the usual API docs page so I can test things in the browser. If you have example requests for Python or simple HTTP calls, add those too. Look up current docs online if you need to.
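A rough sketch of how the service might pick a backend at startup, assuming a heuristic that checks the host platform and looks for vendor tools (`nvidia-smi`, `rocm-smi`) on `PATH`. The function name `pick_backend` and its return values are hypothetical backend labels, not PyTorch device strings (PyTorch's ROCm builds, for instance, still address AMD GPUs as `cuda`).

```python
import platform
import shutil
from typing import Callable, Optional


def pick_backend(system: Optional[str] = None,
                 machine: Optional[str] = None,
                 which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess an inference backend label from the host platform and tools.

    Heuristic only (an assumption): a vendor CLI on PATH is taken as a sign
    that the matching GPU stack is installed.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mps"   # Apple Silicon, only seen when running natively
    if which("nvidia-smi"):
        return "cuda"  # NVIDIA driver tools present
    if which("rocm-smi"):
        return "rocm"  # AMD ROCm tools present
    return "cpu"       # safe fallback for any machine
```

Inside a Docker container the platform reports Linux, so the `mps` branch is only reachable when the service runs natively on an Apple Silicon Mac, which matches the note above that Apple Silicon support applies to native runs. The `which` parameter exists so the heuristic can be tested without the real tools installed.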

remsky/Kokoro-FastAPI — reverse-engineered prompt