Polished Linux Speech-to-Text Desktop Application Design

Reverse engineered prompt

Build me a polished Linux desktop app for local speech-to-text using whisper.cpp. I want it to feel premium and modern, with a dark, cyber-neon, glassy interface, smooth animations, and a simple dashboard that makes the whole workflow obvious.

The app should let me choose or build the whisper.cpp backend; pick CPU, Vulkan, CUDA, or OpenVINO acceleration when available; select a local GGML model; set the language and thread count; enable translation to English; and choose output formats such as TXT, SRT, and VTT. It should support drag and drop for audio and video files, show a batch queue with each file's duration, let me remove individual files or clear the queue, and transcribe the files one after another. If the user drops a video, extract the audio locally with ffmpeg. Also include a microphone recording screen with a live waveform and a button to transcribe the recording.

Everything should run locally for privacy, with a Rust and Tauri backend controlling the native tools and an HTML, CSS, and JavaScript frontend. Package it for Linux as .deb, .rpm, AppImage, and an Arch package where possible.
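As a rough illustration of the backend side of this request, here is a minimal Rust sketch of how a backend might turn the dashboard settings into command lines for ffmpeg (audio extraction) and the whisper.cpp CLI (transcription). It assumes both tools are on `PATH` and that the whisper.cpp binary is named `whisper-cli` (older builds call it `main`). The `TranscribeSettings` struct, `OutputFormat` enum, helper functions, and file paths are hypothetical. The ffmpeg arguments target the 16 kHz mono 16-bit WAV input that the whisper.cpp CLI has traditionally expected. The sketch only builds the commands and does not spawn them.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Output formats offered in the UI (hypothetical enum for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum OutputFormat {
    Txt,
    Srt,
    Vtt,
}

impl OutputFormat {
    /// whisper.cpp CLI flag that enables writing this output file.
    fn flag(self) -> &'static str {
        match self {
            OutputFormat::Txt => "-otxt",
            OutputFormat::Srt => "-osrt",
            OutputFormat::Vtt => "-ovtt",
        }
    }
}

/// Hypothetical settings struct mirroring the dashboard controls.
struct TranscribeSettings {
    model: PathBuf,   // local GGML model file
    language: String, // e.g. "en", or "auto" for detection
    threads: u32,
    translate: bool, // translate to English
    formats: Vec<OutputFormat>,
}

/// ffmpeg arguments that extract a 16 kHz, mono, 16-bit PCM WAV from an audio or video file.
fn ffmpeg_extract_args(input: &Path, wav_out: &Path) -> Vec<String> {
    vec![
        "-y".into(), // overwrite the output file if it exists
        "-i".into(),
        input.display().to_string(),
        "-vn".into(), // drop any video stream
        "-ar".into(),
        "16000".into(),
        "-ac".into(),
        "1".into(),
        "-c:a".into(),
        "pcm_s16le".into(),
        wav_out.display().to_string(),
    ]
}

/// whisper.cpp CLI arguments for one queued file; `out_prefix` has no extension.
fn whisper_args(s: &TranscribeSettings, wav: &Path, out_prefix: &Path) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "-m".into(),
        s.model.display().to_string(),
        "-f".into(),
        wav.display().to_string(),
        "-l".into(),
        s.language.clone(),
        "-t".into(),
        s.threads.to_string(),
        "-of".into(),
        out_prefix.display().to_string(),
    ];
    if s.translate {
        args.push("-tr".into());
    }
    for f in &s.formats {
        args.push(f.flag().into());
    }
    args
}

fn main() {
    let settings = TranscribeSettings {
        model: PathBuf::from("models/ggml-base.en.bin"), // hypothetical path
        language: "en".to_string(),
        threads: 4,
        translate: false,
        formats: vec![OutputFormat::Txt, OutputFormat::Srt],
    };
    let video = Path::new("talk.mp4"); // hypothetical dropped file
    let wav = Path::new("/tmp/talk.wav");

    // Build (but do not spawn) the two commands; a real backend would call .status() on each in turn.
    let mut ffmpeg = Command::new("ffmpeg");
    ffmpeg.args(ffmpeg_extract_args(video, wav));
    let mut whisper = Command::new("whisper-cli");
    whisper.args(whisper_args(&settings, wav, Path::new("/tmp/talk")));
    println!("{:?}", ffmpeg);
    println!("{:?}", whisper);
}
```

Keeping argument construction in plain functions separate from process spawning makes the batch queue easy to drive one file at a time and lets the command lines be checked without ffmpeg or whisper.cpp installed.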

AtomicError/whisper-desktop — reverse-engineered prompt
