vivinastase/voxseg — reverse-engineered prompt
Build me a Python package and command-line tool for voice activity detection that can train a speech versus non-speech model from audio data and then produce segment predictions for new recordings.
It should work with Kaldi-style data folders, especially wav.scp, segments, and utt2spk, and it should include helper scripts that convert raw wav files plus simple annotation files into the expected format. I also want a script that converts the prediction output back into plain text label files that can be opened in Audacity.
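As a rough illustration of the formats involved, here is a minimal sketch, not the project's actual code. It relies on the standard Kaldi conventions (wav.scp lines are `<recording-id> <wav-path>`, segments lines are `<utterance-id> <recording-id> <start> <end>`, utt2spk lines are `<utterance-id> <speaker-id>`) and on Audacity's label format (tab-separated start, end, and label in seconds). The function names, the utterance-id scheme, and the use of the second utt2spk column to carry a speech or non_speech label are assumptions made for the example.

```python
from pathlib import Path


def kaldi_lines(recordings, annotations):
    """Build the lines of wav.scp, segments and utt2spk.

    recordings:  {recording_id: wav_path}
    annotations: [(recording_id, start_sec, end_sec, label), ...]
    The utterance-id scheme and putting the label in utt2spk are
    assumptions for this sketch, not confirmed project behaviour.
    """
    wav_scp = [f"{rec_id} {path}" for rec_id, path in sorted(recordings.items())]
    segments, utt2spk = [], []
    # Sorting keeps the files ordered by id, as Kaldi tools generally expect.
    for i, (rec_id, start, end, label) in enumerate(sorted(annotations)):
        utt_id = f"{rec_id}_{i:04d}"
        segments.append(f"{utt_id} {rec_id} {start:.2f} {end:.2f}")
        utt2spk.append(f"{utt_id} {label}")
    return {"wav.scp": wav_scp, "segments": segments, "utt2spk": utt2spk}


def write_kaldi_dir(out_dir, recordings, annotations):
    """Write the three files into out_dir, one entry per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, lines in kaldi_lines(recordings, annotations).items():
        (out / name).write_text("\n".join(lines) + "\n")
    return out


def segments_to_audacity(segment_lines, label="speech"):
    """Group segments lines by recording as Audacity label lines.

    Returns {recording_id: ["start<TAB>end<TAB>label", ...]}.
    """
    labels = {}
    for line in segment_lines:
        _utt_id, rec_id, start, end = line.split()
        labels.setdefault(rec_id, []).append(
            f"{float(start):.2f}\t{float(end):.2f}\t{label}"
        )
    return labels
```

Each list returned by `segments_to_audacity` can be written to a text file per recording and loaded in Audacity through its label import.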
The training command should let me choose things like frame length, sample-related feature settings, filter count, validation split, optional validation data, test data, and output directory. Save the trained model, a config JSON file with the parameters, and logs. Also support running prediction or evaluation from a saved config if that is already part of the project.
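To make the training options concrete, here is a minimal sketch of how such a command could parse its arguments and store them as a config JSON. Every flag name, default value, and the `config.json` file name is a placeholder chosen for the example, not the project's real interface.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Argument parser with hypothetical flag names and placeholder defaults."""
    p = argparse.ArgumentParser(prog="train-vad-sketch")
    p.add_argument("train_dir", help="Kaldi-style training data folder")
    p.add_argument("out_dir", help="where the model, config and logs are saved")
    p.add_argument("--frame-length", type=float, default=0.32,
                   help="frame length in seconds (placeholder default)")
    p.add_argument("--n-filters", type=int, default=32,
                   help="number of filters in the features (placeholder default)")
    p.add_argument("--validation-split", type=float, default=0.1,
                   help="fraction of training data held out for validation")
    p.add_argument("--validation-dir", default=None,
                   help="optional separate validation data folder")
    p.add_argument("--test-dir", default=None,
                   help="optional test data folder")
    return p


def config_dict(args):
    """Turn the parsed arguments into a plain, JSON-serialisable dict."""
    return dict(vars(args))


def save_config(args, out_dir):
    """Write the parameters to <out_dir>/config.json and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "config.json"
    path.write_text(json.dumps(config_dict(args), indent=2, sort_keys=True))
    return path
```

Loading the same JSON back with `json.loads` gives prediction or evaluation a single source for the parameters used in training.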
Keep it installable with pip from the repo, include a requirements file, add basic tests if practical, and make the README explain the input format and example commands clearly.