I ran into an issue with both Whisper and WhisperX that kills the readability of the subtitles whenever you set length limits: full stops appearing mid-sentence, segments splitting people's names, and random sentence cuts that feel unnatural.
To deal with this I found this spaCy Python script (credits to Glenn Langford), which handles all of the above while also enforcing length limits. It basically restores the readability of the subtitles no matter what character limit or max lines value you use. The script shortens your subtitles while keeping the natural flow by splitting them at punctuation, conjunctions, and other natural break words. It also avoids splitting in the middle of nouns, people's names, and city names.
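To make the splitting idea concrete, here is a minimal sketch in plain Python. It is not Glenn Langford's algorithm: the real script uses spaCy to analyze the text, while this version only looks at trailing punctuation and a small, hypothetical list of conjunctions. The names `BREAK_WORDS`, `is_natural_break`, and `split_caption`, and the default of 42 characters, are all assumptions made for illustration.

```python
# Simplified illustration of length-limited splitting at natural break points.
# Assumption: a hand-written conjunction list stands in for spaCy's analysis.
BREAK_WORDS = {"and", "but", "or", "so", "because", "which", "that"}


def is_natural_break(words, i):
    """True if starting a new line before words[i] looks natural."""
    prev_word = words[i - 1]
    if prev_word[-1] in ",.;:?!":
        return True  # break right after punctuation
    return words[i].lower() in BREAK_WORDS  # break right before a conjunction


def split_caption(text, max_chars=42):
    """Split text into lines of at most max_chars, preferring natural breaks."""
    words = text.split()
    lines = []
    start = 0
    while start < len(words):
        rest = " ".join(words[start:])
        if len(rest) <= max_chars:
            lines.append(rest)
            break
        best = None            # furthest natural break that still fits
        fallback = start + 1   # furthest plain word boundary that still fits
        for i in range(start + 1, len(words)):
            if len(" ".join(words[start:i])) > max_chars:
                break
            fallback = i
            if is_natural_break(words, i):
                best = i
        end = best if best is not None else fallback
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


if __name__ == "__main__":
    sentence = "I went to the market, and then I came home because it was raining."
    for line in split_caption(sentence, max_chars=30):
        print(line)
```

A single word longer than `max_chars` is kept whole on its own line rather than cut. Unlike the spaCy-based script, this sketch knows nothing about names or noun phrases, so it can still split "New York" if the limit forces a plain word-boundary break.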
But there's a problem: this script only works with Whisper. When I tried running it on WhisperX JSON output, it straight up gave me errors. I understand this is because of the structural differences between the WhisperX and Whisper JSON output. But I would really like to run this script with WhisperX, as the timestamps from the original Whisper give me headaches.
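One possible workaround is to reshape the WhisperX JSON into something closer to Whisper's word-timestamp JSON before feeding it to the script. The sketch below is untested against subwisp and rests on several assumptions that should be checked against your own files: that WhisperX segments carry a `words` list whose entries have `word`, `start`, `end`, and `score`; that some entries (numbers, for example) may have no timestamps; and that Whisper's format uses `probability` instead of `score` and stores words with a leading space. The function name `whisperx_to_whisper_like` is hypothetical.

```python
import json


def whisperx_to_whisper_like(data):
    """Reshape a WhisperX-style dict into a Whisper-style dict (assumed layouts)."""
    segments = []
    for seg_id, seg in enumerate(data.get("segments", [])):
        words = []
        last_end = seg.get("start", 0.0)
        for w in seg.get("words", []):
            # Assumption: unaligned words lack timestamps, so reuse the
            # end time of the previous word as a zero-length placeholder.
            start = w.get("start", last_end)
            end = w.get("end", start)
            words.append({
                "word": " " + w["word"].strip(),      # assumed Whisper convention
                "start": start,
                "end": end,
                "probability": w.get("score", 0.0),   # assumed key rename
            })
            last_end = end
        segments.append({
            "id": seg_id,
            "start": seg.get("start", 0.0),
            "end": seg.get("end", last_end),
            "text": seg.get("text", ""),
            "words": words,
        })
    full_text = "".join(s["text"] for s in segments).strip()
    return {"text": full_text, "segments": segments,
            "language": data.get("language", "en")}


if __name__ == "__main__":
    # Tiny hand-made sample shaped like the assumed WhisperX output.
    sample = {"segments": [{
        "start": 0.5, "end": 2.0, "text": " Hello 2024 world",
        "words": [
            {"word": "Hello", "start": 0.5, "end": 0.9, "score": 0.98},
            {"word": "2024"},  # no timestamps in this entry
            {"word": "world", "start": 1.4, "end": 2.0, "score": 0.91},
        ],
    }]}
    print(json.dumps(whisperx_to_whisper_like(sample), indent=2))
```

If the script still fails after a conversion like this, the error message should show which key it expects, and the mapping above can be adjusted accordingly.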
If you want to run this script with the original Whisper, do this:
Install Python
pip install -U pip setuptools wheel
pip install -U 'spacy[cuda11x]'
python -m spacy download en_core_web_trf
Save this Python script as subwisp.py in the same directory as your Whisper JSON output:
(https://gist.githubusercontent.com/glangford/a2b24ffd92c832c60e1b1b49da1a8b27/raw/c588b33d2598f7ef92a26edf3dc314d119a70602/subwisp.py)
Then run it:
python3 -m subwisp input.json > output.srt