Improved detect silence #745
base: master
detect_silence finds separate slices of silence and, as a final step, combines subsequent silent slices into ranges of continuous silence. The added tests specifically check that this combination step behaves correctly.
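A minimal sketch of such a combination step, assuming the silent slices arrive as sorted start offsets (in ms) of windows of length min_silence_len. merge_silent_slices is a hypothetical helper, not pydub's code. It merges a window into the current range whenever the window overlaps or touches that range.

```python
def merge_silent_slices(slice_starts, min_silence_len):
    """Two-pass style: merge sorted silent-window starts into [start, end] ranges."""
    ranges = []
    for start in slice_starts:
        end = start + min_silence_len
        if ranges and start <= ranges[-1][1]:
            # Window overlaps or touches the current range: extend it.
            ranges[-1][1] = end
        else:
            # Gap since the last range: start a new one.
            ranges.append([start, end])
    return ranges


print(merge_silent_slices([0, 1, 2, 10, 11], 5))  # [[0, 7], [10, 16]]
```

Edge cases worth covering in tests are windows that exactly touch (merged) versus windows separated by a gap (kept apart).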
Previously, detect_silence collected all slices of length min_silence_len in a list and then processed that list to merge subsequent slices into continuous silent ranges. This change performs the merging as soon as silence is detected for a slice, eliminating both the second pass over that list and the memory overhead of keeping it.
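A sketch of the single-pass variant under the same assumptions: instead of collecting starts first, each silent window immediately extends or opens a range. detect_silent_ranges and the is_silent callback are hypothetical stand-ins for the real RMS check; only the list of merged ranges is kept in memory.

```python
from typing import Callable, List


def detect_silent_ranges(
    is_silent: Callable[[int], bool],
    audio_len: int,
    min_silence_len: int,
    seek_step: int = 1,
) -> List[List[int]]:
    """Single pass over candidate windows, merging silent ones on the fly."""
    ranges: List[List[int]] = []
    last_start = audio_len - min_silence_len
    if last_start < 0:
        return ranges  # audio shorter than one window: nothing to detect
    for start in range(0, last_start + 1, seek_step):
        if not is_silent(start):
            continue
        end = start + min_silence_len
        if ranges and start <= ranges[-1][1]:
            ranges[-1][1] = end  # extend the range we are currently in
        else:
            ranges.append([start, end])  # open a new silent range
    return ranges
```

As written, if last_start is not a multiple of seek_step the final partial step is skipped; a full implementation may want to check that last window explicitly.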
Use numpy to compute RMS values for silence detection, which avoids redundant computation (and benefits from numpy's highly optimized implementation) compared to the previous implementation of detect_silence. Some caveats:
- adds numpy as a new dependency
- previously, RMS values were rounded down to the nearest integer; this is no longer the case, so the borders of silence ranges may differ slightly from those of the previous implementation
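A small illustration of the rounding caveat. It assumes audioop.rms drops the fractional part of the RMS (it returns an int) and that the threshold is converted from dBFS as pydub's detect_silence does, i.e. db_to_float(silence_thresh) * max_possible_amplitude. Both RMS variants are reimplemented here rather than imported, and the samples and the -60.1 dBFS threshold are made up so that the threshold falls between the truncated and exact values.

```python
import math


def db_to_float(db):
    # Amplitude ratio from decibels, 10 ** (db / 20), as in pydub.utils.db_to_float.
    return 10 ** (db / 20)


def rms_exact(samples):
    # Floating-point RMS, as a numpy-based implementation would compute it.
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_truncated(samples):
    # Integer RMS with the fractional part dropped, mimicking audioop.rms.
    return int(rms_exact(samples))


samples = [32, -33, 32, -33]                  # made-up quiet 16-bit samples
max_amplitude = 2 ** 16 / 2                   # 32768 for sample_width == 2
thresh = db_to_float(-60.1) * max_amplitude   # about 32.39

print(rms_truncated(samples) <= thresh)  # True: 32 counts as silent
print(rms_exact(samples) <= thresh)      # False: about 32.50 does not
```

The same window flips from silent to non-silent, which is why silent ranges can shrink by a step or two at their borders.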
Commits updated from 29837f7 to 61a9459.
Diff context for the review comment below (old | new):
    from .utils import db_to_float
    old: def detect_silence(audio_segment, min_silence_len=1000, silence_thresh=-16, seek_step=1):
    new: def _convert_to_numpy(audio_segment):
How about adding a property to AudioSegment?
Something like:
@property
def as_numpy(self):
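A possible body for the suggested property, written as a free function so it runs without patching pydub. It assumes pydub's AudioSegment.get_array_of_samples() (interleaved samples as an array.array) and the channels attribute; reshaping multi-channel audio to (frames, channels) is a design choice, not something the comment specifies.

```python
import numpy as np


def as_numpy(segment):
    """Return a segment's samples as a NumPy array (sketch of the proposed property).

    Inside AudioSegment this would be the body of an `as_numpy` property,
    with `segment` replaced by `self`.
    """
    # array.array exposes the buffer protocol, so the sample dtype is preserved.
    samples = np.array(segment.get_array_of_samples())
    if segment.channels > 1:
        # Interleaved L, R, L, R, ... -> one row per frame, one column per channel.
        samples = samples.reshape(-1, segment.channels)
    return samples
```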
Hey, sorry, I only just saw your responses. Could you perhaps provide a link to the video in question so that I can have a look?
I believe I was processing the audio from this video:
BTW, I ended up using ffmpeg. Super fast and accurate.
Overview
Reimplementation of `detect_silence`: previously, this function computed the RMS independently for each slice of `min_silence_len` in the given audio segment, which leads to a lot of recomputation of similar values if `seek_step` is small. The new implementation avoids this, resulting in much shorter detection times.
Caveats
This introduces numpy as a new dependency, for two reasons: performance and ease of implementation. Implementing this without numpy would be possible, but it would likely not achieve the same performance increase and would be harder to write.
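A minimal sketch of why numpy makes the sliding-window RMS both fast and short to write: a cumulative sum of squared samples turns each window's sum of squares into a single subtraction, so overlapping windows share work. This prefix-sum approach is an assumption for illustration and may differ from the PR's actual code; windowed_rms is a hypothetical name, and window and step are in samples rather than milliseconds.

```python
import numpy as np


def windowed_rms(samples, window, step=1):
    """RMS of every `window`-sample slice starting every `step` samples."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    x = np.asarray(samples, dtype=np.float64)  # float64 avoids int16 overflow when squaring
    if len(x) < window:
        return np.empty(0)
    # csum[i] = sum of x[:i] ** 2; the leading 0 makes window sums simple differences.
    csum = np.concatenate(([0.0], np.cumsum(x * x)))
    starts = np.arange(0, len(x) - window + 1, step)
    sums = csum[starts + window] - csum[starts]
    # Clamp tiny negatives from floating-point cancellation before the square root.
    return np.sqrt(np.maximum(sums, 0.0) / window)
```

With window set to roughly min_silence_len * frame_rate // 1000 and step to roughly seek_step * frame_rate // 1000, each returned value corresponds to one candidate slice and can be compared directly against the amplitude threshold. Note that no rounding is applied, matching the second caveat.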
`detect_silence` previously used audioop to compute RMS values of slices, which rounds the computed value down to the nearest integer, while the silence threshold is not rounded. The new implementation no longer rounds, so some slices that were previously detected as silent no longer are. In practice, this means that detected silent regions may be slightly shorter than before (usually by one or two `seek_step`s).
Performance results
`%timeit` results on audio segments consisting mostly of silence:
20-minute segment
~114-minute segment
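As a back-of-the-envelope check on why the speedup is large when `seek_step` is small, one can count sample squarings instead of timing anything. The sketch assumes mono 44.1 kHz audio, the default `min_silence_len=1000` and `seek_step=1`, and a 20-minute segment; squared_sample_counts is a hypothetical helper, and the counts are an estimate, not a measurement.

```python
def squared_sample_counts(duration_ms, min_silence_len, seek_step, frame_rate):
    """Estimate sample squarings for per-slice RMS vs. a prefix-sum approach."""
    window_samples = min_silence_len * frame_rate // 1000
    total_samples = duration_ms * frame_rate // 1000
    n_windows = (duration_ms - min_silence_len) // seek_step + 1
    naive = n_windows * window_samples  # every slice re-squares all of its samples
    prefix = total_samples              # each sample is squared once for the prefix sum
    return naive, prefix


naive, prefix = squared_sample_counts(20 * 60 * 1000, 1000, 1, 44100)
print(f"{naive:,} vs {prefix:,} (~{naive / prefix:.0f}x)")
# 52,875,944,100 vs 52,920,000 (~999x)
```

The ratio is roughly `min_silence_len / seek_step`, so the gap narrows as `seek_step` grows.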