Generate ShortTermFeatures for an audio signal of 0.1 seconds of a mono, 16KHz and PCM wave file #396
Comments
Hi! I’d like to help with the feature extraction issue you're facing. To do so, I need a bit more info:
This will help me understand the problem better and find a solution for you. Thanks!
Hello again, I ran a small experiment and was able to reproduce the issue you described. It appears that there isn't enough information to compute the chroma features. To address this and keep the code running (even if it means the chroma feature values are zeros), I've implemented a fix. I plan to submit a pull request for it, pending the library author's approval. For testing, I took the following approach (I recommend expressing window sizes as a fraction of a second multiplied by the sampling rate, Fs, rather than as raw sample counts, but the choice is yours. In my tests, I used an Fs of 44100):

from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import audioBasicIO


def extract_features(frac_second, samples_features, Fs, x):
    samples_frac_second = frac_second * Fs
    samples_windows = samples_features // samples_frac_second
    F, f_names = ShortTermFeatures.feature_extraction(x[:samples_features], Fs, frac_second*Fs, frac_second*Fs,
                                                      deltas=False)
    print(f"In {frac_second} there are {samples_frac_second} samples")
    print(f"In {samples_features} there are {samples_windows} windows")
    print(len(F[0]))
    print(len(f_names))
    return F, f_names


def issue_396():
    # Use a breakpoint in the code line below to debug your script.
    [Fs, x] = audioBasicIO.read_audio_file('./audio/limbo_mono.wav')
    for frac_second in [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]:
        print(f"Experiment with {frac_second} frac of second")
        F, f_names = extract_features(frac_second, 16000, Fs, x)


# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    issue_396()

Output generated:

Experiment with 0.1 frac of second
In 0.1 there are 4410.0 samples
In 16000 there are 3.0 windows
3
34
Experiment with 0.05 frac of second
In 0.05 there are 2205.0 samples
In 16000 there are 7.0 windows
7
34
Experiment with 0.025 frac of second
In 0.025 there are 1102.5 samples
In 16000 there are 14.0 windows
14
34
Experiment with 0.01 frac of second
In 0.01 there are 441.0 samples
In 16000 there are 36.0 windows
36
34
Experiment with 0.0036 frac of second
In 0.0036 there are 158.76 samples
In 16000 there are 100.0 windows
101
34
Experiment with 0.0018 frac of second
In 0.0018 there are 79.38 samples
In 16000 there are 201.0 windows
202
34

Fix: In the chroma_features function in ShortTermFeatures.py, adapt the following part like this:

    else:
        I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
        C = np.zeros((num_chroma.shape[0],))
        if I > 1:
            # If I <= 1 there are no chroma features that can be extracted
            C[num_chroma[0:I - 1]] = spec[num_chroma[0:I - 1]]
        C /= num_freqs_per_chroma
    final_matrix = np.zeros((12, 1))

I'm submitting a pull request (https://github.com/Caparrini/pyAudioAnalysis), although I'm not sure it matches the expected behavior. I've uploaded the fix there for your convenience, should you prefer that over modifying your local copy of the library directly. Please choose whichever option suits you best.

Best regards,
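The frame counts printed in the output above (3, 7, 14, 36, 101, 202) can be reproduced with plain arithmetic. The sketch below is not library code: `num_frames` is a hypothetical helper, and it assumes that feature_extraction truncates the window and step to whole samples and only emits a frame while the full window still fits inside the signal. Under that assumption, the truncation would also explain why the last two experiments report 101 and 202 frames while the float-based estimate printed 100.0 and 201.0 windows.

```python
def num_frames(num_samples, window, step):
    """Count the frames produced by a sliding window (hypothetical helper).

    Assumption: window and step are truncated to whole samples, and a frame
    is produced only while the complete window fits inside the signal.
    """
    window, step = int(window), int(step)
    if window <= 0 or step <= 0 or num_samples < window:
        return 0
    return (num_samples - window) // step + 1


if __name__ == "__main__":
    fs = 44100           # sampling rate used in the experiment above
    num_samples = 16000  # length of the slice passed to feature_extraction
    for frac_second in [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]:
        size = frac_second * fs  # window == step, i.e. no overlap
        print(frac_second, int(size), num_frames(num_samples, size, size))
```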
Hello,
First of all, thank you for providing a great library.
I would like to process a 0.1-second signal (1600 samples at 16 kHz) to extract short-term features.
It throws an error. When I try slightly larger values,
there are no errors, but the features contain <= 8 points, which I believe is quite low for uniqueness. Using python_speech_features, I am able to generate 20 points, but I think the resulting MFCC is not unique in terms of noise.
How do you generate features with more points (20-40), and a good amount of noise for a short signal?
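One common way to get more feature vectors out of a short signal is to let the windows overlap, that is, to use a step smaller than the window. The sketch below only works out the arithmetic. `step_for_target_frames` and `num_frames` are hypothetical helpers, not part of pyAudioAnalysis, and the frame-counting rule (whole-sample window and step, frames only while the window fits) is an assumption about how the library slides its window. Whether every feature can be computed at a given window size is a separate question.

```python
def num_frames(num_samples, window, step):
    """Frames produced by a sliding window that must fit entirely (assumed rule)."""
    if window <= 0 or step <= 0 or num_samples < window:
        return 0
    return (num_samples - window) // step + 1


def step_for_target_frames(num_samples, window, target_frames):
    """Largest whole-sample step that yields at least target_frames frames."""
    if target_frames < 2:
        raise ValueError("target_frames must be at least 2")
    if num_samples <= window:
        raise ValueError("signal must be longer than the window")
    step = (num_samples - window) // (target_frames - 1)
    if step < 1:
        raise ValueError("signal too short for that many frames")
    return step


if __name__ == "__main__":
    fs = 16000                    # sampling rate from the question
    num_samples = int(0.1 * fs)   # 0.1 s of audio -> 1600 samples
    window = int(0.025 * fs)      # illustrative 25 ms window -> 400 samples
    for target in (20, 30, 40):
        step = step_for_target_frames(num_samples, window, target)
        print(target, step, num_frames(num_samples, window, step))
```

For a 1600-sample signal and a 400-sample window, a target of 30 frames gives a step of 41 samples. The window and step would then be passed as the third and fourth arguments of feature_extraction, as in the experiment earlier in this thread.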