The FTB layer have an ability that denoising audio? #34

Open
AnkakeYakisobaTsuyudaku opened this issue Dec 24, 2024 · 7 comments

Comments

@AnkakeYakisobaTsuyudaku

AnkakeYakisobaTsuyudaku commented Dec 24, 2024

I have a question about the encoder layer.
I think the FTB block in the encoder denoises the audio signal, based on the encoder-decoder layers and the references. Is this correct?

@AnkakeYakisobaTsuyudaku AnkakeYakisobaTsuyudaku changed the title from "The length of output data is different from input data using predict.py" to "The FTB layer have an ability that denoising audio?" Dec 24, 2024
@pokepress

If you're asking whether the algorithm can be used to remove noise from a signal, based on my experience in my fork where I've been using it to process broadcast radio audio, yes, it can do that. That said, there are a lot of different kinds of noise (in terms of sources and characteristics), so could you be more specific about which one(s) you're talking about?

@AnkakeYakisobaTsuyudaku
Author

AnkakeYakisobaTsuyudaku commented Jan 10, 2025

Thanks for your reply, and sorry for my late response. I think the degradation pattern of historical recordings is partly similar to that of AM radio sound (including the noise). I don't know how radio sound is degraded, but historical audio is noisy (in particular, it contains click noise) and has a narrow bandwidth (about 100 Hz to 3 kHz). By the way, I have one question about the FTB block in this program. In this network (the original, not your modified version), the default value of the boolean freq_attn (in aero.py) is false, so according to the code (line 33 and line 93) the FTB is probably not active. Is this correct?
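
To make the question concrete, here is a minimal sketch of how a boolean flag typically gates an optional block. It is not taken from aero.py: the class `ToyEncoderLayer`, the helper `ftb_is_active`, and the stand-in arithmetic are all hypothetical, and only the flag name `freq_attn` comes from the discussion above.

```python
# Hypothetical sketch: apart from the flag name freq_attn, nothing here
# comes from aero.py. Plain lists stand in for tensors.
class ToyEncoderLayer:
    def __init__(self, freq_attn=False):
        # The optional block is only constructed when the flag is set.
        self.ftb = (lambda x: [v * 0.5 for v in x]) if freq_attn else None

    def forward(self, x):
        y = [v + 1.0 for v in x]  # stand-in for the main layer computation
        if self.ftb is not None:  # skipped entirely when freq_attn is false
            y = self.ftb(y)
        return y


def ftb_is_active(layer):
    """Report whether the optional block was built for this layer."""
    return layer.ftb is not None
```

With this pattern, a layer built with the default flag never calls the optional block, so checking whether the attribute was constructed is enough to tell whether it contributes to the output.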

@AnkakeYakisobaTsuyudaku
Author

I read the code and the README in your fork of this repository. As I explained in my previous comment, I think the degradation in historical audio is similar to that of AM radio sound. According to the README, you succeeded at denoising, especially distortion noise in the low-frequency band, and at bandwidth extension up to 16 kHz. How did you preprocess your training data?
Thanks for reading this comment.

@pokepress

If you're asking what I did to generate the test data for radio, I bought some personal AM/FM transmitters, then used them to transmit audio to a variety of radios. I captured both the raw and radio audio to the same recording device simultaneously (this keeps them closely synchronized so you can align them to the sample later and not worry about them drifting apart). I got the actual source audio from the Free Music Archive, Project Gutenberg, and some self-produced audio, where I added a set of tones to the start of each track:
[image: waveform showing the tones added to the start of a track]
which I burned to a CD, and played the CD through a mixer. Here's a quick & dirty diagram:
[image: diagram of the recording setup]
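
The sample-level alignment mentioned above can be sketched with a brute-force cross-correlation. This is only an illustration under stated assumptions, not the tooling actually used in the fork: `best_lag`, `align`, and the `max_lag` search window are hypothetical names, and the signals are plain Python lists of samples at the same sample rate.

```python
def best_lag(reference, degraded, max_lag):
    """Return the lag (in samples) at which degraded best matches reference.

    A positive result means the degraded capture starts later than the
    reference. Brute-force search, so keep max_lag small.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def align(reference, degraded, max_lag):
    """Trim both signals so that they start at the same instant."""
    lag = best_lag(reference, degraded, max_lag)
    if lag >= 0:
        degraded = degraded[lag:]
    else:
        reference = reference[-lag:]
    n = min(len(reference), len(degraded))
    return reference[:n], degraded[:n]
```

Because both versions were captured simultaneously on one device, a single constant lag should be enough; recordings made on separate devices would also need drift correction, which this sketch does not attempt.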

@AnkakeYakisobaTsuyudaku
Author

Thank you for explaining the preprocessing. I understand the method.
But I have another question about restoring AM-like audio.
This type of audio must have had its dynamic range compressed, because high-pitched sounds (e.g. clarinet) are still audible.
I think a non-linear effect (in this case, compression) is impossible (or at least difficult) to undo completely (in this case, to restore the dynamics).
How did you train (e.g. training parameters, inserting different layers or processing steps)?
I'm going to vary the intensity of the compression at different stages of training: in the first training phase the audio is compressed weakly, and after that the compression becomes stronger. (There is no theoretical reason for this, though.)
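
The staged-compression plan above could be sketched as follows. Everything here is an assumption made for illustration: the static hard-knee compressor without attack or release, the linear ramp, and the names `compress` and `ratio_for_epoch` with their default ratios are hypothetical and do not come from the repository or the fork.

```python
def compress(samples, threshold, ratio):
    """Static hard-knee compressor applied to sample magnitude.

    Magnitudes above threshold are reduced by ratio. There is no attack
    or release smoothing, so this is far simpler than broadcast processing.
    """
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out


def ratio_for_epoch(epoch, total_epochs, start_ratio=1.5, end_ratio=6.0):
    """Linearly ramp the compression ratio from start_ratio to end_ratio."""
    if total_epochs <= 1:
        return end_ratio
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start_ratio + t * (end_ratio - start_ratio)
```

In a data-loading step, the degraded input for a given epoch would be `compress(clean, threshold, ratio_for_epoch(epoch, total_epochs))`, while the uncompressed audio stays as the target.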

@pokepress

My project wasn't really designed to tackle dynamic range compression. The goal was to undo just the fidelity reduction of broadcasting itself, rather than the extensive processing (EQ, compression, etc.) that radio stations typically apply before transmitting the audio. That said, the FM model does seem to expand the dynamic range somewhat, so I'm guessing the FM transmitters I used do apply some dynamic range changes. As for matching the volume levels, I used the second set of tones (880 Hz) in the waveform shown above to align the volumes of the radio and GT versions of the audio.
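
The tone-based level matching described above can be sketched as an RMS comparison over the 880 Hz segment. This is a rough illustration rather than the fork's actual procedure: `rms`, `tone_gain`, and `sine` are hypothetical helpers, and the caller is assumed to have already cut the same tone segment out of both recordings.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone_gain(gt_tone, radio_tone):
    """Gain to apply to the radio capture so its tone level matches the GT."""
    radio_level = rms(radio_tone)
    if radio_level == 0.0:
        raise ValueError("radio tone segment is silent")
    return rms(gt_tone) / radio_level


def sine(freq_hz, sample_rate, n, amplitude):
    """Generate n samples of a sine tone, used here only to build test input."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]
```

For example, `tone_gain(sine(880, 44100, 4410, 0.8), sine(880, 44100, 4410, 0.2))` gives a gain close to 4, which would then multiply every sample of the radio capture. The 44100 Hz sample rate is an assumption chosen because the source audio was played from a CD.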

@AnkakeYakisobaTsuyudaku
Author

Thank you for replying. I understand. Anyway, I'm going to use your checkpoint to restore AM radio sound. Thank you!
