Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added rudimentary feature extraction #74

Closed
wants to merge 10 commits into from
Closed

Conversation

MJWeldy
Copy link

@MJWeldy MJWeldy commented Aug 6, 2023

Hi Joe,
My name is Matt Weldy. I am a phd student working with bioacoustics in Oregon. I am currently working on a project using the previous 320 length birdnet embeddings. However, I ran into this lib today and thought I would add in a rudimentary implementation of the embedding extraction. When the analyzer class is initiated feature extraction in enabled with Analyzer(fetch_embeddings = 1). The processed embedding is accessed through recording.features_list or recording.features

Feel free to keep it or toss it as you see fit :)

@joeweiss
Copy link
Owner

joeweiss commented Aug 6, 2023

Hi Matt. This looks good, and this functionality was on the todo list #42, so thanks!

This PR will need a test case, preferably one that tests against the commandline embeddings.py script from BirdNET-Analyzer. I'm happy to take a stab at that, or you can if you would prefer.

Also, just FYI, the license for the project is going to change from GPLv3 to Apache 2 (a more permissive license). I'll likely hold off on merging this PR until after the license change, so your code contribution will be under Apache 2.

for c in recording.chunks:

if self.use_custom_classifier:
pred = self.predict_with_custom_classifier(c)[0]
elif self.fetch_embeddings == True:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MJWeldy From looking at this it appears that custom classifier functionality and the embeddings functionality are not possible at the same time. Is that correct?

I haven't used embeddings myself, so not sure if that is even relevant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll try to get to adding a test case in the next few weeks.

The licensing is fine with me. At this point I need to get my current project (which uses the older 320 length embeddings) over the line before I switch to the lib library. The way I added the feature extraction does not allow for the custom classifier and embedding extraction in the same run. But it would be fairly trivial to allow it to return both. I can think of a few niche case where someone might want both (like someone trying to visualize the embeddings for a few feature classes) while using a custom classifier head), but I think it would be pretty rare.

I did not add the embeddings to the older tf.lite model. Also, it would be nice to adapt the multiprocessing functions to allow for embeddings. My current scripts embed 1000 audio files in ~5-6 seconds.

@joeweiss
Copy link
Owner

Hey Matt. I'm going to close this for now since the PR is out of date. I still have #42 as an open issue and I'll refer to your PR when I get around to implementing it.

@joeweiss joeweiss closed this Nov 16, 2023
@MJWeldy
Copy link
Author

MJWeldy commented Apr 16, 2024

Hi Joe,
I am circling back on the embedding extraction PR request. I am currently refactoring a research project to use birdnetlib instead of custom analyzer code. But in order to complete the refactor I need to be able to access the embeddings without a custom classifier. It looks like my original PR was never integrated. Would it be possible to pull it in now?
Best,
Matt

@joeweiss
Copy link
Owner

Hey Matt. I'd love to get #42 implemented.

We've had a few updates since August, so I'll need to get this PR up-to-date. It might not need any changes, it just needs to be checked. Would you mind confirming it still works as expected (against the current main branch)?

Also, it needs a test case. I can probably do that in the next day or so.

Thanks!

@joeweiss joeweiss reopened this Apr 16, 2024
@MJWeldy
Copy link
Author

MJWeldy commented Apr 16, 2024

Hi Joe!
It looks like it might work as is. If you could push the new updates and the PR to a new branch I can pull it down and test it. Also, I'll send over a test case this week. Do you have a preference how the test embeddings are passed along (i.e., text, pickle, npz)?
Best,
Matt

@joeweiss
Copy link
Owner

Hey Matt.

I decided to add a test case myself, which led to me going ahead and fleshing out the method. Check it out here: #112

For test cases, I've been running the BirdNET-Analyzer command line script, then asserting birdnetlib's output directly against that, rather than using an expected fixture. Usually I'd use a test fixture (npz or text), but the live test helps me catch upstream changes in the main project.

Thanks for checking out the implementation and for providing the original code (and the spark for revisiting it!).

If everything looks good to you, I'll merge that PR and publish it to PyPI.

@joeweiss
Copy link
Owner

#112 merged in.

@joeweiss joeweiss closed this Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants