AI reconstruction profiling #488

Open
whit2333 opened this issue Feb 26, 2025 · 5 comments

@whit2333
Collaborator

I ran a profiler and found that
org/jlab/rec/ahdc/AI/TrackConstruction.get_all_possible_track takes 99.7% of the CPU time (sampled over 1 minute).
Is this expected?

Image

@mathieuouillon
Collaborator

This function generates all possible track candidates from the list of super-preclusters.
It should account for a large share of the track-finding time, but I didn't expect 99%.
Which parts of the engine are active when you run it? In my experience, the Kalman Filter should take far longer than the track finding (including the generation of the track candidates).
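To illustrate why generating every possible candidate can dominate the runtime, here is a minimal sketch. It is not the coatjava implementation: the class `CandidateGrowthSketch`, its methods, and the "one element per layer" model are assumptions made for illustration only. If a candidate is formed by picking one element from each of several groups, the number of candidates is the product of the group sizes, so it grows very quickly on busy events.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the coatjava code: counts and enumerates "one pick per layer" combinations. */
final class CandidateGrowthSketch {

    /** Number of candidates when one element is picked from each layer (product of the sizes). */
    static long countCandidates(int[] layerSizes) {
        long total = 1;
        for (int size : layerSizes) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, (long) size);
        }
        return total;
    }

    /** Enumerates candidates as index tuples, stopping once maxCandidates have been collected. */
    static List<int[]> enumerate(int[] layerSizes, int maxCandidates) {
        List<int[]> out = new ArrayList<>();
        build(layerSizes, 0, new int[layerSizes.length], out, maxCandidates);
        return out;
    }

    private static void build(int[] layerSizes, int layer, int[] current, List<int[]> out, int max) {
        if (out.size() >= max) {
            return; // cap reached: stop before memory use grows further
        }
        if (layer == layerSizes.length) {
            out.add(current.clone()); // one complete candidate
            return;
        }
        for (int i = 0; i < layerSizes[layer]; i++) {
            current[layer] = i;
            build(layerSizes, layer + 1, current, out, max);
        }
    }
}
```

With illustrative numbers, eight layers holding five elements each already give 5^8 = 390,625 candidates, which is why a cap or an early pruning step is often worth considering in this kind of enumeration.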

@whit2333
Collaborator Author

This is on cosmic data, so I am not sure it is the most representative sample. The scripts and results can be found here: https://code.jlab.org/hallb/alert/cj_profile

@whit2333
Collaborator Author

whit2333 commented Feb 26, 2025

@mathieuouillon
This is a new one: https://code.jlab.org/hallb/alert/c12/-/jobs/8294#L189
It seems related to PyTorch: pytorch/vision#3771
See also:
https://code.jlab.org/hallb/alert/c12/-/jobs/8387#L134

Are there PyTorch settings I should use to limit the resources it tries to use? Should they go in a YAML file or be passed to recon-util?

@whit2333
Collaborator Author

whit2333 commented Feb 27, 2025

It appears the reconstruction never finishes the first event before running out of memory. I assume this can be resolved by allocating more memory to the VM?
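Before raising any limit, it can help to see how much heap the Java process is actually allowed to use. The sketch below assumes "VM" here refers to the JVM, which the thread does not confirm; the class `HeapReportSketch` is hypothetical and only uses the standard `java.lang.Runtime` API. The heap ceiling it reports is what the standard `-Xmx` JVM option controls; how that option would be forwarded through recon-util is not covered here.

```java
/** Hypothetical helper, not part of coatjava: reports current heap use against the JVM heap limit. */
final class HeapReportSketch {

    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of used heap and the maximum heap the JVM will attempt to use. */
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long limit = rt.maxMemory();                    // ceiling, e.g. as set by -Xmx
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        return String.format("heap used: %d MiB, heap limit: %d MiB", toMiB(used), toMiB(limit));
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If used heap climbs to the limit while still inside the first event, a larger heap may only delay the failure; the growth itself would then be the thing to investigate.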

Image

Every "seed" in the loop takes a few seconds on my machine, which seems broken. I really cannot follow the logic of the function either.
It could certainly benefit from many more inline comments explaining each step/loop.

@whit2333
Collaborator Author

@mathieuouillon and @baltzell doesn't this line mean it will go on forever until the line below breaks out of the loop? It also seems like code that would be very prone to introducing memory leaks.

I am very skeptical that this code is reliable. Loops nested four deep and too many variable names containing the word "combination" are a bad look.
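One common way to make a "loop until something breaks out" pattern safer is to give it an explicit iteration cap, so a missed exit condition fails loudly instead of running until memory is exhausted. The following is a generic sketch of that idea, not a patch for the function discussed here: `BoundedLoopSketch`, `runBounded`, and the cap value are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: a loop with an explicit iteration cap instead of an open-ended while (true). */
final class BoundedLoopSketch {

    /**
     * Calls step repeatedly until it returns false (meaning "done").
     * Returns the number of calls made; throws if the cap is reached first.
     */
    static int runBounded(BooleanSupplier step, int maxIterations) {
        int iterations = 0;
        while (iterations < maxIterations) {
            iterations++;
            if (!step.getAsBoolean()) {
                return iterations; // normal exit: the step reported it is finished
            }
        }
        // The exit condition never fired: fail with a clear message rather than spin forever
        throw new IllegalStateException("no exit after " + maxIterations + " iterations");
    }
}
```

A caller would wrap one pass of the loop body in the `step` lambda and pick a cap well above any legitimate iteration count, so the exception only fires when the exit logic is actually broken.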

3 participants