-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyGRB: Update postprocessing functions #4550
Conversation
I think the issue keeping this PR open is what information about templates should be written into the output files. I'm told the main PyCBC searches only write the template ID, and then go back to the template file to get that information. I've started to break this convention in the current state of these code changes by writing the template masses to the trigger output files. The template ID that is currently written in the WIP PyGRB code is effectively an index of the split template file. Using this would require us to keep track of the thousands of template files which does not seems like a great idea. I'm thinking the best way to go about this is to write the template hash in place of the template ID, which would allow us to find the templates in the full template bank in postprocessing. Pinging @pannarale and @titodalcanton for discussion. I'd like to have a bit of input before trying to code this. |
@jakeb245 yes, I would use a similar approach used by
Have a look at this documentation page for more info: https://pycbc.org/pycbc/latest/html/formats/hdf_format.html. You can also see an example of this by looking at the various files from one of the all-sky all-time O4 chunks, in particular you want to look at one of the Ping me on Slack if more is needed. |
Co-authored-by: Francesco Pannarale <[email protected]>
@jakeb245 I had a close look at this PR. As you know I am a bit removed from the development of this code, so I left a bunch of questions above to fill some gaps in my understanding, but I do not see anything too concerning here. I am happy to approve this once you have answered the comments. |
Co-authored-by: Tito Dal Canton <[email protected]>
I think there are a couple things to fix (noted above), then I think this can go in. |
Co-authored-by: Tito Dal Canton <[email protected]>
Co-authored-by: Tito Dal Canton <[email protected]>
I am still not super convinced that I understand the logic of the slide_id/event_id thing here, so whoever is developing that functionality should keep it in mind. But let's merge this and continue developing. |
I think this has been addressed
* Add mapping function * Implement mapping in trig combiner * Fix "template_id" dataset being written too early * Force template_id to be integer * Try adding bank files to workflow * Remove fixme * Only use one bank_file * Add template mapping to inj finder * Small change * mapping function works with trig file object * Small change * Remove typo * Add bank file opt to inj_finder * Add template masses to multi_inspiral output * sort_trigs updates * Extract trig properties * Add old imports back for CodeClimate * Remove unused bestnr opts * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Francesco Pannarale <[email protected]> * Codeclimate * Remove masses from multi inspiral output * Correct segment loading names * Add NoneType handling to _slide_vetoes function * Indentation fix * Add 's' back in * Fix docstring(?) * Codeclimate * Codeclimate * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Uses event_ids in sort_trigs to avoid confusion * Add possibility of multiple banks (and NotImplementedError) * Remove enumerate and fix indexing issue * Check for single bank earlier * Simplify column name check * Use zip() * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> --------- Co-authored-by: Francesco Pannarale <[email protected]> Co-authored-by: Tito Dal Canton <[email protected]>
* Add mapping function * Implement mapping in trig combiner * Fix "template_id" dataset being written too early * Force template_id to be integer * Try adding bank files to workflow * Remove fixme * Only use one bank_file * Add template mapping to inj finder * Small change * mapping function works with trig file object * Small change * Remove typo * Add bank file opt to inj_finder * Add template masses to multi_inspiral output * sort_trigs updates * Extract trig properties * Add old imports back for CodeClimate * Remove unused bestnr opts * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Francesco Pannarale <[email protected]> * Codeclimate * Remove masses from multi inspiral output * Correct segment loading names * Add NoneType handling to _slide_vetoes function * Indentation fix * Add 's' back in * Fix docstring(?) * Codeclimate * Codeclimate * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Uses event_ids in sort_trigs to avoid confusion * Add possibility of multiple banks (and NotImplementedError) * Remove enumerate and fix indexing issue * Check for single bank earlier * Simplify column name check * Use zip() * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> --------- Co-authored-by: Francesco Pannarale <[email protected]> Co-authored-by: Tito Dal Canton <[email protected]>
* Add mapping function * Implement mapping in trig combiner * Fix "template_id" dataset being written too early * Force template_id to be integer * Try adding bank files to workflow * Remove fixme * Only use one bank_file * Add template mapping to inj finder * Small change * mapping function works with trig file object * Small change * Remove typo * Add bank file opt to inj_finder * Add template masses to multi_inspiral output * sort_trigs updates * Extract trig properties * Add old imports back for CodeClimate * Remove unused bestnr opts * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Francesco Pannarale <[email protected]> * Codeclimate * Remove masses from multi inspiral output * Correct segment loading names * Add NoneType handling to _slide_vetoes function * Indentation fix * Add 's' back in * Fix docstring(?) * Codeclimate * Codeclimate * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Uses event_ids in sort_trigs to avoid confusion * Add possibility of multiple banks (and NotImplementedError) * Remove enumerate and fix indexing issue * Check for single bank earlier * Simplify column name check * Use zip() * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> * Update pycbc/results/pygrb_postprocessing_utils.py Co-authored-by: Tito Dal Canton <[email protected]> --------- Co-authored-by: Francesco Pannarale <[email protected]> Co-authored-by: Tito Dal Canton <[email protected]>
This PR is another step in resolving #4419. It should finish off converting the backend parts of the old XML functionality into HDF. The functionality is intended to be the same, but there could be some cases in here where it should be changed.