PyGRB: Update postprocessing functions #4550

jakeb245 · 2023-10-31T15:50:33Z

This PR is another step in resolving #4419. It should finish converting the backend parts of the old XML functionality to HDF. The functionality is intended to stay the same, although there may be cases where it should be changed.

bin/pycbc_multi_inspiral

pycbc/results/pygrb_postprocessing_utils.py

jakeb245 · 2024-01-03T15:28:38Z

I think the issue keeping this PR open is what information about templates should be written into the output files. I'm told the main PyCBC searches only write the template ID, and then go back to the template file to get that information. I've started to break this convention in the current state of these code changes by writing the template masses to the trigger output files.

The template ID currently written by the WIP PyGRB code is effectively an index into the split template file. Using it would require us to keep track of thousands of template files, which does not seem like a great idea. I think the best approach is to write the template hash in place of the template ID, which would let us find the templates in the full template bank during postprocessing.
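The following minimal sketch illustrates the index-versus-hash issue. It assumes a template is identified by `(mass1, mass2, spin1z, spin2z)` and uses a truncated SHA-256 digest as a stand-in hash. The function name `template_hash` and this hashing scheme are hypothetical and are not necessarily how PyCBC computes its template hashes.

```python
import hashlib

import numpy as np


def template_hash(mass1, mass2, spin1z, spin2z, decimals=6):
    """Hypothetical stable hash of one template's parameters.

    Parameters are rounded to a fixed number of decimals before hashing,
    so tiny floating-point differences do not change the result.
    """
    key = ",".join(f"{p:.{decimals}f}" for p in (mass1, mass2, spin1z, spin2z))
    # Use the first 8 bytes of a SHA-256 digest as a signed 64-bit integer
    digest = hashlib.sha256(key.encode()).digest()[:8]
    return int.from_bytes(digest, "little", signed=True)


# Toy full bank, one row per template: (mass1, mass2, spin1z, spin2z)
full_bank = np.array([
    [1.4, 1.4, 0.0, 0.0],
    [10.0, 1.4, 0.3, 0.0],
    [30.0, 25.0, 0.1, -0.2],
    [5.0, 5.0, 0.0, 0.0],
])

# A split file holding the last two templates of the full bank.
# Row 0 of the split is row 2 of the full bank, so the split index
# alone cannot identify the template without the split file itself.
split = full_bank[2:]

print(template_hash(*split[0]) == template_hash(*full_bank[2]))  # True
```

Because the hash depends only on the template parameters, it is identical whether the template was read from a split file or from the full bank. The split-local index is not.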

Pinging @pannarale and @titodalcanton for discussion. I'd like to have a bit of input before trying to code this.

titodalcanton · 2024-01-08T13:44:30Z

@jakeb245 yes, I would use an approach similar to the one used by pycbc_inspiral in the all-sky all-time search, which consists of storing just the template hash for each trigger after matched filtering. To be precise, the following happens there:

1. pycbc_inspiral does matched filtering for each bank split separately, and the corresponding triggers store only the template hashes.
2. pycbc_coinc_mergetrigs then takes the triggers from all the bank splits from step 1, merges them, and uses the template hashes to recover the template IDs in the full bank.
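The merge step described above can be sketched as follows. This is a toy version, not the pycbc_coinc_mergetrigs implementation. It assumes each split's triggers are held in a dict of equal-length NumPy arrays with hypothetical keys `template_hash` and `snr`, and that the full bank's hashes are available as an array in full-bank order.

```python
import numpy as np


def merge_split_triggers(split_triggers, bank_hashes):
    """Hypothetical merge of triggers from several bank splits.

    split_triggers: list of dicts, one per split, each holding equal-length
        arrays under the same keys (here "template_hash" and "snr").
    bank_hashes: array of template hashes in full-bank order.
    """
    # Concatenate every column across the splits
    merged = {
        key: np.concatenate([trigs[key] for trigs in split_triggers])
        for key in split_triggers[0]
    }
    # Look-up table from hash to row index in the full bank
    hash_to_id = {int(h): i for i, h in enumerate(bank_hashes)}
    # A hash missing from the bank raises KeyError, flagging a bank mismatch
    merged["template_id"] = np.array(
        [hash_to_id[int(h)] for h in merged["template_hash"]], dtype=np.int64
    )
    return merged


# Usage with toy data: two splits of a four-template bank
bank_hashes = np.array([101, 202, 303, 404])
split_a = {"template_hash": np.array([101, 202, 101]), "snr": np.array([5.1, 6.0, 7.2])}
split_b = {"template_hash": np.array([404, 303]), "snr": np.array([8.3, 5.5])}
merged = merge_split_triggers([split_a, split_b], bank_hashes)
print(merged["template_id"])  # [0 1 0 3 2]
```

Only the full bank has to be kept around after merging. The split files are no longer needed to interpret `template_id`.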

Have a look at this documentation page for more info: https://pycbc.org/pycbc/latest/html/formats/hdf_format.html. You can also see an example of this in the files from one of the all-sky all-time O4 chunks. In particular, look at one of the full_data/*-INSPIRAL_BANK*.hdf files for step 1, and at full_data/*-HDF_TRIGGER_MERGE_FULL_DATA*.hdf for step 2.

Ping me on Slack if more is needed.

Co-authored-by: Francesco Pannarale <[email protected]>

bin/pygrb/pycbc_grb_inj_finder

bin/pygrb/pycbc_grb_trig_combiner

bin/pygrb/pycbc_make_offline_grb_workflow

pycbc/results/pygrb_postprocessing_utils.py

titodalcanton · 2024-02-10T20:52:29Z

@jakeb245 I had a close look at this PR. As you know I am a bit removed from the development of this code, so I left a bunch of questions above to fill some gaps in my understanding, but I do not see anything too concerning here. I am happy to approve this once you have answered the comments.

Co-authored-by: Tito Dal Canton <[email protected]>

pycbc/results/pygrb_postprocessing_utils.py

titodalcanton · 2024-02-14T15:38:37Z

There are a couple of things to fix (noted above); after that, I think this can go in.

Co-authored-by: Tito Dal Canton <[email protected]>

titodalcanton · 2024-02-15T10:10:59Z

I am still not super convinced that I understand the logic of the slide_id/event_id thing here, so whoever is developing that functionality should keep it in mind. But let's merge this and continue developing.

I think this has been addressed

* Add mapping function
* Implement mapping in trig combiner
* Fix "template_id" dataset being written too early
* Force template_id to be integer
* Try adding bank files to workflow
* Remove fixme
* Only use one bank_file
* Add template mapping to inj finder
* Small change
* mapping function works with trig file object
* Small change
* Remove typo
* Add bank file opt to inj_finder
* Add template masses to multi_inspiral output
* sort_trigs updates
* Extract trig properties
* Add old imports back for CodeClimate
* Remove unused bestnr opts
* Update pycbc/results/pygrb_postprocessing_utils.py
  Co-authored-by: Francesco Pannarale <[email protected]>
* Codeclimate
* Remove masses from multi inspiral output
* Correct segment loading names
* Add NoneType handling to _slide_vetoes function
* Indentation fix
* Add 's' back in
* Fix docstring(?)
* Codeclimate
* Codeclimate
* Update pycbc/results/pygrb_postprocessing_utils.py
  Co-authored-by: Tito Dal Canton <[email protected]>
* Uses event_ids in sort_trigs to avoid confusion
* Add possibility of multiple banks (and NotImplementedError)
* Remove enumerate and fix indexing issue
* Check for single bank earlier
* Simplify column name check
* Use zip()
* Update pycbc/results/pygrb_postprocessing_utils.py
  Co-authored-by: Tito Dal Canton <[email protected]>
* Update pycbc/results/pygrb_postprocessing_utils.py
  Co-authored-by: Tito Dal Canton <[email protected]>

---------

Co-authored-by: Francesco Pannarale <[email protected]>
Co-authored-by: Tito Dal Canton <[email protected]>
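The commit list mentions forcing `template_id` to be an integer and raising a `NotImplementedError` when more than one bank is supplied. The sketch below shows one possible vectorized variant of a hash-to-ID mapping with that single-bank check. It uses `numpy.argsort` and `numpy.searchsorted` in place of a Python dictionary. The name `map_hashes_to_ids` and its arguments are hypothetical; this is not the function added in this PR.

```python
import numpy as np


def map_hashes_to_ids(trig_hashes, bank_hash_arrays):
    """Hypothetical vectorized mapping from template hash to template_id.

    bank_hash_arrays is a list with one array of hashes per bank file.
    Only a single bank is supported; more than one raises
    NotImplementedError, and unknown hashes raise ValueError.
    """
    if len(bank_hash_arrays) != 1:
        raise NotImplementedError("Only a single template bank is supported")
    bank_hashes = np.asarray(bank_hash_arrays[0])
    trig_hashes = np.asarray(trig_hashes)
    # Sort the bank hashes once, then binary-search every trigger hash
    order = np.argsort(bank_hashes)
    sorted_hashes = bank_hashes[order]
    pos = np.searchsorted(sorted_hashes, trig_hashes)
    # Clip so hashes beyond the largest bank hash can be checked safely
    pos = np.clip(pos, 0, len(sorted_hashes) - 1)
    if not np.array_equal(sorted_hashes[pos], trig_hashes):
        raise ValueError("Some trigger hashes are not in the bank")
    # Map sorted positions back to original bank rows; force integer IDs
    return order[pos].astype(np.int64)


ids = map_hashes_to_ids(np.array([303, 101, 404]), [np.array([404, 101, 303, 202])])
print(ids)  # [2 1 0]
```

Sorting the bank once and binary-searching avoids a Python-level loop over every trigger. This may matter when a trigger file holds many more triggers than the bank holds templates.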

jakeb245 self-assigned this Oct 31, 2023

jakeb245 added the PyGRB PyGRB development label Oct 31, 2023

jakeb245 requested a review from pannarale October 31, 2023 15:51

jakeb245 mentioned this pull request Nov 3, 2023

Update PyGRB efficiency script for HDF files #4562

Merged

pannarale previously requested changes Nov 6, 2023


bin/pycbc_multi_inspiral (outdated, resolved)

pycbc/results/pygrb_postprocessing_utils.py (outdated, resolved)

jakeb245 requested a review from pannarale December 20, 2023 15:52

jakeb245 force-pushed the pp_utils branch from d7b3cb3 to 089fd4a January 16, 2024 18:43

jakeb245 and others added 21 commits January 18, 2024 12:49

596920a Add mapping function
bf8e971 Implement mapping in trig combiner
02b04e0 Fix "template_id" dataset being written too early
c736dee Force template_id to be integer
7c9bbd2 Try adding bank files to workflow
e707e28 Remove fixme
b75940e Only use one bank_file
a4ac47c Add template mapping to inj finder
8b57915 Small change
a2011a5 mapping function works with trig file object
50916d8 Small change
9369491 Remove typo
93edd5f Add bank file opt to inj_finder
1f0f7fa Add template masses to multi_inspiral output
93ac747 sort_trigs updates
67394c0 Extract trig properties
3c40adf Add old imports back for CodeClimate
e24c0e6 Remove unused bestnr opts
9f1ba6c Update pycbc/results/pygrb_postprocessing_utils.py (Co-authored-by: Francesco Pannarale <[email protected]>)
fc9b061 Codeclimate
6f48e9f Remove masses from multi inspiral output
d4f297f Merge remote-tracking branch 'gwastro/master' into pp_utils