Refactor hash_obj to handle nested arbitrary objects #329

ShiveshM · 2017-04-06T05:33:08Z

Similar to how it's done in the normQuant function

backstory:
When a Pipeline object is used as a Param value in a ParamSet, the ParamSet method values_hash fails to obtain a hash value. Inside values_hash, hash_obj is applied to a tuple of the ParamSet values, one of which is the Pipeline object. Currently hash_obj handles tuples (and all other Sequences) by converting them to a string using pickle. The Pipeline object is not picklable, so the call fails.
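A minimal sketch of the failure and of one possible recursive approach. This is not PISA's actual `hash_obj`: `FakePipeline`, `to_hashable`, and `hash_nested` are hypothetical names made up for illustration, and the sketch assumes that objects which cannot be pickled expose a precomputed `hash` attribute.

```python
import hashlib
import pickle
import threading
from collections.abc import Mapping, Sequence


class FakePipeline:
    """Hypothetical stand-in for a Pipeline: unpicklable, but exposes `hash`."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # a lock makes the instance unpicklable

    @property
    def hash(self):
        # Assumption: the object can summarize its own state as a hash value.
        return hashlib.md5(self.name.encode()).hexdigest()


def to_hashable(obj):
    """Recursively replace objects that carry their own `hash` by that value."""
    if isinstance(obj, (str, bytes)):
        return obj
    own_hash = getattr(obj, "hash", None)
    if own_hash is not None and not callable(own_hash):
        return ("__hash__", own_hash)
    if isinstance(obj, Mapping):
        # Sort by the repr of the key so that insertion order does not matter.
        items = sorted(obj.items(), key=lambda kv: repr(kv[0]))
        return tuple((repr(k), to_hashable(v)) for k, v in items)
    if isinstance(obj, Sequence):
        # Note: lists and tuples with equal contents end up with the same hash.
        return tuple(to_hashable(x) for x in obj)
    return obj


def hash_nested(obj):
    """Pickle the substituted structure and hash the resulting bytes."""
    return hashlib.md5(pickle.dumps(to_hashable(obj), protocol=2)).hexdigest()


values = (1.0, "some_param", FakePipeline("gen_lvl"))

try:
    pickle.dumps(values)  # what a pickle-only approach would attempt
except (TypeError, pickle.PicklingError) as err:
    print("pickling the raw tuple fails:", err)

print("hash after substitution:", hash_nested(values))
```

The substitution happens before pickling, so the unpicklable object is never handed to pickle, regardless of how deeply it is nested.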


…329 (#339) * Add hash property to Pipeline and implement a temporary fix to issue #329 * Respond to PR comments

…eio mkdir function (#342) * Add hash property to Pipeline and implement a temporary fix to issue #329 * Make compare script works with MapSets and return the path in the fileio mkdir function * Repond to PR comments, add ability to input Map, Pipeline and DistributionMaker objects as input to the compare function

* get greco sample working with included flux-reweighted weights * Add hashing to transformation of data * add caching in unfolding stage for the creation of the initial histograms * misc bugs * reconfigure caching slightly * revert cfx example script settings back to use the leesard sample * cleanup roounfold * roountils convenience script * progress on adding eff corrections to unfolding stage * allow loading in gen_lvl sample from sample.py and more progress of eff corr in unfolding stage * make separate function for use of real data in roounfold.py and finish up eff implementation * Add option in sample cfg to load only specific keys * Fix bug in caching where it took 2 attempts at each stage for caching to kick in * misc * make unfolding with efficiency work for the greco sample * Bug: Fix de-sync of the sample pipeline with the gen_lvl pipeline The sample pipeline uses the output Data object of the generator level pipeline, to work out efficiencies. The gen lvl pipeline is fed into the sample pipeline. Previously this was done through a config file. In scripts, when the params of the sample pipeline are dynamically changed, the gen lvl needs to be kept in sync. Now the gen lvl Pipeline object can also be fed into the sample config. Issue with hashing came up. The Pipeline object cannot be pickled, so I did some try/except acrobatics in hash.py as a fix. Probably a more solid fix is to check if the object (or any object inside the Sequence) already contains a hashed value in a more general way, then replace the unpickleable object with this hash value. 
* add keep_keys for greco sample * add true_e_scale osc param to weight.py and cfx pipeline * Fix invalid values bugs and others in weight.py * add is_dir and is_valid_file in pisa fileio utils * Make compare functionality callable from another script * go over which params should be free for CFX analysis * Implement memcaching so that it doesn't have to deepcopy every stage * suffix comes after, not before u dope * add noise * bug in scaled flux systematics * misc * misc * Implement alias feature in sample.py * output rates in debug * fix units for greco sample * units of gen level sample * Add ability to return the efficiency maps in roounfold.py * change name of binning from unsmeared to unfolded * begin discrete systematics stage * Implement dicrete systematics stage * Make sure all test work * Add hash property to Pipeline and implement a temporary fix to issue #329 * Respond to PR comments * Flavint uses properties now, fix in sample.py * Move output_events param to instantiate at __init__ * update muon example cfg * remove extraneous object file * remove remnant from old commit * remove remnant from old commit * Turn on file caching in fit.py and fix bug in roounfold.py * update sample.py to use the latest flavint class * fix import bug in roounfold * Add README for mc settings and respond to PR comments * Add Madison locations to mc config files * make set, get methods case insensitive in the Data object and raise error if applyCut is used due to bugs

* get greco sample working with included flux-reweighted weights * Add hashing to transformation of data * add caching in unfolding stage for the creation of the initial histograms * misc bugs * reconfigure caching slightly * revert cfx example script settings back to use the leesard sample * cleanup roounfold * roountils convenience script * progress on adding eff corrections to unfolding stage * allow loading in gen_lvl sample from sample.py and more progress of eff corr in unfolding stage * make separate function for use of real data in roounfold.py and finish up eff implementation * Add option in sample cfg to load only specific keys * Fix bug in caching where it took 2 attempts at each stage for caching to kick in * misc * make unfolding with efficiency work for the greco sample * Bug: Fix de-sync of the sample pipeline with the gen_lvl pipeline The sample pipeline uses the output Data object of the generator level pipeline, to work out efficiencies. The gen lvl pipeline is fed into the sample pipeline. Previously this was done through a config file. In scripts, when the params of the sample pipeline are dynamically changed, the gen lvl needs to be kept in sync. Now the gen lvl Pipeline object can also be fed into the sample config. Issue with hashing came up. The Pipeline object cannot be pickled, so I did some try/except acrobatics in hash.py as a fix. Probably a more solid fix is to check if the object (or any object inside the Sequence) already contains a hashed value in a more general way, then replace the unpickleable object with this hash value. 
* add keep_keys for greco sample * add true_e_scale osc param to weight.py and cfx pipeline * Fix invalid values bugs and others in weight.py * add is_dir and is_valid_file in pisa fileio utils * Make compare functionality callable from another script * go over which params should be free for CFX analysis * Implement memcaching so that it doesn't have to deepcopy every stage * suffix comes after, not before u dope * add noise * bug in scaled flux systematics * misc * misc * Implement alias feature in sample.py * output rates in debug * fix units for greco sample * units of gen level sample * Add ability to return the efficiency maps in roounfold.py * change name of binning from unsmeared to unfolded * begin discrete systematics stage * Implement dicrete systematics stage * Make sure all test work * Add hash property to Pipeline and implement a temporary fix to issue #329 * Respond to PR comments * Flavint uses properties now, fix in sample.py * Move output_events param to instantiate at __init__ * update muon example cfg * remove extraneous object file * remove remnant from old commit * remove remnant from old commit * Turn on file caching in fit.py and fix bug in roounfold.py * update sample.py to use the latest flavint class * fix import bug in roounfold * Add README for mc settings and respond to PR comments * Add Madison locations to mc config files * Modify separator in CFX stages to reflect recent PISA updates * Implement MCEq into PISA * add some docs and pep8ify mceq * bug with units in mceq * Remove incorrect doc in mceq.py * add doc links to default param options in mceq


LeanderFischer · 2024-05-27T13:21:14Z

This still seems to be the case, but I'm not sure how serious the problem is 🤔 If someone feels up for it, they could try to implement this, apparently along the lines of the normQuant implementation.

ShiveshM changed the title ~~Refactor hash_obj to handle nested arbitrary objects~~ Refactor hash_obj to handle nested arbitrary objects Apr 6, 2017

jllanfranchi added the bug and inconsistency labels Apr 6, 2017

jllanfranchi pushed a commit that referenced this issue Apr 13, 2017

Add hash property to Pipeline and implement a temporary fix to issue #…

fd98b16

…329 (#339) * Add hash property to Pipeline and implement a temporary fix to issue #329 * Respond to PR comments

jllanfranchi pushed a commit that referenced this issue Feb 13, 2019

Add hash property to Pipeline and implement a temporary fix to issue #…

49e16e1

…329 (#339) * Add hash property to Pipeline and implement a temporary fix to issue #329 * Respond to PR comments

LeanderFischer added the help wanted label May 27, 2024

LeanderFischer added this to the PISA 4.2 milestone May 27, 2024

thehrh removed this from the PISA 4.2 milestone Feb 20, 2025
