weight normalisation #36

Open
bifani opened this issue Jun 3, 2016 · 3 comments
@bifani

bifani commented Jun 3, 2016

I have used hep_ml over the past weeks to reweight MC distributions and stumbled upon the following issue.
When weights are determined as the data/MC ratio of normalised distributions, the computed weights are normalised such that Sum w_i = N.
However, I noticed this is not the case for weights obtained using hep_ml.reweight.
Is this expected, or am I missing something?
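
For reference, the Sum w_i = N property of histogram-ratio weights can be checked with a small sketch. It assumes events are already assigned to bins and that every bin populated in data also contains MC events; `ratio_weights`, `mc_bins` and `data_bins` are illustrative names, not part of hep_ml.

```python
from collections import Counter


def ratio_weights(mc_bins, data_bins):
    """Per-event MC weights from the ratio of normalised histograms.

    mc_bins / data_bins hold the bin index of each MC / data event.
    """
    mc_counts = Counter(mc_bins)
    data_counts = Counter(data_bins)
    n_mc, n_data = len(mc_bins), len(data_bins)
    # weight of an MC event in bin b: (data fraction in b) / (MC fraction in b)
    return [(data_counts[b] / n_data) / (mc_counts[b] / n_mc) for b in mc_bins]


# Toy bin assignments (made-up numbers, for illustration only)
mc_bins = [0, 0, 0, 1, 1, 2]
data_bins = [0, 1, 1, 1, 2, 2, 2, 2]

weights = ratio_weights(mc_bins, data_bins)
total = sum(weights)  # equals len(mc_bins) up to floating-point rounding
```

Summing over MC events, each bin contributes N times its data fraction, and the data fractions add up to one, which is why the total comes out as N.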

@arogozhnikov
Owner

Hi Simone,
this is important, though not noted in the documentation:
the normalization constant in reweighters is not fixed.

This is because the final normalization constant may depend on third-party factors.

In many cases the normalization constant does not play a significant role (e.g. when computing efficiencies or ROC curves, or when training classifiers); however, when it does, you should compute it yourself.
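
A minimal sketch of computing the normalization yourself, assuming you already have raw weights (for example the output of `reweighter.predict_weights`) and want them to sum to a chosen total. The helper `normalize_weights` and the sample numbers are hypothetical, not part of hep_ml.

```python
def normalize_weights(weights, target_sum=None):
    """Rescale weights so they sum to target_sum (default: number of events)."""
    weights = [float(w) for w in weights]
    if target_sum is None:
        target_sum = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    scale = target_sum / total
    return [w * scale for w in weights]


# Made-up raw weights standing in for reweighter.predict_weights(mc_sample)
raw_weights = [0.2, 0.4, 0.6, 0.8]

normalized = normalize_weights(raw_weights)             # sums to N = 4
to_data_yield = normalize_weights(raw_weights, 100.0)   # sums to a chosen yield
```

The rescaling is a single global factor, so ratios between event weights are unchanged.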


Explanation: the absence of normalization in reweighters makes it possible to guarantee that reweighter.predict_weights is a deterministic mapping.

E.g. if you predict a large sample at once, or predict the weight of each event separately and concatenate the predictions, the result is the same. If you normalize, the result in the second case is obviously wrong.
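
The point can be illustrated with a toy sketch. Here `toy_predict_weights` is a hypothetical stand-in for a fitted reweighter (any fixed per-event function), not hep_ml's actual model, and `normalize` rescales a batch so its weights sum to the batch size.

```python
def toy_predict_weights(events):
    """Hypothetical per-event weight function standing in for a fitted reweighter."""
    return [1.0 + 0.5 * x for x in events]


def normalize(weights):
    """Rescale a batch of weights so that they sum to the batch size."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


events = [0.0, 1.0, 2.0, 3.0]

# Without normalization: predicting in chunks gives the same weights
whole = toy_predict_weights(events)
chunked = toy_predict_weights(events[:2]) + toy_predict_weights(events[2:])

# With per-call normalization: the result depends on how events were batched
whole_norm = normalize(whole)
chunked_norm = (normalize(toy_predict_weights(events[:2]))
                + normalize(toy_predict_weights(events[2:])))

same_without_norm = whole == chunked          # True
same_with_norm = whole_norm == chunked_norm   # False
```

With normalization inside the prediction step, an event's weight would depend on which other events happened to be in the same call.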

@jcob95

jcob95 commented Feb 17, 2020

Hi, related to this question: I'm trying to compare a single reweighter trained and tested on the entire dataset with several reweighters, each trained on an individual bin of the data. What I'm trying to do is reconstruct the reweighted distributions over the whole data range from the binned reweighters.

Therefore, is it possible to obtain the normalization constant somehow, or can I normalize the reweighters externally?

Thanks

@arogozhnikov
Owner

@jcob95, you should renormalize externally. As I understand your case, you should first compute the expected number of samples in each bin, and then within each bin apply a normalization so that the total weight matches the expected number.
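
One way this per-bin renormalization could look, as a sketch: it assumes you have the raw weights from each binned reweighter, a bin label for each event, and the expected total weight per bin computed separately. `renormalize_per_bin` and all numbers below are hypothetical.

```python
from collections import defaultdict


def renormalize_per_bin(weights, bin_ids, expected_totals):
    """Rescale weights so that the total weight in each bin matches expected_totals."""
    sums = defaultdict(float)
    for w, b in zip(weights, bin_ids):
        sums[b] += w
    for b, s in sums.items():
        if s <= 0:
            raise ValueError(f"bin {b!r} has non-positive total weight")
    # one scale factor per bin: expected total / current total
    return [w * expected_totals[b] / sums[b] for w, b in zip(weights, bin_ids)]


# Made-up raw weights from two binned reweighters, concatenated
weights = [1.0, 3.0, 2.0, 2.0, 4.0]
bin_ids = [0, 0, 1, 1, 1]
# Expected total weight in each bin, computed beforehand (assumed values)
expected_totals = {0: 10.0, 1: 4.0}

renormalized = renormalize_per_bin(weights, bin_ids, expected_totals)
# -> [2.5, 7.5, 1.0, 1.0, 2.0]
```

Within each bin the relative weights from that bin's reweighter are preserved; only the overall scale of the bin changes.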
