
Regression should not fail for all Dependent Variables if just one fails cuts #238

Open · cameronc137 opened this issue Jul 27, 2019 · 13 comments

@cameronc137 (Collaborator)

Is your feature request related to a problem? Please describe.
When a single SAM saturates for a full run, it prevents regression on all the main detectors.

Describe the solution you'd like
Separate the error flag behavior for dependent variables (DVs) from that for independent variables (IVs).

Describe alternatives you've considered
Postpan...

Additional context
The data for slugs 13 and 14 is all saturated, so we will need to remove SAM 3 and respin again.

@wdconinc (Member)

This should be possible by turning fGoodEventNumber into a TVectorD and modifying the RMS-to-error normalization with a NormByColumn.
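A minimal sketch of that direction, assuming the good-event counts are kept per DV column in a TVectorD; the function name and matrix layout here are hypothetical, not japan's actual members, and the explicit loop stands in for the NormByColumn call mentioned above:

```cpp
#include "TMatrixD.h"
#include "TVectorD.h"
#include "TMath.h"

// Hypothetical sketch: normalize per-(IV, DV) RMS values to errors on
// the mean with a separate good-event count per DV column, instead of
// one scalar fGoodEventNumber.
void NormalizeErrorsPerColumn(TMatrixD &errors, const TVectorD &goodEvents)
{
  // errors is nIV x nDV; goodEvents(j) counts events where DV j passed its cuts.
  for (Int_t j = 0; j < errors.GetNcols(); ++j) {
    if (goodEvents(j) <= 0) continue;            // no good events: leave column untouched
    const Double_t norm = 1.0 / TMath::Sqrt(goodEvents(j));
    for (Int_t i = 0; i < errors.GetNrows(); ++i)
      errors(i, j) *= norm;                      // sigma_j -> sigma_j / sqrt(N_j)
  }
}
```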

@wdconinc (Member)

Before anyone asks: this only works to get around problematic DVs, not IVs. And as long as sam3 is in the global error flag, you will still exclude those events.

@paulmking (Collaborator) commented Jul 27, 2019 via email

@paulmking (Collaborator) commented Jul 27, 2019 via email

@cameronc137 (Collaborator, Author)

This has become relevant again: we would like postpan or japan regression to fail on hardware failures in the AT detectors, but we don't want to throw out data from the main detectors unnecessarily. Fortunately this is largely academic, since only about one event per run fails for the ATs, but when not handled correctly it gives their plots weird scales and skewed means and RMS. See https://logbooks.jlab.org/entry/3717352#comment-23246 for this discussion.

@wdconinc (Member)

I'll implement it later today in a feature branch. Then you can go crazy over it :-)

@wdconinc (Member)

This is a bit more involved than originally anticipated. Here are the questions, for an imaginary situation with two dependent variables (Y) with n_1 and n_2 good events (n_intersect and n_union defined by the obvious set operations; see the counting sketch after this list), and any number of independent variables (X) with n good events over all those channels:

  • How many events are in the XX correlation matrix?
    • n
    • n_intersect
    • n_union
  • How many events are in column 1 of the XY cross-correlation matrix (Y_1)?
    • n
    • n_intersect
    • n_union
    • n_1
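To pin down the nomenclature, a standalone sketch (plain C++, hypothetical names) of how the four counts fall out of per-event validity flags:

```cpp
#include <cstddef>
#include <vector>

// Tally the per-DV and combined good-event counts for two DVs from
// per-event pass/fail flags (e.g. derived from device error codes).
// Assumes good1 and good2 have the same length.
void CountGoodEvents(const std::vector<bool> &good1, const std::vector<bool> &good2,
                     int &n1, int &n2, int &nIntersect, int &nUnion)
{
  n1 = n2 = nIntersect = nUnion = 0;
  for (std::size_t e = 0; e < good1.size(); ++e) {
    if (good1[e]) ++n1;                        // good for DV 1
    if (good2[e]) ++n2;                        // good for DV 2
    if (good1[e] && good2[e]) ++nIntersect;    // good for both
    if (good1[e] || good2[e]) ++nUnion;        // good for at least one
  }
}
```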

@wdconinc (Member) commented Aug 12, 2019

And, since it does get calculated and stored, even if it doesn't go into the final results:

  • How many events are in the 1-2 element of the YY correlation matrix?
    • n_intersect (which requires us to keep track of this)
    • n_union (which requires us to keep track of this)
    • sqrt(n_1 * n_2) (which is easiest to implement)
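The sqrt(n_1 * n_2) option is easiest precisely because it needs no per-pair bookkeeping; as a hedged sketch (names hypothetical), the 1-2 element would then be normalized as:

```cpp
#include "TMath.h"

// Normalize the accumulated cross product of DV 1 and DV 2 using only
// the per-channel good-event counts n1 and n2; sqrt(n1*n2) serves as
// the effective event count, so no n_intersect/n_union tracking is needed.
Double_t CovarianceElement12(Double_t sumOfProducts, Double_t n1, Double_t n2)
{
  if (n1 <= 0.0 || n2 <= 0.0) return 0.0;      // no good events in one channel
  return sumOfProducts / TMath::Sqrt(n1 * n2); // effective N = sqrt(n1 * n2)
}
```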

@cameronc137 (Collaborator, Author) commented Sep 17, 2019

Did we ever have any conclusions from the discussion about how to implement device error codes into the regression analysis?

Right now our default sets of cuts are (at least supposed to be) cutting any local device glitches globally (often only about one per hour, but enough to skew the regression slope calculations if not removed) for the main detectors, the BPMs, and the normalizer BCM, which is enough to make this work.

But when doing regression on the ATs and the SAMs, I think we would like to run the regression independently on each, using the local device cuts for one detector at a time, without needing to promote them to the global ErrorFlag.

@paulmking (Collaborator)

@cameronc137 Can you do some sampling of our existing ROOT files to see how many events would be cut for the SAMs and ATs in "typical" runs?

@cameronc137 (Collaborator, Author)

Sure, I'll do a spot check on some runs and find what is common.

A simple solution would be to do two sets of regression: one on just the main detectors, using their device error codes and the global error flag for the minirun determination; and another set with all the detectors we want (Mains, ATs, SAMs), using the same ErrorFlag-based minirun determination but also cutting out any event in which any one of the detectors' device error flags is bad. We would then have the main detectors by themselves as a "blessed" result, and all detectors together as a diagnostic result.
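A hedged sketch of that selection logic (the struct layout and names are illustrative, not japan's actual event model, and a clean flag is assumed to be zero): the "blessed" pass would call this with only the main detectors listed, the diagnostic pass with Mains, ATs, and SAMs.

```cpp
#include <vector>

// Illustrative per-event record: a global ErrorFlag plus one device
// error code per detector channel.
struct Event {
  unsigned int errorFlag;                      // global ErrorFlag
  std::vector<unsigned int> deviceError;       // per-detector device error codes
};

// Keep the event only if the global flag is clean AND every detector in
// the requested set has a clean device error code.
bool PassesSelection(const Event &ev, const std::vector<int> &detectors)
{
  if (ev.errorFlag != 0) return false;         // global event cut (minirun determination)
  for (int d : detectors)
    if (ev.deviceError[d] != 0) return false;  // any bad device in the set rejects the event
  return true;
}
```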

@cameronc137 (Collaborator, Author)

As discussed a few weeks ago (https://prex.jlab.org/wiki/index.php/20190923-Analyzer-Mtg), we decided that, skimming through the dataset, generally everything looks OK, but we should err on the safe side. Including the ATs and SAMs in the regression, and thus cutting events when those channels have an issue, costs only a small fraction of events (~1e-4 lost event fraction). Maybe we should do three sets of regression: main detectors only, with ATs, and with ATs and SAMs.

@paulmking (Collaborator)

Summarizing this discussion, I think we do want the regression to cut events when any independent or dependent variable has a problem, because otherwise we cannot be certain that the summations for the individual variables don't differ substantially in which events they include, and the result may not be valid.

In the PREX-2 respin and in the CREX start up, we will have a regression set with just the MD as the DVs, and a second one with the MD, AT, and SAM as the DVs.

Final thought that just undoes my conclusion: all IVs must be good for all events we keep, but we could accumulate the DVs independently. Then, if the number of events in a DV is "too different" from the number of good IV events, we zero the matrix element. Let's think about that as a possible next pass, but let's do what I said above for right now.
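A sketch of that last idea, with a made-up mismatch threshold: IVs stay required-good for every kept event, each DV accumulates its own count, and a DV column whose count strays too far from the IV count gets zeroed.

```cpp
#include "TMatrixD.h"
#include "TMath.h"

// Zero out any DV column of the IV-DV cross-correlation matrix whose
// independent good-event count differs from the IV count by more than
// maxFraction (example value; the real tolerance would need tuning).
void ZeroMismatchedColumns(TMatrixD &crossCorr, Double_t nIVGood,
                           const Double_t *nDVGood, Double_t maxFraction = 1e-3)
{
  for (Int_t j = 0; j < crossCorr.GetNcols(); ++j) {
    if (TMath::Abs(nDVGood[j] - nIVGood) > maxFraction * nIVGood) {
      for (Int_t i = 0; i < crossCorr.GetNrows(); ++i)
        crossCorr(i, j) = 0.0;                 // DV j's count is "too different": drop the element
    }
  }
}
```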
