
Regression should not fail for all Dependent Variables if just one fails cuts #238

Open · cameronc137 opened this issue Jul 27, 2019 · 13 comments

@cameronc137 (Collaborator)

Is your feature request related to a problem? Please describe.
When a single SAM saturates for a full run, it prevents regression on all the main detectors.

Describe the solution you'd like
Separate the error flag behavior for dependent variables (DVs) from that for independent variables (IVs).

Describe alternatives you've considered
Postpan...

Additional context
The data for slugs 13 and 14 is all saturated, so we will need to remove SAM 3 and respin again.

@wdconinc (Member)

This should be possible by turning fGoodEventNumber into a TVectorD and modifying the RMS-to-error normalization with a NormByColumn.
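A minimal sketch of that direction, assuming the good-event counts are kept per DV column in a TVectorD; the function name and matrix layout here are hypothetical, not japan's actual members, and the explicit loop stands in for the NormByColumn call mentioned above:

```cpp
#include "TMatrixD.h"
#include "TVectorD.h"
#include "TMath.h"

// Hypothetical sketch: normalize per-(IV, DV) RMS values to errors on
// the mean with a separate good-event count per DV column, instead of
// one scalar fGoodEventNumber.
void NormalizeErrorsPerColumn(TMatrixD &errors, const TVectorD &goodEvents)
{
  // errors is nIV x nDV; goodEvents(j) counts events where DV j passed its cuts.
  for (Int_t j = 0; j < errors.GetNcols(); ++j) {
    if (goodEvents(j) <= 0) continue;            // no good events: leave column untouched
    const Double_t norm = 1.0 / TMath::Sqrt(goodEvents(j));
    for (Int_t i = 0; i < errors.GetNrows(); ++i)
      errors(i, j) *= norm;                      // sigma_j -> sigma_j / sqrt(N_j)
  }
}
```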

@wdconinc (Member)

Before anyone asks: this only works to get around problematic DVs, not IVs. And as long as sam3 is in the global error flag, you will still exclude those events.

@paulmking (Collaborator) commented Jul 27, 2019 via email

@paulmking (Collaborator) commented Jul 27, 2019 via email

@cameronc137 (Collaborator, Author)

This has become relevant again: we would like postpan or japan regression to fail on hardware failures in the AT detectors, but we don't want to throw out data from the main detectors unnecessarily. Fortunately this is largely academic, since only about one event per run fails for the ATs, but when not handled correctly it gives their plots weird scales and skewed means and RMS. See https://logbooks.jlab.org/entry/3717352#comment-23246 for this discussion.

@wdconinc (Member)

I'll implement it later today in a feature branch. Then you can go crazy over it :-)

@wdconinc (Member)

This is a bit more involved than originally anticipated. Here are the questions, for an imaginary situation with two dependent variables (Y) with n_1 and n_2 good events (n_intersect and n_union defined by the obvious set operations; see the counting sketch after this list), and any number of independent variables (X) with n good events over all those channels:

  • How many events are in the XX correlation matrix?
    • n
    • n_intersect
    • n_union
  • How many events are in column 1 of the XY cross-correlation matrix (Y_1)?
    • n
    • n_intersect
    • n_union
    • n_1
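To pin down the nomenclature, a standalone sketch (plain C++, hypothetical names) of how the four counts fall out of per-event validity flags:

```cpp
#include <cstddef>
#include <vector>

// Tally the per-DV and combined good-event counts for two DVs from
// per-event pass/fail flags (e.g. derived from device error codes).
// Assumes good1 and good2 have the same length.
void CountGoodEvents(const std::vector<bool> &good1, const std::vector<bool> &good2,
                     int &n1, int &n2, int &nIntersect, int &nUnion)
{
  n1 = n2 = nIntersect = nUnion = 0;
  for (std::size_t e = 0; e < good1.size(); ++e) {
    if (good1[e]) ++n1;                        // good for DV 1
    if (good2[e]) ++n2;                        // good for DV 2
    if (good1[e] && good2[e]) ++nIntersect;    // good for both
    if (good1[e] || good2[e]) ++nUnion;        // good for at least one
  }
}
```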

@wdconinc (Member) commented Aug 12, 2019

And, since it does get calculated and stored, even if it doesn't go into the final results:

  • How many events are in the 1-2 element of the YY correlation matrix?
    • n_intersect (which requires us to keep track of this)
    • n_union (which requires us to keep track of this)
    • sqrt(n_1 * n_2) (which is easiest to implement)
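The sqrt(n_1 * n_2) option is easiest precisely because it needs no per-pair bookkeeping; as a hedged sketch (names hypothetical), the 1-2 element would then be normalized as:

```cpp
#include "TMath.h"

// Normalize the accumulated cross product of DV 1 and DV 2 using only
// the per-channel good-event counts n1 and n2; sqrt(n1*n2) serves as
// the effective event count, so no n_intersect/n_union tracking is needed.
Double_t CovarianceElement12(Double_t sumOfProducts, Double_t n1, Double_t n2)
{
  if (n1 <= 0.0 || n2 <= 0.0) return 0.0;      // no good events in one channel
  return sumOfProducts / TMath::Sqrt(n1 * n2); // effective N = sqrt(n1 * n2)
}
```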

@cameronc137 (Collaborator, Author) commented Sep 17, 2019

Did we ever have any conclusions from the discussion about how to implement device error codes into the regression analysis?

Right now our default sets of cuts are (at least supposed to be) cutting any local device glitches globally (often only about one per hour, but enough to skew the regression slope calculations if not removed) for the main detectors, the BPMs, and the normalizer BCM, which is enough to make this work.

But when doing regression on the ATs and the SAMs, I think we would like to run the regression independently on each, using the local device cuts for one detector at a time, without needing to promote them to the global ErrorFlag.

@paulmking (Collaborator)

@cameronc137 Can you do some sampling of our existing ROOT files to see how many events would be cut for the SAMs and ATs in "typical" runs?

@cameronc137 (Collaborator, Author)

Sure, I'll do a spot check on some runs and find what is common.

A simple solution would be to do two sets of regression: one on just the main detectors, using their device error codes and the global error flag for the minirun determination; and another set with all the detectors we want (Mains, ATs, SAMs), using the same ErrorFlag-based minirun determination but also cutting out any event in which any one of the detectors' device error flags is bad. We would then have the main detectors by themselves as a "blessed" result, and all detectors together as a diagnostic result.
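A hedged sketch of that selection logic (the struct layout and names are illustrative, not japan's actual event model, and a clean flag is assumed to be zero): the "blessed" pass would call this with only the main detectors listed, the diagnostic pass with Mains, ATs, and SAMs.

```cpp
#include <vector>

// Illustrative per-event record: a global ErrorFlag plus one device
// error code per detector channel.
struct Event {
  unsigned int errorFlag;                      // global ErrorFlag
  std::vector<unsigned int> deviceError;       // per-detector device error codes
};

// Keep the event only if the global flag is clean AND every detector in
// the requested set has a clean device error code.
bool PassesSelection(const Event &ev, const std::vector<int> &detectors)
{
  if (ev.errorFlag != 0) return false;         // global event cut (minirun determination)
  for (int d : detectors)
    if (ev.deviceError[d] != 0) return false;  // any bad device in the set rejects the event
  return true;
}
```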

@cameronc137 (Collaborator, Author)

As discussed a few weeks ago (https://prex.jlab.org/wiki/index.php/20190923-Analyzer-Mtg), we decided that, skimming through the dataset, generally everything looks OK, but we should err on the safe side. Including the ATs and SAMs in the regression, and thus cutting events when those channels have an issue, costs only a small fraction of events (~1e-4 lost event fraction). Maybe we should do three sets of regression: main detectors only, with ATs, and with ATs and SAMs.

@paulmking (Collaborator)

Summarizing this discussion, I think we do want the regression to cut events when any independent or dependent variable has a problem, because otherwise we cannot be certain that the summations for the individual variables don't differ substantially in which events they include, and the result may not be valid.

In the PREX-2 respin and in the CREX start up, we will have a regression set with just the MD as the DVs, and a second one with the MD, AT, and SAM as the DVs.

Final thought that just undoes my conclusion: all IVs must be good for all events we keep, but we could accumulate the DVs independently. Then, if the number of events in a DV is "too different" from the number of good IV events, we zero the matrix element. Let's think about that as a possible next pass, but let's do what I said above for right now.
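A sketch of that last idea, with a made-up mismatch threshold: IVs stay required-good for every kept event, each DV accumulates its own count, and a DV column whose count strays too far from the IV count gets zeroed.

```cpp
#include "TMatrixD.h"
#include "TMath.h"

// Zero out any DV column of the IV-DV cross-correlation matrix whose
// independent good-event count differs from the IV count by more than
// maxFraction (example value; the real tolerance would need tuning).
void ZeroMismatchedColumns(TMatrixD &crossCorr, Double_t nIVGood,
                           const Double_t *nDVGood, Double_t maxFraction = 1e-3)
{
  for (Int_t j = 0; j < crossCorr.GetNcols(); ++j) {
    if (TMath::Abs(nDVGood[j] - nIVGood) > maxFraction * nIVGood) {
      for (Int_t i = 0; i < crossCorr.GetNrows(); ++i)
        crossCorr(i, j) = 0.0;                 // DV j's count is "too different": drop the element
    }
  }
}
```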
