-
Notifications
You must be signed in to change notification settings - Fork 2
Add PumpProbePulses.pumped_pulses_ratios() #317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #317 +/- ##
==========================================
+ Coverage 57.36% 57.40% +0.03%
==========================================
Files 30 30
Lines 4539 4566 +27
==========================================
+ Hits 2604 2621 +17
- Misses 1935 1945 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
628d451
to
67e28db
Compare
@@ -1507,6 +1507,59 @@ def pulse_mask(self, labelled=True, field=None): | |||
else: | |||
raise ValueError(f"{field=!r} parameter was not 'fel'/'ppl'/None") | |||
|
|||
def pumped_pulses_ratios(self, ppl_only_value=np.nan, labelled=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labelled
is never used. I don't have a strong preference whether we implement it or remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted. I realized at the end all other methods carry it, but forgot the implementation...
pulses._get_train_ids = lambda: [1000, 1001, 1002, 1003, 1004] | ||
pulses._pulse_ids = pd.Series( | ||
[300, 310, 300, 300, 300, 310], | ||
index=pd.MultiIndex.from_tuples([ | ||
(1000, 0, True, True), | ||
(1000, 0, True, False), | ||
(1001, 0, True, False), | ||
(1002, 0, False, True), | ||
(1003, 0, True, True), | ||
(1003, 0, True, True), | ||
], names=['trainId', 'pulseIndex', 'fel', 'ppl'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think mocking out bits of the innards of a class like this to test its public API is annoyingly brittle, and it's better to make some suitable input. I won't hold the PR up over it, though. Maybe we need better facilities for making mock data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I agree, though I found it acceptable for PulsePattern
given the explicit caching mechanism.
As you guessed correctly, I wanted to avoid having to creating different mock devices with the current interface to test a very particular scenario. Ideally we would continue to be able to test without Maxwell, and we also cannot rely on runs sticking around forever unless we copy them to a defined place. Maybe some serialized data in the repository that is unpacked into EXDF?
I think doing this with a pandas series makes it more complex than using a 2D array. I haven't tested this, but as an idea: fel_count = self.pulse_mask(field='fel').sum(axis=1)
ppl_count = self.pulse_mask(field='ppl').sum(axis=1)
ratio = ppl_count / fel_count
ratio[fel_count = 0] = fill_value Might want to avoid division by zero; |
Assuming a constant number of pulses per train is no longer useful for me, in particular when pump-probe is used. That's why I based the Going forward the more ubiquitous use of frame filters will likely make this even more common. I don't think we can get away with using linear pulse axes more often. EDIT: Looks like I mixed up MRs here... this function actually reduces to a train dimension. Just ignore the part above please for that matter. Concerning your comment: The downside of that code is that right now I only rely on the base interface (see implementation in |
67e28db
to
54e8093
Compare
I'd maybe add |
pumped_count = pd.Series([]) | ||
|
||
# Compute the ratio for trains with at least one pumped pulse. | ||
ratios = pumped_count / fel_count.loc[pumped_count.index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the logic here, if pumped_count
and fel_count
have different trains, how do you know that fel_count
will have more trains? If not, wouldn't this throw an error?
It's likely I misunderstood something but if not, perhaps a simple solution would be to do an explicit inner merge like
joint_count = pd.concat([fel_count, pumped_count], keys=['fel', 'ppl'], axis=1, join='inner'
ratios = joint_counts.ppl.div(joint_counts.fel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, never mind, I get it -- all the trains have FEL pulses but not necessarily PPL pulses. Nevertheless, just because it should not happen doesn't mean it cannot happen, no?
(Also, a better way to do what I suggested is with np.intersect1d
on the train IDs -- but probably none of what I suggested is necessary.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite - this line is not about FEL vs PPL pulses, but FEL vs FEL+PPL pulses. The set of trains with pumped pulses is a (not necessarily proper) subset of the set of trains with FEL pulses. It becomes clear when comparing line 1528 and line 1533, the latter makes a stricter indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a thought about this, but this implementation would run into the problem of selected trains vs trains with data. As with |
# pd.SeriesGroupBy.count() is indeed faster than | ||
# pd.SeriesGroupBy.groups, likely due additional objects | ||
# created by the latter. | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me to be more complicated than necessary, why not do the following?
try:
ppl_only_index = pids[:, :, False, True].groupby('trainId').count()
except KeyError:
ppl_only_index = pd.Series([])
I recently needed to frequently distinguish regular trains with all or most pulses pumped vs trains where intentionally only few or no pulses are pumped for reference. This was way more work than I wanted to, mostly because of necessary train alignment between these modes.
This PR adds a corresponding method
.pumped_pulses_ratios()
returning such a series automatically.@takluyver This could also be used to distinguish pump-probe patterns