philsmt
Collaborator

@philsmt philsmt commented Apr 10, 2025

I recently had to repeatedly distinguish regular trains, in which all or most pulses are pumped, from trains in which only a few or no pulses are intentionally pumped for reference. This was far more work than I wanted, mostly because of the train alignment required between these modes.

This PR adds a corresponding method, .pumped_pulses_ratios(), which returns the per-train ratio of pumped pulses as a series automatically.

@takluyver This could also be used to distinguish pump-probe patterns
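
To illustrate the kind of series meant here, the following is a minimal sketch and not the PR's implementation. It assumes the pulse table is a pandas Series with a (trainId, pulseIndex, fel, ppl) MultiIndex, as in the test added by this PR. The helper name sketch_pumped_ratios, its train_ids argument, and the choice to report pulse-less trains as NaN are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sketch_pumped_ratios(pids, train_ids, ppl_only_value=np.nan):
    """Hypothetical helper: fraction of FEL pulses per train that are pumped."""
    idx = pids.index
    trains = idx.get_level_values('trainId')
    fel = np.asarray(idx.get_level_values('fel'), dtype=bool)
    ppl = np.asarray(idx.get_level_values('ppl'), dtype=bool)

    def per_train(flags):
        # Number of flagged pulses in each train that has any entry.
        return pd.Series(flags.astype(int), index=trains).groupby(level=0).sum()

    fel_count = per_train(fel)
    ppl_count = per_train(ppl)
    pumped_count = per_train(fel & ppl)

    # NaN wherever a train has no FEL pulses, to avoid dividing by zero.
    ratios = pumped_count / fel_count.where(fel_count > 0)

    # Assumption: trains with PPL pulses but no FEL pulses get ppl_only_value.
    ratios[(fel_count == 0) & (ppl_count > 0)] = ppl_only_value

    # Assumption: selected trains without any pulses are reported as NaN.
    return ratios.reindex(train_ids)


# Illustrative data, loosely modelled on the test in this PR.
pids = pd.Series(
    [300, 310, 300, 300, 300, 310],
    index=pd.MultiIndex.from_tuples([
        (1000, 0, True, True),
        (1000, 1, True, False),
        (1001, 0, True, False),
        (1002, 0, False, True),
        (1003, 0, True, True),
        (1003, 1, True, True),
    ], names=['trainId', 'pulseIndex', 'fel', 'ppl']))

ratios = sketch_pumped_ratios(pids, [1000, 1001, 1002, 1003, 1004],
                              ppl_only_value=-1.0)
```

With such a series, fully pumped trains, reference trains, and PPL-only trains can be told apart by simple comparisons on the ratio.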

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.

Project coverage is 57.40%. Comparing base (0c4c101) to head (54e8093).
Report is 3 commits behind head on master.

Files with missing lines          Patch %   Lines
src/extra/components/pulses.py    62.96%    10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #317      +/-   ##
==========================================
+ Coverage   57.36%   57.40%   +0.03%     
==========================================
  Files          30       30              
  Lines        4539     4566      +27     
==========================================
+ Hits         2604     2621      +17     
- Misses       1935     1945      +10     


@philsmt philsmt force-pushed the feat/pump-probe-ratios branch 3 times, most recently from 628d451 to 67e28db, on April 10, 2025 08:48
@@ -1507,6 +1507,59 @@ def pulse_mask(self, labelled=True, field=None):
        else:
            raise ValueError(f"{field=!r} parameter was not 'fel'/'ppl'/None")

    def pumped_pulses_ratios(self, ppl_only_value=np.nan, labelled=True):
Member

labelled is never used. I don't have a strong preference whether we implement it or remove it.

Collaborator Author

Well spotted. I realized at the end all other methods carry it, but forgot the implementation...

Comment on lines +746 to +756
pulses._get_train_ids = lambda: [1000, 1001, 1002, 1003, 1004]
pulses._pulse_ids = pd.Series(
    [300, 310, 300, 300, 300, 310],
    index=pd.MultiIndex.from_tuples([
        (1000, 0, True, True),
        (1000, 0, True, False),
        (1001, 0, True, False),
        (1002, 0, False, True),
        (1003, 0, True, True),
        (1003, 0, True, True),
    ], names=['trainId', 'pulseIndex', 'fel', 'ppl']))
Member

In general I think mocking out bits of the innards of a class like this to test its public API is annoyingly brittle, and it's better to make some suitable input. I won't hold the PR up over it, though. Maybe we need better facilities for making mock data.

Collaborator Author

Generally I agree, though I found it acceptable for PulsePattern given the explicit caching mechanism.

As you guessed correctly, I wanted to avoid having to create different mock devices with the current interface just to test a very particular scenario. Ideally we would continue to be able to test without Maxwell, and we also cannot rely on runs sticking around forever unless we copy them to a defined place. Maybe some serialized data in the repository that is unpacked into EXDF?

@takluyver
Member

I think doing this with a pandas series makes it more complex than using a 2D array. I haven't tested this, but as an idea:

fel_count = self.pulse_mask(field='fel').sum(axis=1)
ppl_count = self.pulse_mask(field='ppl').sum(axis=1)
ratio = ppl_count / fel_count
ratio[fel_count == 0] = fill_value

Might want to avoid division by zero; np.divide() takes a where= parameter.
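
A minimal sketch of this idea with np.divide() and its where= parameter, assuming the FEL and PPL masks are plain 2D boolean arrays of shape (trains, pulses). The helper name ratios_from_masks is hypothetical, and whether .pulse_mask() actually returns a row for every selected train is not settled in this thread.

```python
import numpy as np


def ratios_from_masks(fel_mask, ppl_mask, fill_value=np.nan):
    """Hypothetical helper: per-train PPL/FEL pulse count ratio from 2D masks."""
    fel_count = np.asarray(fel_mask, dtype=bool).sum(axis=1)
    ppl_count = np.asarray(ppl_mask, dtype=bool).sum(axis=1)

    # Pre-fill the output; np.divide only writes where fel_count is non-zero,
    # so no division by zero is ever evaluated.
    ratio = np.full(fel_count.shape, fill_value, dtype=np.float64)
    np.divide(ppl_count, fel_count, out=ratio, where=fel_count > 0)
    return ratio


# Three trains with up to three pulse slots each (illustrative data).
fel_mask = np.array([[True, True, False],
                     [True, False, False],
                     [False, False, False]])
ppl_mask = np.array([[True, False, False],
                     [False, False, False],
                     [True, False, False]])

mask_ratios = ratios_from_masks(fel_mask, ppl_mask, fill_value=-1.0)
```

Pre-filling the output array means the fill value and the division are handled in one pass, without a separate boolean assignment afterwards.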

@philsmt
Collaborator Author

philsmt commented Apr 15, 2025

> I think doing this with a pandas series makes it more complex than using a 2D array.

Assuming a constant number of pulses per train is no longer useful for me, in particular when pump-probe is used. That's why I based the PulsePattern family mostly on pandas types.

Going forward the more ubiquitous use of frame filters will likely make this even more common. I don't think we can get away with using linear pulse axes more often.

EDIT: Looks like I mixed up MRs here... this function actually reduces to a train dimension. Just ignore the part above please for that matter.

Concerning your comment: The downside of that code is that right now I only rely on the base interface (see implementation in DldPulses), while for that case I'd need pulse_mask, too. The code looks easier, provided .pulse_mask() actually always includes pulse-less trains, and I don't quite recall whether it does 🤔

@philsmt philsmt force-pushed the feat/pump-probe-ratios branch from 67e28db to 54e8093 on April 15, 2025 07:17
@takluyver
Member

I'd maybe add .pulse_mask() - or rather the variant with the field= parameter - to DldPulses, to match PumpProbePulses. It looks to me like the implementation of that plus pumped_pulses_ratios() using the mask arrays is still simpler than the MultiIndex version, and the enhanced pulse_mask could be useful as well. Up to you, though.

pumped_count = pd.Series([])

# Compute the ratio for trains with at least one pumped pulse.
ratios = pumped_count / fel_count.loc[pumped_count.index]
@fadybishara fadybishara Apr 23, 2025

I'm not sure I follow the logic here. If pumped_count and fel_count contain different trains, how do you know that fel_count will have more trains? If it doesn't, wouldn't this throw an error?

It's likely I misunderstood something, but if not, perhaps a simple solution would be to do an explicit inner merge like

joint_counts = pd.concat([fel_count, pumped_count], keys=['fel', 'ppl'], axis=1, join='inner')
ratios = joint_counts.ppl.div(joint_counts.fel)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, never mind, I get it -- all the trains have FEL pulses but not necessarily PPL pulses. Nevertheless, just because it should not happen doesn't mean it cannot happen, no?

(Also, a better way to do what I suggested is with np.intersect1d on the train IDs -- but probably none of what I suggested is necessary.)
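
Both alignment variants mentioned in this comment can be sketched as follows. The two count series are made-up illustrative data indexed by train ID, not values from the PR.

```python
import numpy as np
import pandas as pd

# Illustrative per-train counts; train 1001 has FEL pulses but none pumped.
fel_count = pd.Series([2, 1, 2],
                      index=pd.Index([1000, 1001, 1003], name='trainId'))
pumped_count = pd.Series([1, 2],
                         index=pd.Index([1000, 1003], name='trainId'))

# Variant 1: explicit inner join on the train IDs.
joint_counts = pd.concat([fel_count, pumped_count], keys=['fel', 'ppl'],
                         axis=1, join='inner')
ratios_concat = joint_counts['ppl'].div(joint_counts['fel'])

# Variant 2: intersect the train IDs first, then divide the aligned series.
common = np.intersect1d(fel_count.index, pumped_count.index)
ratios_intersect = pumped_count.loc[common] / fel_count.loc[common]
```

Either variant only keeps trains present in both series, so a train missing from fel_count is dropped rather than raising a KeyError.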

Collaborator Author

Not quite - this line is not about FEL vs PPL pulses, but FEL vs FEL+PPL pulses. The set of trains with pumped pulses is a (not necessarily proper) subset of the set of trains with FEL pulses. This becomes clear when comparing line 1528 and line 1533: the latter uses the stricter indexing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, right, 1528 vs. 1533 is what I was referring to. [:, :, True, True] is necessarily a subset (not sure what you mean by "proper") of [:, :, True, :] so there is no problem here.

@philsmt
Collaborator Author

philsmt commented Apr 24, 2025

> I'd maybe add .pulse_mask() - or rather the variant with the field= parameter - to DldPulses, to match PumpProbePulses. It looks to me like the implementation of that plus pumped_pulses_ratios() using the mask arrays is still simpler than the MultiIndex version, and the enhanced pulse_mask could be useful as well. Up to you, though.

I had a thought about this, but this implementation would run into the problem of selected trains vs trains with data. As with KeyData.data_counts(), the statistics methods of these components strive to return a result for all selected trains, whether there is data or not.
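
The distinction between selected trains and trains with data can be sketched with a plain reindex. The train IDs and ratios below are illustrative only, and using NaN for trains without data is an assumption rather than the documented behaviour of these components.

```python
import pandas as pd

# All trains in the current selection, whether or not they contain pulses.
selected_train_ids = [1000, 1001, 1002, 1003, 1004]

# A result computed only for the trains that actually have data.
ratios_with_data = pd.Series(
    [0.5, 0.0, 1.0], index=pd.Index([1000, 1001, 1003], name='trainId'))

# Expand to every selected train; trains without data become NaN.
ratios_all = ratios_with_data.reindex(selected_train_ids)
```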

# pd.SeriesGroupBy.count() is indeed faster than
# pd.SeriesGroupBy.groups, likely due to the additional objects
# created by the latter.
try:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more complicated than necessary to me. Why not do the following?

try:
    ppl_only_index = pids[:, :, False, True].groupby('trainId').count()
except KeyError:
    ppl_only_index = pd.Series([])
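
As a possible alternative to catching KeyError, the same count could be sketched with boolean masks on the index levels, which simply yields an empty result when no pulse matches. This is not what the PR does; the helper name count_per_train is hypothetical, and the Series is assumed to carry the (trainId, pulseIndex, fel, ppl) MultiIndex used in the test.

```python
import pandas as pd


def count_per_train(pids, fel, ppl):
    """Hypothetical helper: pulses per train with the given fel/ppl flags."""
    idx = pids.index
    selected = ((idx.get_level_values('fel') == fel)
                & (idx.get_level_values('ppl') == ppl))
    # An all-False mask gives an empty Series instead of raising.
    return pids[selected].groupby(level='trainId').count()


# Illustrative data without any PPL-only pulses.
example_pids = pd.Series(
    [300, 310, 300],
    index=pd.MultiIndex.from_tuples([
        (1000, 0, True, True),
        (1000, 1, True, False),
        (1001, 0, True, False),
    ], names=['trainId', 'pulseIndex', 'fel', 'ppl']))

unpumped_count = count_per_train(example_pids, fel=True, ppl=False)
ppl_only_count = count_per_train(example_pids, fel=False, ppl=True)
```

The trade-off is that the masks are evaluated over the full index each time, whereas the indexing approach relies on the MultiIndex lookup.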

3 participants