Add support for passing a union of runs to Scantool #327
base: master
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
+ Coverage   60.18%   60.29%   +0.11%
==========================================
  Files          30       30
  Lines        4551     4564      +13
==========================================
+ Hits         2739     2752      +13
  Misses       1812     1812
```
```diff
 self._active = self.source["isMoving"].ndarray().any()
-self._scan_type = values["scanEnv.scanType.value"]
+self._scan_type = get_value("scanEnv.scanType.value", is_str=True)
```
Would `KeyData.as_single_value()` not work here?
That doesn't seem to work for strings; I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 run["FXE_DAQ_SCAN/MDL/KARABACON", "scanEnv.scanType.value"].as_single_value()
File ~/src/EXtra-data/extra_data/keydata.py:338, in KeyData.as_single_value(self, rtol, atol, reduce_by)
336 value = reduce_by(data)
337 elif isinstance(reduce_by, str) and hasattr(np, reduce_by):
--> 338 value = getattr(np, reduce_by)(data, axis=0)
339 elif reduce_by == 'first':
340 value = data[0]
File ~/.conda/envs/scratch/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py:4001, in median(a, axis, out, overwrite_input, keepdims)
3916 @array_function_dispatch(_median_dispatcher)
3917 def median(a, axis=None, out=None, overwrite_input=False, keepdims=False):
3918 """
3919 Compute the median along the specified axis.
3920
(...) 3999
4000 """
-> 4001 return _ureduce(a, func=_median, keepdims=keepdims, axis=axis, out=out,
4002 overwrite_input=overwrite_input)
File ~/.conda/envs/scratch/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py:3894, in _ureduce(a, func, keepdims, **kwargs)
3891 index_out = (0, ) * nd
3892 kwargs['out'] = out[(Ellipsis, ) + index_out]
-> 3894 r = func(a, **kwargs)
3896 if out is not None:
3897 return out
File ~/.conda/envs/scratch/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py:4053, in _median(a, axis, out, overwrite_input)
4049 indexer = tuple(indexer)
4051 # Use mean in both odd and even case to coerce data type,
4052 # using out array if needed.
-> 4053 rout = mean(part[indexer], axis=axis, out=out)
4054 if supports_nans and sz > 0:
4055 # If nans are possible, warn and replace by nans like mean would.
4056 rout = np.lib._utils_impl._median_nancheck(part, rout, axis)
File ~/.conda/envs/scratch/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:3860, in mean(a, axis, dtype, out, keepdims, where)
3857 else:
3858 return mean(axis=axis, dtype=dtype, out=out, **kwargs)
-> 3860 return _methods._mean(a, axis=axis, dtype=dtype,
3861 out=out, **kwargs)
File ~/.conda/envs/scratch/lib/python3.12/site-packages/numpy/_core/_methods.py:147, in _mean(a, axis, dtype, out, keepdims, where)
145 ret = ret.dtype.type(ret / rcount)
146 else:
--> 147 ret = ret / rcount
149 return ret
TypeError: ufunc 'divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
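The failure above can be reproduced without EXtra-data at all: `as_single_value()` defaults to a numerical reduction (`np.median`), which cannot divide string data. A minimal sketch of the problem and a string-safe alternative — the `values` array and the uniqueness check are illustrative, not EXtra-data API:

```python
import numpy as np

# Hypothetical per-train values of a string key, as as_single_value() sees them
values = np.array(["fscan", "fscan", "fscan"])

# np.median() eventually calls mean(), whose divide ufunc has no string
# implementation, so it raises TypeError (the traceback shown above):
try:
    np.median(values)
    failed = False
except TypeError:
    failed = True
print("median raised TypeError:", failed)

# For string data, a sensible "single value" reduction is to check that
# all values agree and return the unique one:
unique = np.unique(values)
assert unique.size == 1
print("scan type:", unique[0])
```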
Are the data you're pooling together several similar scans, or one scan split over multiple runs? Or something else? Even with the warning, I think this could be misleading - e.g. if you stick two scans together, should the number of steps be the total of the steps in each scan?
This feels somewhat awkward. Do we want a more general mechanism to get these kinds of values - either from RUN or CONTROL - when multiple runs are stuck together? Maybe with a check that the retrieved values are consistent across the pooled data? Or maybe just using
I think Ivette's usecase was with several similar scans? (correct me if I'm wrong @bermudei)
Maybe? I dunno if it's worth it, since AFAIK this is the only component where we had to look at run values, but if you think it'd be better I can have a go at implementing it.
As Thomas puts it, string values have only been in

Concerning this PR, I'm hesitant to simply use the first value without any further check. That seems like baking dangerous assumptions into a component that could break in mysterious ways. I agree that when used properly this does the right thing, but that's not the situation I'm worried about: I'm afraid we risk breaking the automagic behaviour we want to enable with components.

I fully agree with James that EXtra-data fundamentally encourages treating multiple runs the same way as a single one. Code-wise, however, I would prefer to move any decision we make on how to treat this case upstream into EXtra-data, as it will come up in other components as well. I already had cases where

Maybe it's time to properly handle this instead?
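The consistency check suggested earlier in the thread could look something like this. This is a hypothetical sketch, not EXtra-data API; `consistent_value` is an illustrative name for a getter that refuses to silently pick a value when the pooled runs disagree:

```python
# Hypothetical helper (not EXtra-data API): return the single value shared
# by all pooled runs, raising instead of guessing when they disagree.
def consistent_value(values):
    """Return the common value, or raise if the pooled values differ."""
    unique = set(values)
    if len(unique) != 1:
        raise ValueError(f"Inconsistent values across pooled runs: {sorted(unique)}")
    return unique.pop()

print(consistent_value(["fscan", "fscan", "fscan"]))  # fscan
```

This errs on the side of failing loudly, which matches the concern above about components breaking in mysterious ways.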
That's fair, though some properties like the acquisition time are stored as arrays, so
I was thinking in the direction of loading a
Could an alternative be that the Scantool component accepts a list of runs rather than a union, such that you can read the
Isn't a list of runs what a union is supposed to be, though? Conceptually, I think what people would want to use in these situations is a single object that represents all the runs, which they can
That could work 👍 And it would simplify this PR, since we could stick to reading RUN data.
Conceptually yes, but technically a union merges DataCollections, which results in a single object with no knowledge of what a run is, and which by default ends up ignoring RUN data when the unioned DataCollections are not from a single run. The suggestion I made merely helps work around this limitation, though I prefer the second option I mentioned.
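The "list of runs" alternative could be sketched as below. This is a hedged illustration, not the PR's implementation: `pooled_run_value` and the toy dict runs are hypothetical, and `read_run_value` stands in for something like EXtra-data's `DataCollection.get_run_value()`, applied per run before any merging happens:

```python
# Hedged sketch: accept a list of runs instead of a union, read the RUN
# value from each run independently, and require that they agree.
def pooled_run_value(runs, read_run_value, source, key):
    values = [read_run_value(run, source, key) for run in runs]
    if len(set(values)) != 1:
        raise ValueError(f"Runs disagree on {source}/{key}: {values}")
    return values[0]

# Toy stand-ins for opened runs; real code would open each run and call
# its RUN-data accessor here.
fake_runs = [{"scanEnv.scanType.value": "fscan"},
             {"scanEnv.scanType.value": "fscan"}]
print(pooled_run_value(fake_runs,
                       lambda run, source, key: run[key],
                       "FXE_DAQ_SCAN/MDL/KARABACON",
                       "scanEnv.scanType.value"))  # fscan
```

Because each run is read before merging, RUN data stays accessible and the union's loss of per-run identity never comes into play.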
This is for convenience when working with combined runs. The output looks like this:

The bulk of the changes are to support reading the scantool settings from CONTROL data instead of RUN data.