
FIX: Avoid loss of precision when casting in packet loading #781

Closed

Conversation

greglucas (Collaborator)

Change Summary

Overview

When using derived values, there are situations where a linear conversion factor is applied to a raw measurement, for instance turning a uint8 count into a float temperature. These values are represented as a small unsigned integer datatype onboard, but need to be represented as a float or a larger integer datatype on the ground so we don't lose precision. Previously, 2.1 was getting cast to 2 when the derived values were cast back to their onboard types.
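For illustration, a minimal NumPy reproduction of the symptom (the calibration factor here is made up):

```python
import numpy as np

# Raw telemetry arrives as small unsigned integers.
raw = np.array([7], dtype=np.uint8)

# A hypothetical linear calibration turns counts into engineering units.
calibrated = raw * 0.3  # -> array([2.1]), float64

# Casting the derived value back to the onboard uint8 type silently
# truncates it; this is the loss of precision the fix avoids.
print(calibrated.astype(np.uint8))  # [2], not [2.1]
```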

closes #780

@greglucas added the "packet parsing" label (Related to packet parsing or XTCE) on Aug 26, 2024
```python
y = x.astype(dtype, copy=False)
# We need to compare the arrays to see if we trimmed any values by
# casting to a smaller datatype (e.g. float64 to uint8, 2.1 to 2)
if np.array_equal(x, y):
```
greglucas (Collaborator, Author)


This is all somewhat unfortunate in how many "arrays" we'll be creating here. If anyone has ideas for a better way to do this, I'm interested in hearing them.

We are potentially creating three separate arrays:

  1. The initial asarray from a list -> creates an int64 array.
  2. Trying to cast to a smaller dtype -> creates a uint8 array.
  3. Doing this array comparison in a higher-order dtype space.

These all happen in NumPy C-level functions, so it shouldn't be a huge performance hit, but it would be nice not to do an array comparison just to check for loss of precision.
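For context, here is a self-contained sketch of the cast-and-compare pattern under discussion (the function name and return convention are illustrative, not the PR's actual code):

```python
import numpy as np

def cast_without_precision_loss(values, dtype):
    """Hypothetical helper showing the three arrays described above."""
    # 1. The initial asarray from a list -> e.g. an int64/float64 array.
    x = np.asarray(values)
    # 2. Trying to cast to the smaller onboard dtype -> e.g. a uint8 array.
    y = x.astype(dtype, copy=False)
    # 3. The comparison happens in a common (higher-order) dtype space.
    if np.array_equal(x, y):
        return y  # No values were trimmed; the small dtype is safe.
    return x  # Keep the wider dtype to avoid losing precision.

print(cast_without_precision_loss([1.0, 2.0], np.uint8))  # [1 2] (uint8)
print(cast_without_precision_loss([1.0, 2.1], np.uint8))  # [1.  2.1] (float64)
```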

Contributor


My thought would be to modify the _get_minimum_numpy_datatype function to take the derived value logic into account and return the appropriate datatype.
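A rough sketch of what that could look like (purely hypothetical; `has_conversion` and `raw_dtype` stand in for whatever the XTCE parameter definitions actually expose):

```python
import numpy as np

def _get_minimum_numpy_datatype(name, has_conversion, raw_dtype):
    """Illustrative signature only; the real function reads the XTCE
    parameter definitions rather than taking these arguments."""
    if has_conversion:
        # A linear/polynomial calibration can produce non-integer values,
        # so return a float dtype up front and skip the cast-and-compare
        # check entirely.
        return np.float64
    return raw_dtype

print(_get_minimum_numpy_datatype("TEMPERATURE", True, np.uint8))  # float64
print(_get_minimum_numpy_datatype("COUNTER", False, np.uint8))     # uint8
```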

greglucas (Collaborator, Author)


Really good suggestion, thanks.

I opened #786 with an alternative approach, which is also able to get rid of the NumPy try/except array creation. The one downside is that there isn't a great way to test all of the XTCE stuff currently :-/ I did test it on the new SWE XTCE file that was failing and it looks good, but I don't have a great way to parameterize that PR like we do with this one.

greglucas (Collaborator, Author)

Closing in favor of the other PR

@greglucas closed this on Aug 30, 2024
@greglucas deleted the min-dtype-fix branch on Aug 30, 2024 at 13:43
Linked issue: #780 BUG - Reading in derived float values loses precision