Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT: Minimize the datatypes created from space_packet_parser #723

Merged

Conversation

greglucas
Copy link
Collaborator

Change Summary

Overview

Previously all datatypes from numpy were created with int64/float64 encodings. np.array([1, 2, 3]) isn't minimized. We know the expected datatype from space packet parser's xtce definition, so we can infer a good numpy datatype to minimize the size.

closes #722

Testing

I added a test within SWAPI's area because it is using the function already and the data is there to test with. I'm not sure if we want something more standalone to test the functionality or if this is OK, let me know if you have preferences either way.

Extra

cc @medley56 and @BStoneLASP as I talked to both of you about this earlier. @medley56 let me know if something like this would be nice to add to space_packet_parser, or if we should look at doing the "simpler" thing and just add a reference to the DataEncoding within the ParsedDataItem. Happy to PR something over there if it'd be useful more broadly.

Copy link
Contributor

@subagonsouth subagonsouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I had identified this as something that needed to be worked on, but you got it implemented before I could even bring it up.

One nit pick on test coverage.

@greglucas
Copy link
Collaborator Author

One nit pick on test coverage.

@subagonsouth I don't see that, did you forget to submit that comment?

Comment on lines 10 to 12
pytest_plugins = [
"imap_processing.tests.swapi.test_swapi_l1",
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@subagonsouth, I used your plugins idea here to bring in an external semi-unrelated fixture that probably shouldn't be brought up into the main conftest.py level. Does this make sense to you here, or is there a different/better way to bring in external fixtures? Maybe you have an even better way of addressing the test coverage here though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this apparently didn't actually work the way I thought it did... It did work locally for me, but not remotely. Reading more about it on pytest it looks like it may not be supported to redefine pytest_plugins outside of the root conftest.py?

I ended up calling the function directly now instead of the fixture, which also allowed me to parametrize the test. Still curious about your thoughts on making this better though.

Comment on lines +286 to +287
If it can't be coerced to that datatype, fallback to general array creation
without a specific datatype. This can happen with derived values.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I added a parametrization over use_derived_value above, that actually found that generically assuming the XTCE datatype doesn't work when we go to derived values because we can go from uint2 to str for enumerations and other cases. This function does the "simple" thing and just falls back to a default array creation if we can't say anything about what it will be derived to.

Copy link
Contributor

@subagonsouth subagonsouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I commented about one additional unit test that could be added.

imap_processing/tests/test_utils.py Show resolved Hide resolved
@pytest.mark.parametrize(
"use_derived_value, expected_mode", [(True, "HVENG"), (False, 2)]
)
def test_packet_file_to_datasets(use_derived_value, expected_mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this test. My nit pick that got lost in the ether was suggesting that this test belonged in this file rather than in the swapi test. One additional improvement would be to have unit test coverage for _get_minimum_numpy_datatype.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this quick and couldn't see a great way of doing it. It'd be a deeply nested mock currently and basically just hard-coding expected results rather than actually testing against XTCE instance types. So, I'm going to punt on this because we are testing at least uint8 / uint16 as having multiple explicit return types in the current test.

I did do the easier version and test the other private function _create_minimum_dataset for the two cases there.

Test that we get multiple apids in the output.
"""
test_file = "tests/swapi/l0_data/imap_swapi_l0_raw_20231012_v001.pkts"
packet_files = imap_module_directory / test_file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Does this work on Windows? I can't specifically find if dividing by a string with hard coded forward-slashes works is handled by pathlib or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe it should. I of course don't have a Windows machine to test this on, but I believe I've run into this in CI before and as long as you start from a Path instance the forward slashes in strings are respected across OS' properly. Here is what our AI overlords tell me:

On Windows, Path.cwd() will give you the current working directory in a format like C:\Users\YourUsername\YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like C:\Users\YourUsername\YourDirectory\a\b\c.txt.

On POSIX systems (Linux, macOS), Path.cwd() will give you the current working directory in a format like /home/YourUsername/YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like /home/YourUsername/YourDirectory/a/b/c.txt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and speaking of which, we are testing it in CI, so I guess it is working as expected :)

Previously all datatypes from numpy were created with int64/float64
encodings. np.array([1, 2, 3]) isn't minimized. We know the expected
datatype from space packet parser's xtce definition, so we can infer
a good numpy datatype to minimize the size.
@greglucas greglucas merged commit 4347f38 into IMAP-Science-Operations-Center:dev Jul 30, 2024
17 checks passed
@greglucas greglucas deleted the xtce-datatype branch July 30, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Infer the proper numpy data types when reading packets
2 participants