-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT: Minimize the datatypes created from space_packet_parser #723
MNT: Minimize the datatypes created from space_packet_parser #723
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I had identified this as something that needed to be worked on, but you got it implemented before I could even bring it up.
One nit pick on test coverage.
@subagonsouth I don't see that, did you forget to submit that comment? |
imap_processing/tests/test_utils.py
Outdated
pytest_plugins = [ | ||
"imap_processing.tests.swapi.test_swapi_l1", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@subagonsouth, I used your plugins idea here to bring in an external semi-unrelated fixture that probably shouldn't be brought up into the main conftest.py
level. Does this make sense to you here, or is there a different/better way to bring in external fixtures? Maybe you have an even better way of addressing the test coverage here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this apparently didn't actually work the way I thought it did... It did work locally for me, but not remotely. Reading more about it on pytest it looks like it may not be supported to redefine pytest_plugins
outside of the root conftest.py
?
I ended up calling the function directly now instead of the fixture, which also allowed me to parametrize the test. Still curious about your thoughts on making this better though.
If it can't be coerced to that datatype, fallback to general array creation | ||
without a specific datatype. This can happen with derived values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I added a parametrization over use_derived_value
above, that actually found that generically assuming the XTCE datatype doesn't work when we go to derived values because we can go from uint2
to str
for enumerations and other cases. This function does the "simple" thing and just falls back to a default array creation if we can't say anything about what it will be derived to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I commented about one additional unit test that could be added.
@pytest.mark.parametrize( | ||
"use_derived_value, expected_mode", [(True, "HVENG"), (False, 2)] | ||
) | ||
def test_packet_file_to_datasets(use_derived_value, expected_mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this test. My nit pick that got lost in the ether was suggesting that this test belonged in this file rather than in the swapi test. One additional improvement would be to have unit test coverage for _get_minimum_numpy_datatype
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this quick and couldn't see a great way of doing it. It'd be a deeply nested mock currently and basically just hard-coding expected results rather than actually testing against XTCE instance types. So, I'm going to punt on this because we are testing at least uint8 / uint16 as having multiple explicit return types in the current test.
I did do the easier version and test the other private function _create_minimum_dataset
for the two cases there.
Test that we get multiple apids in the output. | ||
""" | ||
test_file = "tests/swapi/l0_data/imap_swapi_l0_raw_20231012_v001.pkts" | ||
packet_files = imap_module_directory / test_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Does this work on Windows? I can't specifically find if dividing by a string with hard coded forward-slashes works is handled by pathlib or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe it should. I of course don't have a Windows machine to test this on, but I believe I've run into this in CI before and as long as you start from a Path
instance the forward slashes in strings are respected across OS' properly. Here is what our AI overlords tell me:
On Windows, Path.cwd() will give you the current working directory in a format like C:\Users\YourUsername\YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like C:\Users\YourUsername\YourDirectory\a\b\c.txt.
On POSIX systems (Linux, macOS), Path.cwd() will give you the current working directory in a format like /home/YourUsername/YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like /home/YourUsername/YourDirectory/a/b/c.txt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and speaking of which, we are testing it in CI, so I guess it is working as expected :)
Previously all datatypes from numpy were created with int64/float64 encodings. np.array([1, 2, 3]) isn't minimized. We know the expected datatype from space packet parser's xtce definition, so we can infer a good numpy datatype to minimize the size.
4347f38
into
IMAP-Science-Operations-Center:dev
Change Summary
Overview
Previously all datatypes from numpy were created with int64/float64 encodings.
np.array([1, 2, 3])
isn't minimized. We know the expected datatype from space packet parser's xtce definition, so we can infer a good numpy datatype to minimize the size.closes #722
Testing
I added a test within SWAPI's area because it is using the function already and the data is there to test with. I'm not sure if we want something more standalone to test the functionality or if this is OK, let me know if you have preferences either way.
Extra
cc @medley56 and @BStoneLASP as I talked to both of you about this earlier. @medley56 let me know if something like this would be nice to add to
space_packet_parser
, or if we should look at doing the "simpler" thing and just add a reference to theDataEncoding
within theParsedDataItem
. Happy to PR something over there if it'd be useful more broadly.