MNT: Minimize the datatypes created from space_packet_parser #723

greglucas · 2024-07-26T23:05:56Z

Change Summary

Overview

Previously, all arrays were created with numpy's default int64/float64 datatypes; for example, np.array([1, 2, 3]) isn't minimized. We know the expected datatype from space_packet_parser's XTCE definition, so we can infer a suitable numpy datatype to minimize the size.
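As a rough illustration of the idea (not the code in this PR), the sketch below picks the smallest standard numpy dtype that can hold a field of a given bit width. The helper name `minimum_numpy_dtype` and its `bit_size`/`kind` parameters are hypothetical and are not part of space_packet_parser or imap_processing.

```python
import numpy as np


def minimum_numpy_dtype(bit_size: int, kind: str = "uint") -> str:
    """Hypothetical helper: smallest numpy dtype name for a field width.

    `kind` is "uint", "int", or "float". This only sketches the idea of
    mapping a known encoding width onto a numpy dtype.
    """
    if kind == "float":
        return "float32" if bit_size <= 32 else "float64"
    for width in (8, 16, 32, 64):
        if bit_size <= width:
            return f"{kind}{width}"
    raise ValueError(f"No numpy dtype can hold {bit_size} bits")


# Default creation picks a platform-dependent wide integer type.
default = np.array([1, 2, 3])
# Using the known field width (3 bits here) gives a one-byte-per-item array.
small = np.array([1, 2, 3], dtype=minimum_numpy_dtype(3))
print(default.dtype, default.nbytes)
print(small.dtype, small.nbytes)
```

For a three-element array the saving is negligible, but the same ratio applies to every variable in a dataset built from many packets.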

closes #722

Testing

I added a test within SWAPI's area because it is using the function already and the data is there to test with. I'm not sure if we want something more standalone to test the functionality or if this is OK, let me know if you have preferences either way.

Extra

cc @medley56 and @BStoneLASP as I talked to both of you about this earlier. @medley56 let me know if something like this would be nice to add to space_packet_parser, or if we should look at doing the "simpler" thing and just add a reference to the DataEncoding within the ParsedDataItem. Happy to PR something over there if it'd be useful more broadly.

subagonsouth

This is great! I had identified this as something that needed to be worked on, but you got it implemented before I could even bring it up.

One nit pick on test coverage.

greglucas · 2024-07-29T22:02:02Z

One nit pick on test coverage.

@subagonsouth I don't see that, did you forget to submit that comment?

greglucas · 2024-07-29T22:19:34Z

imap_processing/tests/test_utils.py

+pytest_plugins = [
+    "imap_processing.tests.swapi.test_swapi_l1",
+]


@subagonsouth, I used your plugins idea here to bring in an external semi-unrelated fixture that probably shouldn't be brought up into the main conftest.py level. Does this make sense to you here, or is there a different/better way to bring in external fixtures? Maybe you have an even better way of addressing the test coverage here though.

So, this apparently didn't actually work the way I thought it did... It did work locally for me, but not remotely. Reading more about it on pytest it looks like it may not be supported to redefine pytest_plugins outside of the root conftest.py?

I ended up calling the function directly now instead of the fixture, which also allowed me to parametrize the test. Still curious about your thoughts on making this better though.

greglucas · 2024-07-29T23:01:28Z

imap_processing/utils.py

+    If it can't be coerced to that datatype, fallback to general array creation
+    without a specific datatype. This can happen with derived values.


Note that I added a parametrization over use_derived_value above, which found that generically assuming the XTCE datatype doesn't work for derived values, because we can go from uint2 to str for enumerations, among other cases. This function does the "simple" thing and just falls back to default array creation if we can't say anything about what the value will be derived to.
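A minimal sketch of that fallback, assuming a hypothetical helper named `array_with_fallback` (the actual function in imap_processing/utils.py may be structured differently): try the requested dtype first, and let numpy infer the dtype if the values can't be coerced.

```python
import numpy as np


def array_with_fallback(values, dtype=None):
    """Hypothetical helper: use `dtype` if possible, else let numpy infer."""
    if dtype is None:
        return np.array(values)
    try:
        return np.array(values, dtype=dtype)
    except (ValueError, TypeError, OverflowError):
        # Derived values (e.g. enumeration labels) may no longer match
        # the raw encoding, so fall back to default array creation.
        return np.array(values)


raw = array_with_fallback([2, 2, 1], "uint8")               # raw values fit
derived = array_with_fallback(["HVENG", "LVENG"], "uint8")  # labels do not
print(raw.dtype)
print(derived.dtype)
```

The raw integers keep the compact dtype, while the enumeration strings end up as a numpy unicode array.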

subagonsouth

This looks good. I commented about one additional unit test that could be added.

imap_processing/tests/test_utils.py

subagonsouth · 2024-07-30T15:39:09Z

imap_processing/tests/test_utils.py

+@pytest.mark.parametrize(
+    "use_derived_value, expected_mode", [(True, "HVENG"), (False, 2)]
+)
+def test_packet_file_to_datasets(use_derived_value, expected_mode):


Thanks for adding this test. My nit pick that got lost in the ether was suggesting that this test belonged in this file rather than in the swapi test. One additional improvement would be to have unit test coverage for _get_minimum_numpy_datatype.

I looked into this quick and couldn't see a great way of doing it. It'd be a deeply nested mock currently and basically just hard-coding expected results rather than actually testing against XTCE instance types. So, I'm going to punt on this because we are testing at least uint8 / uint16 as having multiple explicit return types in the current test.

I did do the easier version and test the other private function _create_minimum_dataset for the two cases there.

subagonsouth · 2024-07-30T15:44:05Z

imap_processing/tests/test_utils.py

+    Test that we get multiple apids in the output.
+    """
+    test_file = "tests/swapi/l0_data/imap_swapi_l0_raw_20231012_v001.pkts"
+    packet_files = imap_module_directory / test_file


Question: Does this work on Windows? I can't find whether dividing by a string with hard-coded forward slashes is handled by pathlib or not.

Yes, I believe it should. I of course don't have a Windows machine to test this on, but I believe I've run into this in CI before, and as long as you start from a Path instance the forward slashes in strings are respected properly across OSes. Here is what our AI overlords tell me:

On Windows, Path.cwd() will give you the current working directory in a format like C:\Users\YourUsername\YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like C:\Users\YourUsername\YourDirectory\a\b\c.txt.

On POSIX systems (Linux, macOS), Path.cwd() will give you the current working directory in a format like /home/YourUsername/YourDirectory. When you divide this by "a/b/c.txt", it will result in a path like /home/YourUsername/YourDirectory/a/b/c.txt.

and speaking of which, we are testing it in CI, so I guess it is working as expected :)
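For anyone without a Windows machine, the behavior can be checked on any OS with pathlib's pure path classes, which apply each platform's rules without touching the filesystem. The base directories and file name below are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical relative path with hard-coded forward slashes.
relative = "tests/swapi/l0_data/example.pkts"

# PureWindowsPath applies Windows path rules regardless of the host OS.
win = PureWindowsPath(r"C:\Users\me\project") / relative
# PurePosixPath applies Linux/macOS path rules.
posix = PurePosixPath("/home/me/project") / relative

print(win)    # forward slashes are split into components, shown as backslashes
print(posix)
```

In both cases the string is split on "/" into separate path components, so the same test code resolves to the right file on either platform.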

greglucas added the enhancement New feature or request label Jul 26, 2024

greglucas requested review from subagonsouth and tech3371 July 26, 2024 23:05

greglucas force-pushed the xtce-datatype branch from 2c8a53b to 76ebb92 Compare July 26, 2024 23:08

subagonsouth reviewed Jul 29, 2024

greglucas force-pushed the xtce-datatype branch from 76ebb92 to e8b7b63 Compare July 29, 2024 22:17

greglucas commented Jul 29, 2024

greglucas force-pushed the xtce-datatype branch from e8b7b63 to b7d059e Compare July 29, 2024 22:58

greglucas commented Jul 29, 2024

subagonsouth approved these changes Jul 30, 2024

subagonsouth reviewed Jul 30, 2024

greglucas force-pushed the xtce-datatype branch from b7d059e to 07004a0 Compare July 30, 2024 21:09

greglucas force-pushed the xtce-datatype branch from 07004a0 to d7e7bef Compare July 30, 2024 21:12

greglucas merged commit 4347f38 into IMAP-Science-Operations-Center:dev Jul 30, 2024
17 checks passed

greglucas deleted the xtce-datatype branch July 30, 2024 21:19

bourque assigned greglucas Aug 12, 2024
