
Refactor hit_l1a.py to use packet_file_to_datasets function #828

Merged

Conversation

@vmartinez-cu (Contributor) commented Sep 11, 2024

Updated hit_l1a.py to use the packet_file_to_datasets function rather than data classes and manually built xarray datasets for products. Added new functions to handle housekeeping data processing; a minimal sketch of the new flow follows the summary below.

  • Uses the packet_file_to_datasets function in hit_l1a.py
  • New functions for processing the housekeeping dataset
  • New function to process science data (WIP)
  • Updated unit tests
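
A minimal sketch of the new flow, assuming packet_file_to_datasets lives in imap_processing.utils and returns one xarray dataset per APID (the packet file name here is hypothetical):

    from imap_processing import imap_module_directory
    from imap_processing.utils import packet_file_to_datasets

    # Hypothetical L0 input file, for illustration only.
    packet_file = "imap_hit_l0_raw_20241001_v001.pkts"

    # XTCE definitions for all HIT APIDs now live in a single file.
    packet_definition = (
        imap_module_directory / "hit/packet_definitions/hit_packet_definitions.xml"
    )

    # One xarray.Dataset per APID, keyed by the APID value.
    datasets_by_apid = packet_file_to_datasets(packet_file, packet_definition)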

New files

  • hit_packet_definitions.xml
    • Definitions for all APIDs. Replaces the separate per-APID packet definition files.

Updated Files

  • hit_l1a.py
    • Uses the packet_file_to_datasets function
    • New functions for processing the housekeeping dataset
    • New function to process science data (WIP)
  • test_hit_l1a.py
    • Updated unit tests to reflect the changes in hit_l1a.py

Deleted files

  • test_housekeeping.py
    • Tests the housekeeping data class. The class will be removed once hit_l1b is refactored.
  • P_HIT_HSKP.xml
    • Packet definition for housekeeping. Replaced by hit_packet_definitions.xml.
  • P_HIT_SCIENCE.xml
    • Packet definition for science data. Replaced by hit_packet_definitions.xml.

Testing

  • test_hit_l1a.py
    • Added new fixtures and additional assertions for housekeeping data

Issue #822

@vmartinez-cu added the Ins: HIT (Related to the HIT instrument), Level: L1 (Level 1 processing), and enhancement (New feature or request) labels on Sep 12, 2024
@vmartinez-cu marked this pull request as ready for review on September 12, 2024 00:04
@vmartinez-cu added this to the Sept 2024 milestone on Sep 12, 2024

@sdhoyt (Contributor) left a comment

Great work on this! Just a few comments

imap_processing/hit/l1a/hit_l1a.py (resolved)
imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)

# Concatenate along 'leak_index' and reorder dimensions
stacked_leaks = xr.concat(leak_vars, dim="leak_index").transpose(
Contributor:

should the dim here be adc_channels?

Collaborator:

Really nice use of the xarray functions to do this for you :)

@vmartinez-cu (Contributor, Author) Sep 17, 2024:

Correct, adc_channels is the dim for the leak variable. Dims will be updated later in the process when dims for all the fields get assigned according to the cdf yaml file definitions. I just needed to put a dimension here that made sense as a placeholder.

    # Assign attributes and dimensions to each data array in the Dataset

    for field, data in dataset.data_vars.items():
        # Create a list of dimensions using the DEPEND_I keys in the
        # attributes
        dims = [
            value
            for key, value in attr_mgr.get_variable_attributes(field).items()
            if "DEPEND" in key
        ]
        dataset[field] = xr.DataArray(
            data,
            dims=dims,
            attrs=attr_mgr.get_variable_attributes(field),
        )

Contributor (Author):

I updated the function to pass in adc_channels and assign it as a dimension.
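
For illustration, a hedged sketch of what the updated helper might look like; the function name matches the PR, but the body, signature, and variable names are assumptions:

    import xarray as xr

    def concatenate_leak_variables(
        dataset: xr.Dataset, adc_channels: xr.DataArray
    ) -> xr.Dataset:
        """Combine the per-channel leak_i_raw_* variables into one 2D variable."""
        leak_vars = [
            dataset[name]
            for name in sorted(dataset.data_vars)
            if name.startswith("leak_i_raw")
        ]
        # Stack along the ADC channel axis, keeping epoch as the leading dimension.
        dataset["leak_i"] = xr.concat(leak_vars, dim=adc_channels).transpose(
            "epoch", "adc_channels"
        )
        return dataset.drop_vars([var.name for var in leak_vars])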

"""Test concatenation of leak_i variables"""

# Call the function
updated_dataset = concatenate_leak_variables(housekeeping_dataset)
Contributor:

I think it'd be good to check that the values are correct. Maybe before you call this function, remove everything but the first couple packets for housekeeping and hardcode arrays for all the leak values to check against.

Contributor (Author):

Added!
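
A sketch of the kind of check that was added, with hypothetical values; the real test hardcodes arrays read from the first packets of the sample file:

    import numpy as np

    # Keep only the first two packets so the expected values stay small.
    # adc_channels is assumed to be defined as in the sketch above.
    trimmed = housekeeping_dataset.isel(epoch=slice(0, 2))
    updated = concatenate_leak_variables(trimmed, adc_channels)

    # Hypothetical expected values -- the real test uses numbers taken
    # from the sample packet file.
    expected_first_row = np.zeros(64, dtype=np.uint16)
    np.testing.assert_array_equal(updated["leak_i"].values[0], expected_first_row)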

@@ -1,91 +0,0 @@
import numpy as np
Contributor:

You can also remove the Housekeeping data class.

Contributor (Author):

I did initially, but it broke my HIT L1B housekeeping tests, so to avoid pulling those code updates into this PR I left the data classes in place for now.

@greglucas (Collaborator) left a comment

Looks great to me! Just a few minor suggestions.

imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)
imap_processing/hit/l1a/hit_l1a.py (resolved)
imap_processing/hit/l1a/hit_l1a.py (resolved)

"leak_i_raw",
logger.info("Creating HIT L1A housekeeping dataset")

logical_source = "imap_hit_l1a_hk"
Collaborator:

Should this be in the CDF metadata?

@vmartinez-cu (Contributor, Author) Sep 18, 2024:

The logical source variable is used to grab the HIT L1A instrument-level attributes from imap_hit_global_cdf_attrs.yaml when updating the dataset attributes: dataset.attrs = attr_mgr.get_global_attributes(logical_source)

Attributes from imap_hit_global_cdf_attrs.yaml for L1A housekeeping:

imap_hit_l1a_hk:
    <<: *instrument_base
    Data_level: 1A
    Data_type: L1A_HK>Level-1A Housekeeping
    Logical_source: imap_hit_l1a_hk
    Logical_source_description: IMAP Mission HIT Instrument Level-1A Housekeeping Data.

Would it help to rename this variable, or to plug the value directly into the line that updates the dataset attributes? dataset.attrs = attr_mgr.get_global_attributes("imap_hit_l1a_hk")
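
For reference, a sketch of how these global attributes get attached, assuming an ImapCdfAttributes manager in imap_processing.cdf.imap_cdf_manager (the module path is an assumption):

    from imap_processing.cdf.imap_cdf_manager import ImapCdfAttributes

    attr_mgr = ImapCdfAttributes()
    attr_mgr.add_instrument_global_attrs(instrument="hit")

    # logical_source selects the imap_hit_l1a_hk block from the yaml above.
    dataset.attrs = attr_mgr.get_global_attributes("imap_hit_l1a_hk")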

imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)
    dataset = dataset.assign_coords(
        {
            "adc_channels": adc_channels,
            "adc_channels_label": adc_channels_label,
Collaborator:

I forget, do the "labels" need to be defined as coordinates as well, or are those variables that are dependent on the coordinates?

i.e. should adc_channels_label actually be in the data_vars section?

Contributor (Author):

I think it needs to be added to the coordinates as well because I initially didn't have it in there but then had to add it. I don't remember why though. I'll get clarification on this.

Contributor (Author):

I checked with Tenzin and she explained that recent cdflib updates require this for data with more than one dimension.
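
To illustrate, a minimal sketch of that coordinate assignment with assumed shapes; whether the label shares the adc_channels dimension or gets its own is a CDF-convention detail:

    import numpy as np
    import xarray as xr

    adc_channels = xr.DataArray(
        np.arange(64, dtype=np.uint16),
        name="adc_channels",
        dims=["adc_channels"],
    )
    # Label coordinate that the cdflib-based CDF writing expects for >1D data.
    adc_channels_label = xr.DataArray(
        adc_channels.values.astype(str),
        name="adc_channels_label",
        dims=["adc_channels_label"],
    )

    dataset = dataset.assign_coords(
        {
            "adc_channels": adc_channels,
            "adc_channels_label": adc_channels_label,
        }
    )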

Comment on lines 238 to 239
# Assign attributes and dimensions to each data array in the Dataset
for field, data in dataset.data_vars.items():
Collaborator:

I think this will basically be re-creating all of your DataArrays and resetting the dataset.

Instead could you loop through just the field values and update the current dataset variable?

dataset[field].attrs = attr_mgr.get_variable_attributes(field)
# Something like this where you can just set the name of existing coords
dataset[field].assign_coords(dims)

Contributor (Author):

Good catch. Updated!
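
A hedged sketch of the updated loop; the DEPEND_i-to-dimension mapping follows the earlier snippet, but the exact implementation is an assumption:

    for field in list(dataset.data_vars):
        attrs = attr_mgr.get_variable_attributes(field)
        # DEPEND_0, DEPEND_1, ... give the dimension names from the CDF yaml.
        depend_dims = [
            value for key, value in sorted(attrs.items()) if key.startswith("DEPEND")
        ]
        # Rename the autogenerated dims in place instead of rebuilding the DataArray.
        dataset[field] = dataset[field].rename(
            dict(zip(dataset[field].dims, depend_dims))
        )
        dataset[field].attrs = attrs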

def datasets(packet_filepath):
"""Create datasets from packet file"""
packet_definition = (
imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
Collaborator:

Suggested change:
-    imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
+    imap_module_directory / "hit/packet_definitions/hit_packet_definitions.xml"

# ----------------
# Check that the dataset has the correct variables
assert valid_keys == set(processed_hskp_dataset.data_vars.keys())
assert set(dropped_keys).isdisjoint(set(processed_hskp_dataset.data_vars.keys()))
Collaborator:

Nice, I did not know the isdisjoint method existed, great use of it!

-Add function to handle concatenating leak_i variables.
-Drop variables from housekeeping dataset that aren't needed
 for the CDF product.
-Update dimensions and add attributes to the housekeeping Dataset.
-Delete create_datasets function since packet_file_to_datasets
 creates xarray datasets and those just need to be updated. This
 will happen in housekeeping and science data processing functions.
-Add function to process science data (WIP).
-Clean up code and add/update docstrings and comments.
hit_l1a.py was refactored to use the packet_file_to_datasets
function. The unit tests were updated to reflect changes.

-Added new fixtures for attributes manager, datasets dict, and
 housekeeping dataset.
-Added new tests for new functions (concatenating leak_i data
 and processing housekeeping).
-Added additional assertions for housekeeping dataset.
…ses, but only after hit_l1b is refactored not to use it
… leak_i variables to assign as a dimension. Also change dims to a dict from a list since assign_coords takes in a dictionary
…er. Also add assertion to check values are correct
@sdhoyt (Contributor) left a comment

I recommend running cli.py as an extra test if you haven't done that already. But I think everything looks good!

@vmartinez-cu merged commit ba5ad0b into IMAP-Science-Operations-Center:dev on Sep 24, 2024
17 checks passed
@vmartinez-cu deleted the update_hit_l1a branch on September 24, 2024 19:10