
Refactor hit_l1a.py to use packet_file_to_datasets function #828

Merged

Conversation

@vmartinez-cu (Contributor) commented Sep 11, 2024

Updated hit_l1a.py to use the packet_file_to_datasets function rather than data classes and manually built xarray datasets for products. Added new functions to handle housekeeping data processing; a minimal sketch of the new flow follows the summary below.

  • Uses the packet_file_to_datasets function in hit_l1a.py
  • New functions for processing the housekeeping dataset
  • New function to process science data (WIP)
  • Updated unit tests
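
A minimal sketch of the new flow, assuming packet_file_to_datasets lives in imap_processing.utils and returns one xarray dataset per APID (the packet file name here is hypothetical):

    from imap_processing import imap_module_directory
    from imap_processing.utils import packet_file_to_datasets

    # Hypothetical L0 input file, for illustration only.
    packet_file = "imap_hit_l0_raw_20241001_v001.pkts"

    # XTCE definitions for all HIT APIDs now live in a single file.
    packet_definition = (
        imap_module_directory / "hit/packet_definitions/hit_packet_definitions.xml"
    )

    # One xarray.Dataset per APID, keyed by the APID value.
    datasets_by_apid = packet_file_to_datasets(packet_file, packet_definition)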

New files

  • hit_packet_definitions.xml
    • Definitions for all APIDs. Replaces the separate per-APID packet definition files.

Updated Files

  • hit_l1a.py
    • Uses the packet_file_to_datasets function
    • New functions for processing the housekeeping dataset
    • New function to process science data (WIP)
  • test_hit_l1a.py
    • Updated unit tests to reflect the changes in hit_l1a.py

Deleted files

  • test_housekeeping.py
    • Tests the housekeeping data class. The class will be removed once hit_l1b is refactored.
  • P_HIT_HSKP.xml
    • Packet definition for housekeeping. Replaced by hit_packet_definitions.xml.
  • P_HIT_SCIENCE.xml
    • Packet definition for science data. Replaced by hit_packet_definitions.xml.

Testing

  • test_hit_l1a.py
    • Added new fixtures and additional assertions for housekeeping data

Issue #822

@vmartinez-cu added the Ins: HIT (Related to the HIT instrument), Level: L1 (Level 1 processing), and enhancement (New feature or request) labels on Sep 12, 2024
@vmartinez-cu marked this pull request as ready for review on September 12, 2024 00:04
@vmartinez-cu added this to the Sept 2024 milestone on Sep 12, 2024

@sdhoyt (Contributor) left a comment

Great work on this! Just a few comments

imap_processing/hit/l1a/hit_l1a.py (resolved)
imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)

# Concatenate along 'leak_index' and reorder dimensions
stacked_leaks = xr.concat(leak_vars, dim="leak_index").transpose(
Contributor:

should the dim here be adc_channels?

Collaborator:

Really nice use of the xarray functions to do this for you :)

@vmartinez-cu (Contributor, Author) Sep 17, 2024:

Correct, adc_channels is the dim for the leak variable. Dims will be updated later in the process when dims for all the fields get assigned according to the cdf yaml file definitions. I just needed to put a dimension here that made sense as a placeholder.

    # Assign attributes and dimensions to each data array in the Dataset

    for field, data in dataset.data_vars.items():
        # Create a list of dimensions using the DEPEND_I keys in the
        # attributes
        dims = [
            value
            for key, value in attr_mgr.get_variable_attributes(field).items()
            if "DEPEND" in key
        ]
        dataset[field] = xr.DataArray(
            data,
            dims=dims,
            attrs=attr_mgr.get_variable_attributes(field),
        )

Contributor (Author):

I updated the function to pass in adc_channels and assign it as a dimension.
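
For illustration, a hedged sketch of what the updated helper might look like; the function name matches the PR, but the body, signature, and variable names are assumptions:

    import xarray as xr

    def concatenate_leak_variables(
        dataset: xr.Dataset, adc_channels: xr.DataArray
    ) -> xr.Dataset:
        """Combine the per-channel leak_i_raw_* variables into one 2D variable."""
        leak_vars = [
            dataset[name]
            for name in sorted(dataset.data_vars)
            if name.startswith("leak_i_raw")
        ]
        # Stack along the ADC channel axis, keeping epoch as the leading dimension.
        dataset["leak_i"] = xr.concat(leak_vars, dim=adc_channels).transpose(
            "epoch", "adc_channels"
        )
        return dataset.drop_vars([var.name for var in leak_vars])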

"""Test concatenation of leak_i variables"""

# Call the function
updated_dataset = concatenate_leak_variables(housekeeping_dataset)
Contributor:

I think it'd be good to check that the values are correct. Maybe before you call this function, remove everything but the first couple packets for housekeeping and hardcode arrays for all the leak values to check against.

Contributor (Author):

Added!
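
A sketch of the kind of check that was added, with hypothetical values; the real test hardcodes arrays read from the first packets of the sample file:

    import numpy as np

    # Keep only the first two packets so the expected values stay small.
    # adc_channels is assumed to be defined as in the sketch above.
    trimmed = housekeeping_dataset.isel(epoch=slice(0, 2))
    updated = concatenate_leak_variables(trimmed, adc_channels)

    # Hypothetical expected values -- the real test uses numbers taken
    # from the sample packet file.
    expected_first_row = np.zeros(64, dtype=np.uint16)
    np.testing.assert_array_equal(updated["leak_i"].values[0], expected_first_row)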

@@ -1,91 +0,0 @@
import numpy as np
Contributor:

You can also remove the Housekeeping data class.

Contributor (Author):

I did initially, but it broke my HIT L1B housekeeping tests, so to avoid pulling those code updates into this PR I left the data classes in place for now.

@greglucas (Collaborator) left a comment

Looks great to me! Just a few minor suggestions.

imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)
imap_processing/hit/l1a/hit_l1a.py (resolved)
imap_processing/hit/l1a/hit_l1a.py (resolved)

"leak_i_raw",
logger.info("Creating HIT L1A housekeeping dataset")

logical_source = "imap_hit_l1a_hk"
Collaborator:

Should this be in the CDF metadata?

@vmartinez-cu (Contributor, Author) Sep 18, 2024:

The logical source variable is used to grab the HIT L1A instrument-level attributes from imap_hit_global_cdf_attrs.yaml when updating the dataset attributes: dataset.attrs = attr_mgr.get_global_attributes(logical_source)

Attributes from imap_hit_global_cdf_attrs.yaml for L1A housekeeping:

imap_hit_l1a_hk:
    <<: *instrument_base
    Data_level: 1A
    Data_type: L1A_HK>Level-1A Housekeeping
    Logical_source: imap_hit_l1a_hk
    Logical_source_description: IMAP Mission HIT Instrument Level-1A Housekeeping Data.

Would it help to rename this variable, or to plug the value directly into the line that updates the dataset attributes? dataset.attrs = attr_mgr.get_global_attributes("imap_hit_l1a_hk")
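
For reference, a sketch of how these global attributes get attached, assuming an ImapCdfAttributes manager in imap_processing.cdf.imap_cdf_manager (the module path is an assumption):

    from imap_processing.cdf.imap_cdf_manager import ImapCdfAttributes

    attr_mgr = ImapCdfAttributes()
    attr_mgr.add_instrument_global_attrs(instrument="hit")

    # logical_source selects the imap_hit_l1a_hk block from the yaml above.
    dataset.attrs = attr_mgr.get_global_attributes("imap_hit_l1a_hk")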

imap_processing/hit/l1a/hit_l1a.py (outdated, resolved)
    dataset = dataset.assign_coords(
        {
            "adc_channels": adc_channels,
            "adc_channels_label": adc_channels_label,
Collaborator:

I forget, do the "labels" need to be defined as coordinates as well, or are those variables that are dependent on the coordinates?

i.e. should adc_channels_label actually be in the data_vars section?

Contributor (Author):

I think it needs to be added to the coordinates as well because I initially didn't have it in there but then had to add it. I don't remember why though. I'll get clarification on this.

Contributor (Author):

I checked with Tenzin and she explained that recent cdflib updates require this for data with more than one dimension.
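
To illustrate, a minimal sketch of that coordinate assignment with assumed shapes; whether the label shares the adc_channels dimension or gets its own is a CDF-convention detail:

    import numpy as np
    import xarray as xr

    adc_channels = xr.DataArray(
        np.arange(64, dtype=np.uint16),
        name="adc_channels",
        dims=["adc_channels"],
    )
    # Label coordinate that the cdflib-based CDF writing expects for >1D data.
    adc_channels_label = xr.DataArray(
        adc_channels.values.astype(str),
        name="adc_channels_label",
        dims=["adc_channels_label"],
    )

    dataset = dataset.assign_coords(
        {
            "adc_channels": adc_channels,
            "adc_channels_label": adc_channels_label,
        }
    )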

Comment on lines 238 to 239
# Assign attributes and dimensions to each data array in the Dataset
for field, data in dataset.data_vars.items():
Collaborator:

I think this will basically be re-creating all of your DataArrays and resetting the dataset.

Instead could you loop through just the field values and update the current dataset variable?

dataset[field].attrs = attr_mgr.get_variable_attributes(field)
# Something like this where you can just set the name of existing coords
dataset[field].assign_coords(dims)

Contributor (Author):

Good catch. Updated!
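
A hedged sketch of the updated loop; the DEPEND_i-to-dimension mapping follows the earlier snippet, but the exact implementation is an assumption:

    for field in list(dataset.data_vars):
        attrs = attr_mgr.get_variable_attributes(field)
        # DEPEND_0, DEPEND_1, ... give the dimension names from the CDF yaml.
        depend_dims = [
            value for key, value in sorted(attrs.items()) if key.startswith("DEPEND")
        ]
        # Rename the autogenerated dims in place instead of rebuilding the DataArray.
        dataset[field] = dataset[field].rename(
            dict(zip(dataset[field].dims, depend_dims))
        )
        dataset[field].attrs = attrs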

def datasets(packet_filepath):
"""Create datasets from packet file"""
packet_definition = (
imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
Collaborator:

Suggested change:
-    imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
+    imap_module_directory / "hit/packet_definitions/hit_packet_definitions.xml"

# ----------------
# Check that the dataset has the correct variables
assert valid_keys == set(processed_hskp_dataset.data_vars.keys())
assert set(dropped_keys).isdisjoint(set(processed_hskp_dataset.data_vars.keys()))
Collaborator:

Nice, I did not know the isdisjoint method existed, great use of it!

-Add function to handle concatenating leak_i variables.
-Drop variables from housekeeping dataset that aren't needed
 for the CDF product.
-Update dimensions and add attributes to the housekeeping Dataset.
-Delete create_datasets function since packet_file_to_datasets
 creates xarray datasets and those just need to be updated. This
 will happen in housekeeping and science data processing functions.
-Add function to process science data (WIP).
-Clean up code and add/update docstrings and comments.
hit_l1a.py was refactored to use the packet_file_to_datasets
function. The unit tests were updated to reflect changes.

-Added new fixtures for attributes manager, datasets dict, and
 housekeeping dataset.
-Added new tests for new functions (concatenating leak_i data
 and processing housekeeping).
-Added additional assertions for housekeeping dataset.
…ses, but only after hit_l1b is refactored not to use it
… leak_i variables to assign as a dimension. Also change dims to a dict from a list since assign_coords takes in a dictionary
…er. Also add assertion to check values are correct
@sdhoyt (Contributor) left a comment

I recommend running cli.py as an extra test if you haven't done that already. But I think everything looks good!

@vmartinez-cu merged commit ba5ad0b into IMAP-Science-Operations-Center:dev on Sep 24, 2024
17 checks passed
@vmartinez-cu deleted the update_hit_l1a branch on September 24, 2024 19:10