Refactor hit_l1a.py to use packet_file_to_datasets function #828
Conversation
Force-pushed 93ac195 to b7a5a17
Great work on this! Just a few comments.
imap_processing/hit/l1a/hit_l1a.py
Outdated
# Concatenate along 'leak_index' and reorder dimensions
stacked_leaks = xr.concat(leak_vars, dim="leak_index").transpose(
Should the dim here be adc_channels?
Really nice use of the xarray functions to do this for you :)
Correct, adc_channels is the dim for the leak variable. Dims will be updated later in the process when dims for all the fields get assigned according to the CDF YAML file definitions. I just needed to put a dimension here that made sense as a placeholder.
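For context, the concatenation pattern under discussion can be sketched in isolation like this (the leak_vars contents are made-up stand-ins, and the new dimension is named adc_channels per the reply above):

```python
import numpy as np
import xarray as xr

# Made-up stand-ins for the per-channel leak_i housekeeping variables:
# each one is a 1-D DataArray over epoch.
epoch = np.array([0, 1, 2])
leak_vars = [
    xr.DataArray(np.full(3, ch), dims=["epoch"], coords={"epoch": epoch})
    for ch in range(4)
]

# Concatenating along a new dimension stacks the variables; transpose puts
# epoch first so the result has shape (epoch, adc_channels).
stacked_leaks = xr.concat(leak_vars, dim="adc_channels").transpose(
    "epoch", "adc_channels"
)

assert stacked_leaks.shape == (3, 4)
```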
# Assign attributes and dimensions to each data array in the Dataset
for field, data in dataset.data_vars.items():
    # Create a list of dimensions using the DEPEND_i keys in the
    # attributes
    dims = [
        value
        for key, value in attr_mgr.get_variable_attributes(field).items()
        if "DEPEND" in key
    ]
    dataset[field] = xr.DataArray(
        data,
        dims=dims,
        attrs=attr_mgr.get_variable_attributes(field),
    )
I updated the function to pass in adc_channels and assign it as a dimension
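A self-contained sketch of that dimension-assignment loop, using a fabricated stand-in for the attribute manager (the real one reads the CDF YAML definitions):

```python
import numpy as np
import xarray as xr

# Fabricated stand-in for the real attribute manager, which reads the
# CDF YAML definitions; DEPEND_0 / DEPEND_1 name the dims in order.
class FakeAttrManager:
    def get_variable_attributes(self, field):
        return {"DEPEND_0": "epoch", "DEPEND_1": "adc_channels", "UNITS": "dN"}

attr_mgr = FakeAttrManager()
dataset = xr.Dataset({"leak_i": (("dim_0", "dim_1"), np.zeros((3, 4)))})

for field, data in list(dataset.data_vars.items()):
    # Build the dimension list from the DEPEND_i attribute keys
    # (dicts preserve insertion order, so DEPEND_0 comes first).
    dims = [
        value
        for key, value in attr_mgr.get_variable_attributes(field).items()
        if "DEPEND" in key
    ]
    dataset[field] = xr.DataArray(
        data.values,
        dims=dims,
        attrs=attr_mgr.get_variable_attributes(field),
    )

assert dataset["leak_i"].dims == ("epoch", "adc_channels")
```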
"""Test concatenation of leak_i variables""" | ||
|
||
# Call the function | ||
updated_dataset = concatenate_leak_variables(housekeeping_dataset) |
I think it'd be good to check that the values are correct. Maybe, before you call this function, remove everything but the first couple of packets for housekeeping and hardcode arrays for all the leak values to check against.
Added!
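A miniature version of the suggested test could look like this; concatenate_leak_variables here is a simplified stand-in for the real function in hit_l1a.py, and the two packets' leak values are hardcoded so the result can be checked exactly:

```python
import numpy as np
import xarray as xr

# Simplified stand-in for hit_l1a.concatenate_leak_variables: stack the
# individual leak_i_NN variables along an adc_channels dimension.
def concatenate_leak_variables(ds):
    names = sorted(v for v in ds.data_vars if v.startswith("leak_i_"))
    leak_vars = [ds[name] for name in names]
    ds = ds.drop_vars(names)
    ds["leak_i"] = xr.concat(leak_vars, dim="adc_channels").transpose(
        "epoch", "adc_channels"
    )
    return ds

# Two packets' worth of hardcoded leak values to check against.
hk = xr.Dataset(
    {
        "leak_i_00": ("epoch", np.array([10, 11])),
        "leak_i_01": ("epoch", np.array([20, 21])),
    },
    coords={"epoch": np.array([0, 1])},
)

updated_dataset = concatenate_leak_variables(hk)

expected = np.array([[10, 20], [11, 21]])
assert np.array_equal(updated_dataset["leak_i"].values, expected)
```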
@@ -1,91 +0,0 @@
import numpy as np
You can also remove the Housekeeping data class
I did initially, but it broke my HIT L1B tests for housekeeping, so to avoid putting those code updates in this PR I just left the data classes for now.
Looks great to me! Just a few minor suggestions.
"leak_i_raw", | ||
logger.info("Creating HIT L1A housekeeping dataset") | ||
|
||
logical_source = "imap_hit_l1a_hk" |
Should this be in the CDF metadata?
The logical_source variable is used to grab the HIT L1A instrument-level attributes from imap_hit_global_cdf_attrs.yaml when updating the dataset attributes:

dataset.attrs = attr_mgr.get_global_attributes(logical_source)

Attributes from imap_hit_global_cdf_attrs.yaml for L1A housekeeping:

imap_hit_l1a_hk:
  <<: *instrument_base
  Data_level: 1A
  Data_type: L1A_HK>Level-1A Housekeeping
  Logical_source: imap_hit_l1a_hk
  Logical_source_description: IMAP Mission HIT Instrument Level-1A Housekeeping Data.

Would it help to rename this variable, or to plug the value directly into the line that updates the dataset attributes? dataset.attrs = attr_mgr.get_global_attributes("imap_hit_l1a_hk")
dataset = dataset.assign_coords(
    {
        "adc_channels": adc_channels,
        "adc_channels_label": adc_channels_label,
I forget, do the "labels" need to be defined as coordinates as well, or are those variables that are dependent on the coordinates? i.e. should adc_channels_label actually be in the data_vars section?
I think it needs to be added to the coordinates as well because I initially didn't have it in there but then had to add it. I don't remember why though. I'll get clarification on this.
I checked with Tenzin and she explained that the cdflib updates require this for data with more than one dimension.
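A sketch of what that looks like; the channel values are illustrative, and the label is attached as a non-dimension coordinate riding along the adc_channels dimension:

```python
import numpy as np
import xarray as xr

# Illustrative channel index plus a string label coordinate along the same
# adc_channels dimension (needed by the CDF writer for >1-D data, per the
# discussion above).
adc_channels = np.arange(4, dtype=np.uint8)
adc_channels_label = adc_channels.astype(str)

dataset = xr.Dataset(
    {"leak_i": (("epoch", "adc_channels"), np.zeros((2, 4)))},
    coords={"epoch": np.array([0, 1])},
)
dataset = dataset.assign_coords(
    {
        "adc_channels": adc_channels,
        "adc_channels_label": ("adc_channels", adc_channels_label),
    }
)

assert dataset["adc_channels_label"].dims == ("adc_channels",)
```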
imap_processing/hit/l1a/hit_l1a.py
Outdated
# Assign attributes and dimensions to each data array in the Dataset
for field, data in dataset.data_vars.items():
I think this will basically be re-creating all of your DataArrays and resetting the dataset. Instead, could you loop through just the field values and update the current dataset variable?

dataset[field].attrs = attr_mgr.get_variable_attributes(field)
# Something like this where you can just set the name of existing coords
dataset[field].assign_coords(dims)
Good catch. Updated!
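In isolation, the in-place attribute update the reviewer suggests might look like this (FakeAttrManager is a fabricated stand-in for the real attribute manager):

```python
import numpy as np
import xarray as xr

# Fabricated stand-in for the attribute manager.
class FakeAttrManager:
    def get_variable_attributes(self, field):
        return {"CATDESC": f"{field} housekeeping value", "UNITS": "dN"}

attr_mgr = FakeAttrManager()
dataset = xr.Dataset({"fee_temp": ("epoch", np.zeros(3))})

for field in dataset.data_vars:
    # Attach the CDF attributes to the existing variable instead of
    # rebuilding the DataArray from scratch; the underlying Variable is
    # shared, so the change sticks to the dataset.
    dataset[field].attrs = attr_mgr.get_variable_attributes(field)

assert dataset["fee_temp"].attrs["UNITS"] == "dN"
```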
def datasets(packet_filepath):
    """Create datasets from packet file"""
    packet_definition = (
        imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
Suggested change:
- imap_module_directory / "hit/packet_definitions/" "hit_packet_definitions.xml"
+ imap_module_directory / "hit/packet_definitions/hit_packet_definitions.xml"
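For illustration, the two spellings in the suggestion are equivalent because Python merges adjacent string literals at compile time, before pathlib ever sees them; the single literal is simply easier to read (the base path below is illustrative):

```python
from pathlib import Path

base = Path("imap_processing")  # illustrative base directory

# Adjacent string literals are concatenated at compile time, so both
# expressions build the same path object.
implicit = base / "hit/packet_definitions/" "hit_packet_definitions.xml"
explicit = base / "hit/packet_definitions/hit_packet_definitions.xml"

assert implicit == explicit
```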
# ----------------
# Check that the dataset has the correct variables
assert valid_keys == set(processed_hskp_dataset.data_vars.keys())
assert set(dropped_keys).isdisjoint(set(processed_hskp_dataset.data_vars.keys()))
Nice, I did not know the isdisjoint method existed, great use of it!
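For reference, set.isdisjoint returns True exactly when two sets share no elements, which reads more directly than checking that an intersection is empty (the key names below are illustrative):

```python
# Illustrative key sets.
kept_keys = {"fee_temp", "leak_i"}
dropped_keys = {"shcoarse", "pkt_apid"}

# True: no element appears in both sets.
assert kept_keys.isdisjoint(dropped_keys)

# Equivalent to an empty intersection, without materializing the set.
assert kept_keys.isdisjoint(dropped_keys) == (len(kept_keys & dropped_keys) == 0)

# False once the sets overlap.
assert not kept_keys.isdisjoint({"leak_i"})
```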
- Add function to handle concatenating leak_i variables.
- Drop variables from the housekeeping dataset that aren't needed for the CDF product.
- Update dimensions and add attributes to the housekeeping Dataset.
- Delete the create_datasets function, since packet_file_to_datasets creates xarray datasets and those just need to be updated. This will happen in the housekeeping and science data processing functions.
- Add function to process science data (WIP).
- Clean up code and add/update docstrings and comments.
hit_l1a.py was refactored to use the packet_file_to_datasets function, and the unit tests were updated to reflect the changes.
- Added new fixtures for the attributes manager, datasets dict, and housekeeping dataset.
- Added new tests for the new functions (concatenating leak_i data and processing housekeeping).
- Added additional assertions for the housekeeping dataset.
…ses, but only after hit_l1b is refactored not to use it
… leak_i variables to assign as a dimension. Also change dims to a dict from a list since assign_coords takes in a dictionary
…er. Also add assertion to check values are correct
Force-pushed 35ba6ed to 19fec46
I recommend running cli.py as an extra test if you haven't done that already. But I think everything looks good!
Merged ba5ad0b into IMAP-Science-Operations-Center:dev
Updated hit_l1a.py to use the packet_file_to_datasets function instead of using data classes and manually creating xarray datasets for products. Added some new functions to handle housekeeping data processing.
New files
Updated files
Deleted files
Testing
Issue #822