
CoDICE data product organization refactor #826

Open
wants to merge 7 commits into base: dev

Conversation

@bourque bourque (Collaborator) commented Sep 11, 2024

Apologies for the large PR here; I couldn't see a great way to split this up into smaller PRs. This PR is a somewhat large refactor of the CoDICE L1a processing pipeline in order to allow better flexibility in defining and processing the 18 different CoDICE L1a data products. Before, I was treating all CoDICE-lo products and all CoDICE-hi products as if they could be built the same way, but I have found that there are enough subtle differences between these products that they should be built in a more individualized way. As such, I have introduced a few new 'data product configuration' variables (namely CDF coordinates, CDF dimensions, and CDF data variables) that can be defined at the per-product level. In doing so, I have created a few more focused methods and renamed a few things where I think it made sense to.
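For orientation, a minimal sketch of what one per-product configuration entry might look like (the key names are taken from the diff excerpts quoted below; the APID and values are illustrative assumptions, not the actual constants):

# Hypothetical sketch of one entry in constants.DATA_PRODUCT_CONFIGURATIONS,
# keyed by APID. Key names mirror the diff excerpts below; values are examples only.
DATA_PRODUCT_CONFIGURATIONS = {
    0x460: {  # hypothetical APID
        "dataset_name": "imap_codice_l1a_hi_counters_aggregated",
        "coords": ["esa_step", "energy_label"],
        "dims": ["epoch", "esa_step", "inst_az", "spin_sector"],
        "num_counters": 8,
        "num_energy_steps": 15,
        "num_positions": 4,
        "num_spin_sectors": 1,
        "support_variables": ["data_quality", "spin_period"],
    },
}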

@bourque bourque added Ins: CoDICE Related to the CoDICE instrument Level: L1 Level 1 processing labels Sep 11, 2024
@bourque bourque added this to the Sept 2024 milestone Sep 11, 2024
@bourque bourque requested review from joeymukherjee and a team September 11, 2024 19:03
@bourque bourque self-assigned this Sep 11, 2024
@bourque bourque requested review from greglucas, subagonsouth and tech3371 and removed request for a team September 11, 2024 19:03
Comment on lines +156 to +157
# Reshape to 4 dimensions to allow for epoch dimension
reshaped_variable_data = np.expand_dims(variable_data, axis=0)
Collaborator
This doesn't seem quite right. This is adding an extra empty dimension; shouldn't we be reshaping the data you have into the proper dimensions (similar to what you had before, I think)?

Will this only ever have one epoch variable, or could there be more than that?

variable_data.reshape(
    (
        len(dataset["epoch"]),
        self.num_energy_steps,
        self.num_positions,
        self.num_spin_sectors,
    )
)
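To illustrate the concern with a toy example (the shapes below are hypothetical; the real sizes come from the product configuration):

import numpy as np

# Hypothetical flattened packet data: 15 energy steps x 4 positions x 1 spin sector
flat = np.arange(15 * 4 * 1)
# expand_dims only prepends a length-1 epoch axis and leaves the rest flat
print(np.expand_dims(flat, axis=0).shape)   # (1, 60)
# reshape distributes the values across the product's declared dimensions
print(flat.reshape((1, 15, 4, 1)).shape)    # (1, 15, 4, 1)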

@@ -288,33 +305,23 @@ def get_energy_table(self) -> None:

 # Get the appropriate values
 sweep_table = sweep_data[sweep_data["table_idx"] == sweep_table_id]
-self.energy_table = sweep_table["esa_v"].values
+energy_table: list[float] = sweep_table["esa_v"].values
Collaborator

Aren't these numpy arrays? I wonder if we need to bring in typechecking stubs from pandas and/or xarray to help with this... (For another time)

Suggested change
energy_table: list[float] = sweep_table["esa_v"].values
energy_table: NDArray[float] = sweep_table["esa_v"].values
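For reference, a minimal sketch of the annotation using numpy's typing module (assuming numpy.typing is acceptable here; the concrete dtype and example values are assumptions):

import numpy as np
import numpy.typing as npt

# .values on a pandas Series returns an ndarray, so an NDArray annotation
# fits better than list[float]; np.float64 is an assumed dtype for ESA voltages.
energy_table: npt.NDArray[np.float64] = np.asarray([100.0, 200.0, 400.0])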

Comment on lines +363 to +364
config = constants.DATA_PRODUCT_CONFIGURATIONS.get(apid) # type: ignore[call-overload]
self.coords_to_include = config["coords"]
Collaborator
Would it make sense to set self.config = config and then look up these values later, since they are all named identically? self.config["dataset_name"] rather than self.dataset_name doesn't seem too bad.
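A minimal sketch of that suggestion (the class name is hypothetical; only constants.DATA_PRODUCT_CONFIGURATIONS and the config keys appear in the diff):

from imap_processing.codice import constants  # import path assumed from the file shown below

class CoDICEL1aDataProduct:  # hypothetical class name, for illustration only
    def __init__(self, apid: int):
        # Keep the whole per-product config dict instead of unpacking each key
        # onto the instance as its own attribute.
        self.config = constants.DATA_PRODUCT_CONFIGURATIONS[apid]

    @property
    def dataset_name(self) -> str:
        return self.config["dataset_name"]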

imap_processing/codice/constants.py (resolved comment)
@tech3371 tech3371 (Contributor) left a comment

This was a ton of good work. I followed almost all of the changes except the last one. Complicated, but the logic flows well IMO.

"esa_step",
"energy_label",
], # TODO: These will likely change
"dataset_name": "imap_codice_l1a_hi_counters_aggregated",
Contributor

This seems to follow the filename convention. If so, can you change this and the others to use - instead of _ in the descriptor names?

Suggested change
"dataset_name": "imap_codice_l1a_hi_counters_aggregated",
"dataset_name": "imap_codice_l1a_hi_counters-aggregated",

"num_counters": 8,
"num_energy_steps": 15, # TODO: Double check with Joey
"num_positions": 4, # TODO: Double check with Joey
"num_spin_sectors": 1,
"support_variables": ["data_quality", "spin_period"],
Contributor

Does this mean that these two extra data variables are stored in the CDF file, and the others only have science data stored?

Collaborator Author

Yes, exactly!
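To make the exchange concrete, a toy illustration (all variable names, sizes, and values below are hypothetical) of a product dataset that carries the configured support variables alongside the science counters:

import numpy as np
import xarray as xr

# One science counter plus the two support variables from the config above.
dataset = xr.Dataset()
dataset["hplus"] = xr.DataArray(  # hypothetical counter name
    np.zeros((1, 15, 4, 1)),
    dims=["epoch", "esa_step", "inst_az", "spin_sector"],
)
support_variables = {"data_quality": np.array([0]), "spin_period": np.array([15.0])}
for name, values in support_variables.items():
    dataset[name] = xr.DataArray(values, dims=["epoch"])
print(list(dataset.data_vars))  # ['hplus', 'data_quality', 'spin_period']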

@@ -509,9 +552,11 @@ def process_codice_l1a(file_path: Path, data_version: str) -> xr.Dataset:

Contributor

I just noticed that you are only processing the first packet from packet_dataset. Is that right?

"""
for variable_name in self.support_variables:
if variable_name == "energy_table":
variable_data = self.get_energy_table()
Contributor

Is this function getting the energy values in engineering units, or is it getting the energy step values, e.g. [0, 1, 2, ..., 127]?

Comment on lines 155 to 171
 for variable_data, variable_name in zip(self.data, self.variable_names):
-    # Data arrays are structured depending on the instrument
-    if self.instrument == "lo":
-        variable_data_arr = np.array(variable_data).reshape(
-            (
-                1,
-                self.num_positions,
-                self.num_spin_sectors,
-                self.num_energy_steps,
-            )
-        )
-        dims = ["epoch", "inst_az", "spin_sector", "esa_step"]
-    elif self.instrument == "hi":
-        variable_data_arr = np.array(variable_data).reshape(
-            (
-                1,
-                self.num_energy_steps,
-                self.num_positions,
-                self.num_spin_sectors,
-            )
-        )
-        dims = ["epoch", "esa_step", "inst_az", "spin_sector"]
+    # Reshape to 4 dimensions to allow for epoch dimension
+    reshaped_variable_data = np.expand_dims(variable_data, axis=0)

     # Get the CDF attributes
     cdf_attrs_key = (
         f"{self.dataset_name.split('imap_codice_l1a_')[-1]}-{variable_name}"
     )
-    attrs = cdf_attrs.get_variable_attributes(cdf_attrs_key)
+    attrs = self.cdf_attrs.get_variable_attributes(cdf_attrs_key)

     # Create the CDF data variable
     dataset[variable_name] = xr.DataArray(
-        variable_data_arr,
+        reshaped_variable_data,
         name=variable_name,
-        dims=dims,
+        dims=self.dims,
         attrs=attrs,
     )
Contributor

I was following pretty well up until this point, but I am not following how self.data is associated with self.variable_names.
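In case it helps future readers, a toy illustration (all counter names and shapes are hypothetical) of the assumption that self.data and self.variable_names are parallel sequences, which zip() pairs element by element:

import numpy as np

# Hypothetical parallel sequences: the i-th unpacked counter array corresponds
# to the i-th counter name defined for the data product.
variable_names = ["counter_a", "counter_b"]   # hypothetical counter names
data = [np.zeros(60), np.ones(60)]            # hypothetical unpacked arrays
for variable_data, variable_name in zip(data, variable_names):
    print(variable_name, variable_data.shape)  # counter_a (60,), then counter_b (60,)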

Labels
Ins: CoDICE Related to the CoDICE instrument Level: L1 Level 1 processing
Projects
Status: In Progress

3 participants