
[WIP] HIT L0 Science Data Decom #815

Draft: wants to merge 13 commits into base: dev
Conversation


@vmartinez-cu vmartinez-cu commented Sep 6, 2024

This code begins the work of decommutating and decompressing HIT L0 science data. Some initial feedback on the approach used would be much appreciated.

Background info:
packet_file_to_datasets in utils.py performs an initial decom of HIT CCSDS files. This function returns a dictionary with an xarray dataset per APID. For HIT science data (APID = 1251), the xarray dataset contains the unpacked CCSDS headers plus the science data as binary, which must be manually unpacked in the code (this is necessary due to how the TLM file is organized). That science dataset is the input to this code, which returns a dataset with decommutated and decompressed science data. So far, this code performs the following tasks:

  • Assemble science frames - a complete set of data (i.e. a science frame) consists of science data from 20 packets. The code groups science data into science frames and checks that the packets in a frame belong together; invalid science frames are skipped (a rough sketch of this check appears after this list). For each valid science frame, the data is further categorized by the L1A products that need it (i.e. count rates and event data): the first 6 packets in the frame are count rates and the remaining 14 packets are event data. These are added to the dataset as new data variables, still in binary.
  • Parse count rates - The count rates data for each science frame need to be manually unpacked. The code handles this and adds the data to the dataset as new data variables.
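
Below is a rough, self-contained sketch of what the science-frame validity check could look like, in the spirit of the is_valid_science_frame function referenced later in the diff. The expected flag pattern (1 = first, 0 = continuation, 2 = last) and the 14-bit counter wrap are assumptions, not the confirmed implementation.

import numpy as np

def frame_looks_valid(seq_flgs: np.ndarray, src_seq_ctrs: np.ndarray) -> bool:
    """Check one 20-packet chunk: grouping-flag pattern and sequential counters."""
    # A complete frame should look like: first packet, 18 continuations, last packet.
    expected_flags = np.array([1] + [0] * 18 + [2])
    flags_ok = np.array_equal(seq_flgs, expected_flags)
    # Source sequence counters should increment by 1 per packet (wrapping at 2**14).
    counters_ok = np.all(np.diff(src_seq_ctrs) % 16384 == 1)
    return bool(flags_ok and counters_ok)

# Example: a well-formed 20-packet chunk passes the check.
print(frame_looks_valid(np.array([1] + [0] * 18 + [2]), np.arange(100, 120)))  # True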

The decom_hit function is the starting point for processing the L0 data. It will be called from the hit_l1a.py file (to be implemented in a future PR).
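
For orientation, here is a hedged sketch of how decom_hit might be called once hit_l1a.py exists; the import paths and the exact packet_file_to_datasets arguments are assumptions based on the description above, not confirmed signatures.

from imap_processing.hit.l0.decom_hit import decom_hit
from imap_processing.utils import packet_file_to_datasets

# packet_file_to_datasets returns a dictionary of xarray datasets keyed by APID.
datasets_by_apid = packet_file_to_datasets(
    "sci_sample.ccsds", "hit_packet_definitions.xml"
)

# APID 1251 holds the HIT science packets; decom_hit returns the dataset with
# the science data decommutated (and, eventually, decompressed).
sci_dataset = decom_hit(datasets_by_apid[1251])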

As this is a WIP, I will be working on the following tasks and updating this PR:

  • Unit tests
  • Fix pre-commit checks

New Files

  • decom_hit.py
    • L0 data processing
  • hit_packet_definitions.xml
    • combines definitions for all APIDs
  • sci_sample.ccsds
    • sample science data

Deleted Files

  • P_HIT_HSKP.xml
    • replaced with hit_packet_definitions.xml
  • P_HIT_SCIENCE.xml
    • replaced with hit_packet_definitions.xml

Testing

  • Will be working on this as part of this PR

Commit: …t_file_to_datasets that just returned dataset with two packets used for initial testing and work around for issues identified in ccsds file. replace ccsds sample file
Commit: …unction to update ccsds header fields to use sc_tick dimension. Replace default epoch dimension with new epoch data array with times per science frame rather than per packet
@vmartinez-cu vmartinez-cu changed the title WIP: HIT L0 Science Data Decom [WIP] HIT L0 Science Data Decom Sep 6, 2024
@tech3371 tech3371 left a comment


I skimmed it and so far so good. I will look at the remaining functions first thing on Monday. Nice work!

"""
# sc_tick contains spacecraft time per packet
sci_dataset.coords["sc_tick"] = sci_dataset["sc_tick"]
sci_dataset = sci_dataset.swap_dims({"epoch": "sc_tick"})
Contributor:

I am not sure if you need this function. I believe packet_file_to_datasets already creates epoch using sc_tick.

Contributor Author:

I replace the epoch dimension later so it contains one time per science frame rather than per packet, since that is what's needed for the science data. However, I still need the per-packet sc_tick times for the CCSDS header fields, since those are not being grouped by science frame. Essentially, I need two time dimensions: one per packet for the CCSDS header fields and one per science frame for the science data. I can explain this better in the docstring.
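
To illustrate the two-dimension layout described above (shapes and values are made up; only the dimension names sc_tick and epoch come from the PR):

import numpy as np
import xarray as xr

n_packets = 40  # two complete science frames of 20 packets each
ds = xr.Dataset(
    data_vars={
        # CCSDS header fields stay per packet, on the sc_tick dimension
        "src_seq_ctr": ("sc_tick", np.arange(n_packets) % 16384),
        # grouped science products live on the per-frame epoch dimension
        "count_rates_binary": ("epoch", np.array(["<frame 0 bits>", "<frame 1 bits>"])),
    },
    coords={
        "sc_tick": np.arange(n_packets),  # one time per packet
        "epoch": np.array([1000, 1020]),  # one time per science frame
    },
)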

checks if the counters are in sequential order.

Both conditions need to be met for a science frame to be considered
valid.
Contributor:

Really helpful doc strings!

Comment on lines 329 to 330
science_frame_start = 0
while science_frame_start + 20 <= len(sci_dataset.epoch):
Contributor:

I feel like you may want to approach this differently. You may want to find the indices of all the 1s and 2s, then loop through those and check whether they form valid science frames or not. If you start incrementing by 20 from the beginning, you may miss good packets by being off from where you should start. I am happy to jump on a call if it helps.
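
A rough, self-contained sketch of the suggested indexing approach (toy data; everything here other than the seq_flgs name is illustrative, not the PR's code):

import numpy as np

# Toy grouping flags for two frames of 20 packets each:
# 1 = first packet of a frame, 0 = continuation, 2 = last packet of a frame.
seq_flgs = np.array(([1] + [0] * 18 + [2]) * 2)

frame_starts = np.where(seq_flgs == 1)[0]
frame_ends = np.where(seq_flgs == 2)[0]

# Pair up starts and ends and keep only spans of exactly 20 packets.
valid_starts = [
    int(start)
    for start, end in zip(frame_starts, frame_ends)
    if end - start + 1 == 20
]
print(valid_starts)  # [0, 20]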

Contributor Author:

Yes, this would be helpful! I'll find a time to chat more about this

Contributor Author:

I updated this to use numpy.where and numpy.diff. Please take a look when you get a chance!

@vmartinez-cu vmartinez-cu added the labels Ins: HIT (Related to the HIT instrument) and Level: L0 (Level 0 processing) Sep 12, 2024
@vmartinez-cu vmartinez-cu added this to the Sept 2024 milestone Sep 12, 2024
Comment on lines 331 to 367
# Find indices where sequence flag is 1 (the start of a science frame)
# and filter for indices that are 20 packets apart. These will be the
# starting indices for science frames in the science data.
start_indices: np.array = np.where(seq_flgs == 1)[0]
valid_start_indices = start_indices[np.where(np.diff(start_indices) == 20)[0]]
last_index_of_frame = None

if valid_start_indices[0] != 0:
    # The first start index is not at the beginning of the file.
    print(
        f"{valid_start_indices[0]} packets at start of file belong to science frame from previous day's ccsds file"
    )
    # TODO: Will need to handle these packets when processing multiple files

for i, start in enumerate(valid_start_indices):
    # Get sequence flags and counters corresponding to this science frame
    seq_flgs_chunk = seq_flgs[start : start + packets_in_frame]
    src_seq_ctr_chunk = src_seq_ctrs[start : start + packets_in_frame]

    # Check for valid science frames with proper sequence flags and counters
    # and append corresponding science data to lists.
    if is_valid_science_frame(seq_flgs_chunk, src_seq_ctr_chunk):
        science_data_chunk = science_data[start : start + packets_in_frame]
        epoch_data_chunk = epoch_data[start : start + packets_in_frame]
        # First 6 packets contain count rates data
        count_rates_binary.append("".join(science_data_chunk[:6]))
        # Last 14 packets contain pulse height event data
        pha_binary.append("".join(science_data_chunk[6:]))
        # Just take first packet's epoch for the science frame
        epoch_science_frame.append(epoch_data_chunk[0])
        last_index_of_frame = start + packets_in_frame
    else:
        # TODO: log issue
        # Skip invalid science frame and move on to the next one
        print(f"Invalid science frame found with starting packet index = {start}")
Contributor:

That makes sense to me. Nice work!

Comment on lines 378 to 388
# Add new data variables to the dataset
epoch_science_frame = np.array(epoch_science_frame)
sci_dataset = sci_dataset.drop_vars("epoch")
sci_dataset.coords["epoch"] = epoch_science_frame
sci_dataset["count_rates_binary"] = xr.DataArray(
count_rates_binary, dims=["epoch"], name="count_rates_binary"
)
sci_dataset["pha_binary"] = xr.DataArray(
pha_binary, dims=["epoch"], name="pha_binary"
)
return sci_dataset
Contributor:

Now you need to figure out how to filter the rest of the data variables in sci_dataset so that only the good science frames' data remain. Otherwise you will have different data shapes between count_rates_binary, pha_binary, and epoch vs. the CCSDS header variables. You can use .isel on sci_dataset.
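
A minimal, self-contained sketch of the .isel idea with made-up shapes and variable names; the real variables in decom_hit may differ:

import numpy as np
import xarray as xr

packets_in_frame = 20
n_packets = 45  # e.g. two complete frames plus 5 stray packets
ds = xr.Dataset(
    {"src_seq_ctr": ("sc_tick", np.arange(n_packets) % 16384)},
    coords={"sc_tick": np.arange(n_packets)},
)

# Indices found during science-frame assembly (placeholder values here).
valid_start_indices = np.array([0, 20])
valid_packet_indices = np.concatenate(
    [np.arange(s, s + packets_in_frame) for s in valid_start_indices]
)

# Keep only packets that belong to valid frames so the per-packet CCSDS header
# variables stay aligned with the per-frame science variables.
ds = ds.isel(sc_tick=valid_packet_indices)  # 40 packets remain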

Commit: …ethod to find packets that match a grouping flags pattern. Added smaller functions for readability
Labels
  • Ins: HIT (Related to the HIT instrument)
  • Level: L0 (Level 0 processing)

Development
Successfully merging this pull request may close these issues:
  • HIT: Unpack L0 count rates data
  • HIT: Group L0 science packets into science frames