
[WIP] HIT L0 Science Data Decom #815

Draft: wants to merge 13 commits into base: dev
Conversation


@vmartinez-cu vmartinez-cu commented Sep 6, 2024

This code begins the work of decommutating and decompressing HIT L0 science data. Some initial feedback on the approach used would be much appreciated.

Background info:
packet_file_to_datasets in utils.py performs an initial decom of HIT CCSDS files. This function returns a dictionary with an xarray dataset per APID. For HIT science data (APID = 1251), the xarray dataset contains the unpacked CCSDS headers plus the science data as binary, which must be manually unpacked in the code (this is necessary due to how the TLM file is organized). That science dataset is the input to this code, which returns a dataset with decommutated and decompressed science data. So far, this code performs the following tasks:

  • Assemble science frames - a complete set of data (i.e. a science frame) consists of science data from 20 packets. The code groups science data into science frames and checks that the packets in a frame belong together; invalid science frames are skipped (a rough sketch of this check appears after this list). For each valid science frame, the data is further categorized by the L1A products that need it (i.e. count rates and event data): the first 6 packets in the frame are count rates and the remaining 14 packets are event data. These are added to the dataset as new data variables, still in binary.
  • Parse count rates - The count rates data for each science frame need to be manually unpacked. The code handles this and adds the data to the dataset as new data variables.
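
Below is a rough, self-contained sketch of what the science-frame validity check could look like, in the spirit of the is_valid_science_frame function referenced later in the diff. The expected flag pattern (1 = first, 0 = continuation, 2 = last) and the 14-bit counter wrap are assumptions, not the confirmed implementation.

import numpy as np

def frame_looks_valid(seq_flgs: np.ndarray, src_seq_ctrs: np.ndarray) -> bool:
    """Check one 20-packet chunk: grouping-flag pattern and sequential counters."""
    # A complete frame should look like: first packet, 18 continuations, last packet.
    expected_flags = np.array([1] + [0] * 18 + [2])
    flags_ok = np.array_equal(seq_flgs, expected_flags)
    # Source sequence counters should increment by 1 per packet (wrapping at 2**14).
    counters_ok = np.all(np.diff(src_seq_ctrs) % 16384 == 1)
    return bool(flags_ok and counters_ok)

# Example: a well-formed 20-packet chunk passes the check.
print(frame_looks_valid(np.array([1] + [0] * 18 + [2]), np.arange(100, 120)))  # True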

The decom_hit function is the starting point for processing the L0 data. It will be called from the hit_l1a.py file (to be implemented in a future PR).
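
For orientation, here is a hedged sketch of how decom_hit might be called once hit_l1a.py exists; the import paths and the exact packet_file_to_datasets arguments are assumptions based on the description above, not confirmed signatures.

from imap_processing.hit.l0.decom_hit import decom_hit
from imap_processing.utils import packet_file_to_datasets

# packet_file_to_datasets returns a dictionary of xarray datasets keyed by APID.
datasets_by_apid = packet_file_to_datasets(
    "sci_sample.ccsds", "hit_packet_definitions.xml"
)

# APID 1251 holds the HIT science packets; decom_hit returns the dataset with
# the science data decommutated (and, eventually, decompressed).
sci_dataset = decom_hit(datasets_by_apid[1251])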

As this is a WIP, I will be working on the following tasks and updating this PR:

  • Unit tests
  • Fix pre-commit checks

New Files

  • decom_hit.py
    • L0 data processing
  • hit_packet_definitions.xml
    • combines definitions for all APIDs
  • sci_sample.ccsds
    • sample science data

Deleted Files

  • P_HIT_HSKP.xml
    • replaced with hit_packet_definitions.xml
  • P_HIT_SCIENCE.xml
    • replaced with hit_packet_definitions.xml

Testing

  • Will be working on this as part of this PR

Commit: …t_file_to_datasets that just returned dataset with two packets used for initial testing and work around for issues identified in ccsds file. replace ccsds sample file
Commit: …unction to update ccsds header fields to use sc_tick dimension. Replace default epoch dimension with new epoch data array with times per science frame rather than per packet
@vmartinez-cu vmartinez-cu changed the title WIP: HIT L0 Science Data Decom [WIP] HIT L0 Science Data Decom Sep 6, 2024
@tech3371 tech3371 left a comment


I skimmed it and so far so good. I will look at the remaining functions first thing on Monday. Nice work!

"""
# sc_tick contains spacecraft time per packet
sci_dataset.coords["sc_tick"] = sci_dataset["sc_tick"]
sci_dataset = sci_dataset.swap_dims({"epoch": "sc_tick"})
Contributor:

I am not sure if you need this function. I believe packet_file_to_datasets already creates epoch using sc_tick.

Contributor Author:

I replace the epoch dimension later so it contains one time per science frame rather than per packet, since that is what's needed for the science data. However, I still need the per-packet sc_tick times for the CCSDS header fields, since those are not being grouped by science frame. Essentially, I need two time dimensions: one per packet for the CCSDS header fields and one per science frame for the science data. I can explain this better in the docstring.
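
To illustrate the two-dimension layout described above (shapes and values are made up; only the dimension names sc_tick and epoch come from the PR):

import numpy as np
import xarray as xr

n_packets = 40  # two complete science frames of 20 packets each
ds = xr.Dataset(
    data_vars={
        # CCSDS header fields stay per packet, on the sc_tick dimension
        "src_seq_ctr": ("sc_tick", np.arange(n_packets) % 16384),
        # grouped science products live on the per-frame epoch dimension
        "count_rates_binary": ("epoch", np.array(["<frame 0 bits>", "<frame 1 bits>"])),
    },
    coords={
        "sc_tick": np.arange(n_packets),  # one time per packet
        "epoch": np.array([1000, 1020]),  # one time per science frame
    },
)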

checks if the counters are in sequential order.

Both conditions need to be met for a science frame to be considered
valid.
Contributor:

Really helpful doc strings!

Comment on lines 329 to 330
science_frame_start = 0
while science_frame_start + 20 <= len(sci_dataset.epoch):
Contributor:

I feel like you may want to approach this differently. You may want to find the indices of all the 1s and 2s, then loop through those and check whether they form valid science frames or not. If you start incrementing by 20 from the beginning, you may miss good packets by being off from where you should start. I am happy to jump on a call if it helps.
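
A rough, self-contained sketch of the suggested indexing approach (toy data; everything here other than the seq_flgs name is illustrative, not the PR's code):

import numpy as np

# Toy grouping flags for two frames of 20 packets each:
# 1 = first packet of a frame, 0 = continuation, 2 = last packet of a frame.
seq_flgs = np.array(([1] + [0] * 18 + [2]) * 2)

frame_starts = np.where(seq_flgs == 1)[0]
frame_ends = np.where(seq_flgs == 2)[0]

# Pair up starts and ends and keep only spans of exactly 20 packets.
valid_starts = [
    int(start)
    for start, end in zip(frame_starts, frame_ends)
    if end - start + 1 == 20
]
print(valid_starts)  # [0, 20]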

Contributor Author:

Yes, this would be helpful! I'll find a time to chat more about this

Contributor Author:

I updated this to use numpy.where and numpy.diff. Please take a look when you get a chance!

@vmartinez-cu vmartinez-cu added the labels Ins: HIT (Related to the HIT instrument) and Level: L0 (Level 0 processing) Sep 12, 2024
@vmartinez-cu vmartinez-cu added this to the Sept 2024 milestone Sep 12, 2024
Comment on lines 331 to 367
# Find indices where sequence flag is 1 (the start of a science frame)
# and filter for indices that are 20 packets apart. These will be the
# starting indices for science frames in the science data.
start_indices: np.array = np.where(seq_flgs == 1)[0]
valid_start_indices = start_indices[np.where(np.diff(start_indices) == 20)[0]]
last_index_of_frame = None

if valid_start_indices[0] != 0:
    # The first start index is not at the beginning of the file.
    print(
        f"{valid_start_indices[0]} packets at start of file belong to science frame from previous day's ccsds file"
    )
    # TODO: Will need to handle these packets when processing multiple files

for i, start in enumerate(valid_start_indices):
    # Get sequence flags and counters corresponding to this science frame
    seq_flgs_chunk = seq_flgs[start : start + packets_in_frame]
    src_seq_ctr_chunk = src_seq_ctrs[start : start + packets_in_frame]

    # Check for valid science frames with proper sequence flags and counters
    # and append corresponding science data to lists.
    if is_valid_science_frame(seq_flgs_chunk, src_seq_ctr_chunk):
        science_data_chunk = science_data[start : start + packets_in_frame]
        epoch_data_chunk = epoch_data[start : start + packets_in_frame]
        # First 6 packets contain count rates data
        count_rates_binary.append("".join(science_data_chunk[:6]))
        # Last 14 packets contain pulse height event data
        pha_binary.append("".join(science_data_chunk[6:]))
        # Just take first packet's epoch for the science frame
        epoch_science_frame.append(epoch_data_chunk[0])
        last_index_of_frame = start + packets_in_frame
    else:
        # TODO: log issue
        # Skip invalid science frame and move on to the next one
        print(f"Invalid science frame found with starting packet index = {start}")
Contributor:

That makes sense to me. Nice work!

Comment on lines 378 to 388
# Add new data variables to the dataset
epoch_science_frame = np.array(epoch_science_frame)
sci_dataset = sci_dataset.drop_vars("epoch")
sci_dataset.coords["epoch"] = epoch_science_frame
sci_dataset["count_rates_binary"] = xr.DataArray(
count_rates_binary, dims=["epoch"], name="count_rates_binary"
)
sci_dataset["pha_binary"] = xr.DataArray(
pha_binary, dims=["epoch"], name="pha_binary"
)
return sci_dataset
Contributor:

Now you need to figure out how to filter the rest of the data variables in sci_dataset so that only the good science frames' data remain. Otherwise you will have different data shapes between count_rates_binary, pha_binary, and epoch vs. the CCSDS header variables. You can use .isel on sci_dataset.
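
A minimal, self-contained sketch of the .isel idea with made-up shapes and variable names; the real variables in decom_hit may differ:

import numpy as np
import xarray as xr

packets_in_frame = 20
n_packets = 45  # e.g. two complete frames plus 5 stray packets
ds = xr.Dataset(
    {"src_seq_ctr": ("sc_tick", np.arange(n_packets) % 16384)},
    coords={"sc_tick": np.arange(n_packets)},
)

# Indices found during science-frame assembly (placeholder values here).
valid_start_indices = np.array([0, 20])
valid_packet_indices = np.concatenate(
    [np.arange(s, s + packets_in_frame) for s in valid_start_indices]
)

# Keep only packets that belong to valid frames so the per-packet CCSDS header
# variables stay aligned with the per-frame science variables.
ds = ds.isel(sc_tick=valid_packet_indices)  # 40 packets remain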

Commit: …ethod to find packets that match a grouping flags pattern. Added smaller functions for readability
Labels
  • Ins: HIT (Related to the HIT instrument)
  • Level: L0 (Level 0 processing)

Development
Successfully merging this pull request may close these issues:
  • HIT: Unpack L0 count rates data
  • HIT: Group L0 science packets into science frames