[WIP] HIT L0 Science Data Decom #815
base: dev
Conversation
…t_file_to_datasets that just returned dataset with two packets used for initial testing and work around for issues identified in ccsds file. replace ccsds sample file
…unction to update ccsds header fields to use sc_tick dimension. Replace default epoch dimension with new epoch data array with times per science frame rather than per packet
I skimmed it and so far so good. I will look at the remaining function first thing on Monday. Nice work!
""" | ||
# sc_tick contains spacecraft time per packet | ||
sci_dataset.coords["sc_tick"] = sci_dataset["sc_tick"] | ||
sci_dataset = sci_dataset.swap_dims({"epoch": "sc_tick"}) |
I am not sure if you need this function. I believe packet_file_to_datasets already creates epoch using sc_tick.
I replace the epoch dimension later to contain one time per science frame rather than per packet, since that is what's needed for the science data. However, I still need the per-packet sc_tick times for the CCSDS header fields, since those are not grouped by science frame. Essentially, I need two time dimensions: one per packet for the CCSDS header fields and one per science frame for the science data. I can explain this better in the docstring.
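The two-time-dimension setup described here can be sketched in xarray. This is a minimal, self-contained sketch: the dimension names mirror the discussion, but the data and variable names are made up for illustration, not taken from the PR.

```python
import numpy as np
import xarray as xr

# Hypothetical example: 40 packets grouped into 2 science frames of 20 packets.
n_packets = 40
packets_per_frame = 20

# Per-packet spacecraft time, used as the dimension for CCSDS header fields.
sc_tick = np.arange(n_packets)
ds = xr.Dataset(
    {"ccsds_ver": ("sc_tick", np.zeros(n_packets, dtype=np.uint8))},
    coords={"sc_tick": sc_tick},
)

# Per-science-frame epoch: take the first packet's time from each frame.
frame_epoch = sc_tick[::packets_per_frame]
ds = ds.assign_coords(epoch=("epoch", frame_epoch))

# Science data variables use the coarser "epoch" dimension, while the CCSDS
# header fields keep the per-packet "sc_tick" dimension.
ds["count_rates_binary"] = ("epoch", ["frame0", "frame1"])

print(dict(ds.sizes))  # {'sc_tick': 40, 'epoch': 2}
```

Having both dimensions in one dataset is what lets the science variables and the header variables coexist with different lengths.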
imap_processing/hit/l0/decom_hit.py
Outdated
checks if the counters are in sequential order.

Both conditions need to be met for a science frame to be considered
valid.
Really helpful doc strings!
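The validity check the docstring describes could look roughly like this. This is a sketch based only on the two described conditions (a 20-packet frame with grouping flags 1, then 18 zeros, then 2, and sequential source sequence counters); it is not the PR's actual implementation, and the 14-bit counter wraparound handling is an assumption.

```python
import numpy as np

def is_valid_science_frame(seq_flgs: np.ndarray, src_seq_ctrs: np.ndarray) -> bool:
    """Sketch of the two validity conditions described in the docstring.

    1. Grouping flags match the expected 20-packet science frame pattern:
       first packet flagged 1, middle packets 0, last packet 2.
    2. Source sequence counters increment by one each packet (mod 16384,
       assuming the 14-bit CCSDS sequence counter wraps around).
    """
    expected_flags = np.array([1] + [0] * 18 + [2])
    if not np.array_equal(seq_flgs, expected_flags):
        return False
    # Counters must increment by exactly 1, accounting for wraparound.
    diffs = np.diff(src_seq_ctrs) % 16384
    return bool(np.all(diffs == 1))

# Example: a well-formed frame passes; a frame with skipped counters fails.
good_flags = np.array([1] + [0] * 18 + [2])
good_ctrs = np.arange(100, 120)
print(is_valid_science_frame(good_flags, good_ctrs))  # True
```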
imap_processing/hit/l0/decom_hit.py
Outdated
science_frame_start = 0
while science_frame_start + 20 <= len(sci_dataset.epoch):
I feel like you may want to approach this differently. You may want to find the indices of all 1s and 2s and then loop through those, checking whether each is a valid science frame. If you start incrementing by 20 from the beginning, you may miss good packets by being off from where you should start. I am happy to jump on a call if it helps.
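The suggested approach could be sketched as follows. This is an illustrative, self-contained toy (frames of 4 packets instead of 20, invented flag data), not the code that ended up in the PR:

```python
import numpy as np

# Hypothetical grouping flags for 3 frames of 4 packets (1 = first packet,
# 0 = middle, 2 = last), where the second frame is missing its closing packet.
seq_flgs = np.array([1, 0, 0, 2,   1, 0, 0, 0,   1, 0, 0, 2])
packets_in_frame = 4

starts = np.where(seq_flgs == 1)[0]  # candidate frame starts
ends = np.where(seq_flgs == 2)[0]    # candidate frame ends

# Pair each start with the end position a complete frame would have; frames
# whose expected end is missing are dropped rather than shifting all later
# frames, which is what a fixed stride of 20 from index 0 would do.
valid_frames = []
for start in starts:
    end = start + packets_in_frame - 1
    if end in ends:
        valid_frames.append((start, end))

print(valid_frames)  # [(0, 3), (8, 11)]
```

The key property is that one bad frame does not desynchronize the scan for every frame after it.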
Yes, this would be helpful! I'll find a time to chat more about this
I updated this to use numpy.where and numpy.diff. Please take a look when you get a chance!
imap_processing/hit/l0/decom_hit.py
Outdated
# Find indices where sequence flag is 1 (the start of a science frame)
# and filter for indices that are 20 packets apart. These will be the
# starting indices for science frames in the science data.
start_indices: np.array = np.where(seq_flgs == 1)[0]
valid_start_indices = start_indices[np.where(np.diff(start_indices) == 20)[0]]
last_index_of_frame = None

if valid_start_indices[0] != 0:
    # The first start index is not at the beginning of the file.
    print(
        f"{valid_start_indices[0]} packets at start of file belong to "
        f"science frame from previous day's ccsds file"
    )
    # TODO: Will need to handle these packets when processing multiple files

for i, start in enumerate(valid_start_indices):
    # Get sequence flags and counters corresponding to this science frame
    seq_flgs_chunk = seq_flgs[start : start + packets_in_frame]
    src_seq_ctr_chunk = src_seq_ctrs[start : start + packets_in_frame]

    # Check for valid science frames with proper sequence flags and counters
    # and append corresponding science data to lists.
    if is_valid_science_frame(seq_flgs_chunk, src_seq_ctr_chunk):
        science_data_chunk = science_data[start : start + packets_in_frame]
        epoch_data_chunk = epoch_data[start : start + packets_in_frame]
        # First 6 packets contain count rates data
        count_rates_binary.append("".join(science_data_chunk[:6]))
        # Last 14 packets contain pulse height event data
        pha_binary.append("".join(science_data_chunk[6:]))
        # Just take first packet's epoch for the science frame
        epoch_science_frame.append(epoch_data_chunk[0])
        last_index_of_frame = start + packets_in_frame
    else:
        # TODO: log issue
        # Skip invalid science frame and move on to the next one
        print(f"Invalid science frame found with starting packet index = {start}")
That makes sense to me. Nice work!
imap_processing/hit/l0/decom_hit.py
Outdated
# Add new data variables to the dataset
epoch_science_frame = np.array(epoch_science_frame)
sci_dataset = sci_dataset.drop_vars("epoch")
sci_dataset.coords["epoch"] = epoch_science_frame
sci_dataset["count_rates_binary"] = xr.DataArray(
    count_rates_binary, dims=["epoch"], name="count_rates_binary"
)
sci_dataset["pha_binary"] = xr.DataArray(
    pha_binary, dims=["epoch"], name="pha_binary"
)
return sci_dataset
Now you need to figure out how to filter the rest of the data variables in sci_dataset down to only the good science frames' data. Otherwise you will have different data shapes between count_rates_binary, pha_binary, and epoch vs the CCSDS header variables. You can use .isel on sci_dataset.
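The .isel suggestion could look roughly like this. A hedged sketch with made-up data: the variable names and the idea of collecting valid packet indices during frame validation are illustrative assumptions, not the PR's code.

```python
import numpy as np
import xarray as xr

# Hypothetical example: 6 packets where only the first 4 belong to
# valid science frames (toy sizes for brevity).
sci_dataset = xr.Dataset(
    {"ccsds_ver": ("sc_tick", np.zeros(6, dtype=np.uint8))},
    coords={"sc_tick": np.arange(6)},
)

# Indices of packets in valid science frames, e.g. collected while looping
# over the valid frame start indices during validation.
valid_packet_indices = np.arange(4)

# .isel selects by integer position along the sc_tick dimension, so every
# per-packet CCSDS header variable keeps exactly the packets the science
# data kept, and the shapes stay consistent.
filtered = sci_dataset.isel(sc_tick=valid_packet_indices)
print(filtered.sizes["sc_tick"])  # 4
```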
…ethod to find packets that match a grouping flags pattern. Added smaller functions for readability
This code begins the work of decommutating and decompressing HIT L0 science data. Some initial feedback on the approach used would be much appreciated.
Background info:
packet_file_to_datasets in utils.py performs an initial decom of HIT CCSDS files. This function returns a dictionary with an xarray dataset per APID. For HIT science data (APID = 1251), the xarray dataset contains the unpacked CCSDS headers and the science data in binary, which must be manually unpacked in the code (this is necessary due to how the TLM file is organized). That science dataset is the input for this code, which returns a dataset with decommutated and decompressed science data. So far, this code performs the following tasks:

The decom_hit function is the starting point for processing the L0 data. This function will be called from the hit_l1a.py file (to be implemented in a future PR).

As this is a WIP, I will be working on the following tasks and updating this PR:
- Unit tests
- Fix pre-commit checks
New Files
Deleted Files
Testing