Skip to content

Conversation

@weiwilliam
Copy link
Contributor

@weiwilliam weiwilliam commented Dec 5, 2025

Description

This PR fixed the bug of inconsistent dimension of winmsk when --thin option is activated.

Issue(s) addressed

Resolves #1727

Dependencies

No dependency.

Impact

No impact.

Checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have run the unit tests before creating the PR

@CoryMartin-NOAA
Copy link
Contributor

Shouldn't you want to thin after the time window reduction?

What if the user gives it a days worth of data, thins 99% and then clips based on the window? Wouldn't that result in fewer obs than clipping the window and then thinning by 99%?

@weiwilliam
Copy link
Contributor Author

Thank you, @CoryMartin-NOAA
I think clipping based on time should apply first, then thinning.
I will modify it.

It makes me wonder which way makes more sense.

  1. thinning in each raw obs file and then aggregating to the final array (current design)
  2. thinning in the final aggregated array (apply after the loop of files)
    What do you think?

@CoryMartin-NOAA
Copy link
Contributor

The current design is probably cheaper computationally, as it will not require as much memory?

@weiwilliam
Copy link
Contributor Author

I see, let me try.
I will update the number here.

@weiwilliam
Copy link
Contributor Author

I tried several ways and found that thinning is not a big deal.
The use of np.append slows down the code when the size of the appending array is large (low thinning value).
Using lists and np.concatenate outside of the for loop works best compared to using np.append in the for loop.
For a 6-hour cycle using --thin 0.2 with NOAA VIIRS AOD from a whole day,

  1. original design (thinning and np.append in for-loop): 4m 42s, ~3.8 GB memory usage
  2. using lists, thinning in for-loop, and then np.concatenate after for-loop: 2m 38s, ~6 GB memory usage
  3. using lists, np.concatenate, and then thinning: 2m 38s, ~7 GB memory usage

Thinning inside or outside the for-loop doesn't change the obs amount too much.
A few hundred points difference out of 0.8 million points when --thin 0.99.
A few thousand points difference out of 65 million points when --thin 0.2.

Copy link
Collaborator

@BenjaminRuston BenjaminRuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @weiwilliam this looks reasonable did not exercise heavily so if more is needed please let me know

@weiwilliam
Copy link
Contributor Author

@BenjaminRuston Thank you. Do you want me to change the use of np.append?

@BenjaminRuston
Copy link
Collaborator

@rajichidamb please have a look,, @CoryMartin-NOAA any objections to moving this forward

@BenjaminRuston
Copy link
Collaborator

@weiwilliam

Do you want me to change the use of np.append?

my feeling is to move forward and enter new issue if this is desired, a priority and resources are available to execute

removal is to reduce memory usage? hoping dask can be added to stack can also help

Copy link

@mer-a-o mer-a-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. A minor suggestion: It would be helpful to fix the typo here https://github.com/JCSDA-internal/ioda-converters/pull/1729/files#diff-91d7dfb5a3c9fb1ed8371be1c7ee0d8260574ed3aa60c812eaa207e690c82697L374 and explain that thinning is done after time clipping.

self.obs_time = self.obs_time[mask_thin]

# Write out data
# create the final mask for output based on time window
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# create the final mask for output based on time window
# after the thinning above apply a mask based on time window

@mer-a-o is this the line you were referring to and did I interpret correctly please correct as needed @weiwilliam nothing else is needed correct

Copy link
Collaborator

@BenjaminRuston BenjaminRuston Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, now not confused,,, looks like can commit please @weiwilliam or @mer-a-o do so if this looks like the change desired and we'll move this forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Masking bug in viirs_aod2ioda.py

5 participants