Add MMEarth dataset #2202
@adamjstewart and @ando-shah I think this does the job as a basic Dataset. For more specific requirements, we can subclass it to support other model-specific needs.
I added a new test to ruff
Planning to get a working draft of SatlasPretrain first, and then we can modify this PR to ensure it matches and meets @ando-shah's needs. These will be some of our first multi-modal datasets, so I want to make sure we're doing things consistently (at least at the external, user-facing API level).
```python
if modality not in self.all_modalities:
    raise ValueError(f"'{modality}' is an invalid modality name.")

def _validate_modality_bands(self, modality_bands: dict[str, list[str]]) -> None:
```
Validation is good, but not if it makes the code too complex.
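One way to keep band validation short is a single loop with a set difference. This is a minimal sketch, not the PR's implementation: `KNOWN_BANDS` and `validate_modality_bands` are hypothetical stand-ins, and the band lists are illustrative rather than MMEarth's real metadata.

```python
# Hypothetical metadata: the real MMEarth modality and band lists differ.
KNOWN_BANDS: dict[str, list[str]] = {
    "sentinel2": ["B2", "B3", "B4", "B8"],
    "aster": ["elevation", "slope"],
}


def validate_modality_bands(modality_bands: dict[str, list[str]]) -> None:
    """Raise ValueError if any requested modality or band name is unknown."""
    for modality, bands in modality_bands.items():
        if modality not in KNOWN_BANDS:
            raise ValueError(f"'{modality}' is an invalid modality name.")
        # Any requested band not in the known list is invalid.
        invalid = set(bands) - set(KNOWN_BANDS[modality])
        if invalid:
            raise ValueError(
                f"{sorted(invalid)} are invalid band names for modality '{modality}'."
            )
```

Both checks share one loop, so adding a modality only means extending the metadata dictionary.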
Ando requested that Sentinel-1 be separated into ascending and descending orbits. That turned out to be a bit tricky, since MMEarth only stores "sentinel1" as a concatenation of ascending and descending, but I think I implemented it correctly to support band selection, normalization, etc.
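A minimal sketch of such a split, assuming the stored `sentinel1` array holds all ascending channels first, followed by the same channels from the descending pass. The per-orbit band list `S1_BANDS`, the output keys, and `split_sentinel1` are hypothetical, not the dataset's actual names.

```python
from typing import Optional

import numpy as np

# Assumption: per-orbit band order; the stored array is [asc..., desc...].
S1_BANDS = ["VV", "VH", "HH", "HV"]


def split_sentinel1(
    s1: np.ndarray, bands: Optional[list[str]] = None
) -> dict[str, np.ndarray]:
    """Split a (2 * C, H, W) Sentinel-1 array into ascending/descending parts.

    ``bands`` optionally selects a subset of per-orbit bands by name.
    """
    n = len(S1_BANDS)
    if s1.shape[0] != 2 * n:
        raise ValueError(f"expected {2 * n} channels, got {s1.shape[0]}")
    # The same band indices apply to both halves of the concatenation.
    idx = [S1_BANDS.index(b) for b in (bands or S1_BANDS)]
    return {
        "sentinel1_asc": s1[:n][idx],
        "sentinel1_desc": s1[n:][idx],
    }
```

Per-channel normalization statistics stored for the concatenated array would need the same split and band indexing.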
As another point regarding Sentinel1:
Data loader seems much more complicated than it needs to be. Does it satisfy all of Ando's requirements?
Yeah, I have incorporated all the feedback I have gotten from him. I agree it is complicated, but it's also loading data from more than 10 modalities, so it's not a typical single image/target dataset. He said he will try it now and give any further feedback.
@ando-shah found a bug in my implementation. The available band names differed from the sample-specific band names. In ERA5, for example, each band name includes the sample's current or previous date, so it is not a general name, and those bands were therefore not retrieved correctly.
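A minimal sketch of mapping sample-specific names back to general names. It assumes, hypothetically, that the embedded date looks like `20200501` or `2020-05-01` and that each general name occurs at most once per sample; the real MMEarth naming scheme may differ. `general_band_name` and `band_indices` are illustrative names.

```python
import re

# Hypothetical date token, optionally surrounded by underscores.
_DATE = re.compile(r"_?\d{4}-?\d{2}-?\d{2}_?")


def general_band_name(sample_band: str) -> str:
    """Strip an embedded date so the name matches the general band list."""
    return _DATE.sub("_", sample_band).strip("_")


def band_indices(sample_bands: list[str], wanted: list[str]) -> list[int]:
    """Return the index of each wanted general band within a sample's bands."""
    lookup: dict[str, int] = {}
    for i, band in enumerate(sample_bands):
        name = general_band_name(band)
        if name in lookup:
            # Two dated bands collapsing to one name would make the match ambiguous.
            raise ValueError(f"ambiguous band name after stripping date: {name}")
        lookup[name] = i
    missing = [w for w in wanted if w not in lookup]
    if missing:
        raise KeyError(f"bands not found in sample: {missing}")
    return [lookup[w] for w in wanted]
```

Matching on the normalized name, rather than on the raw string, is what lets a general band list select bands across samples with different dates.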
@adamjstewart @ando-shah while trying to write a subclass based on a metadata file, an annoying part is the
Then
Co-authored-by: Adam J. Stewart <[email protected]>
@adamjstewart okay, I think we can merge this then and handle everything else in a subclass.
Still a bit more complicated than I would like, but it gets the job done.
Just realized we forgot to add a plot method to this dataset: #2377. I'll see if any of the remote sensing seminar students are interested in adding it. If not, I may ask you to add a plot after deadlines pass and you have more time.
This PR adds the MMEarth dataset. The implementation is tested with the 100k version, under the assumption that the dataset format is identical for the other versions.
Some changes to the dataset are still expected, so this is still a draft, but I'm opening the PR already to discuss a few points:
`transform` functions that would define the sample as needed for their models?
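As an illustration of a model-specific transform, here is a minimal sketch that stacks selected modalities into a single input. It assumes, hypothetically, that a sample is a dict mapping modality names to `(C, H, W)` NumPy arrays with a shared spatial size; `make_stack_transform` and the `"image"` key are illustrative, not the dataset's API.

```python
from typing import Callable

import numpy as np

Sample = dict[str, np.ndarray]


def make_stack_transform(
    modalities: list[str], key: str = "image"
) -> Callable[[Sample], Sample]:
    """Build a transform that stacks chosen modalities along the channel axis.

    Assumes every selected modality is a (C, H, W) array with the same H and W.
    """

    def transform(sample: Sample) -> Sample:
        # Keep the unselected entries (e.g. targets) untouched.
        out = {k: v for k, v in sample.items() if k not in modalities}
        out[key] = np.concatenate([sample[m] for m in modalities], axis=0)
        return out

    return transform
```

Keeping the dataset's output generic and moving model-specific layout into callables like this one is one way to avoid a subclass per model.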