-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Train sampling #134
base: main
Are you sure you want to change the base?
Train sampling #134
Conversation
…ding window of the same data in each batch.
…limateiq-cnn into train-sampling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has train-notebook
as its merge base rather than main
, which I think it should be.
import numpy | ||
from numpy.typing import NDArray | ||
import tensorflow as tf | ||
|
||
from usl_models.flood_ml import constants | ||
from usl_models.flood_ml import metastore | ||
from usl_models.flood_ml import model | ||
from usl_models.flood_ml.model import FloodModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd been sticking with the style guide which sez
Use import statements for packages and modules only, not for individual types, classes, or functions.
@@ -102,11 +107,12 @@ def generator(): | |||
|
|||
|
|||
def load_dataset_windowed( | |||
sim_names: list[str], | |||
sim_names: List[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been using the built-in list[str]
rather than importing List[str]
consistently in the code, so let's stick with that. I believe they're equivalent.
(sim_name, temporal_meta, geo_feature_meta, label_meta) | ||
) | ||
|
||
if shuffle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we ever not want to shuffle? Could we just remove this as an option? Or at least make the default True since I think that's the preferred behavior?
The chunk lists that come out of the metastore I don't think will be guaranteed to come in a consistent order, so that could be the source of an implicit shuffle already (I think we'd need to add a sort to get_spatial_feature_and_label_chunk_metadata
if we wanted to guarantee order there.)
- consistent style fixes - add rainfall_duration to mock metadata
- use for loop rather than while + pop - default shuffle to true - remove debug print call
Instead of a sliding window, each batch now selects a random (simulation, chunk, timestep) per example to improve diversity per batch.