Train sampling #134

Katsutoshii · 2024-07-18T00:32:41Z

Instead of a sliding window, each batch now selects a random (simulation, chunk, timestep) per example to improve diversity per batch.
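A minimal sketch of the sampling idea described above, in plain Python rather than the PR's TensorFlow pipeline. The function name `sample_batch_indices`, the `sim_shapes` mapping, and the `window` parameter are hypothetical names chosen for illustration; the actual code reads this information from the metastore.

```python
import random


def sample_batch_indices(
    sim_shapes: dict[str, tuple[int, int]],
    batch_size: int,
    window: int,
    rng: random.Random,
) -> list[tuple[str, int, int]]:
    """Draws one independent (simulation, chunk, timestep) triple per example.

    sim_shapes maps a simulation name to (num_chunks, num_timesteps).
    This structure is an assumption made for the sketch.
    """
    sim_names = sorted(sim_shapes)
    batch = []
    for _ in range(batch_size):
        sim_name = rng.choice(sim_names)
        num_chunks, num_timesteps = sim_shapes[sim_name]
        chunk = rng.randrange(num_chunks)
        # Leave room for a full window starting at the sampled timestep.
        timestep = rng.randrange(num_timesteps - window + 1)
        batch.append((sim_name, chunk, timestep))
    return batch
```

Because every example is drawn independently, consecutive examples in a batch no longer overlap the way consecutive positions of a sliding window do.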

waltaskew

This PR has train-notebook as its merge base; I think it should be main instead.

waltaskew · 2024-07-18T20:51:48Z

usl_models/usl_models/flood_ml/dataset.py

 import numpy
 from numpy.typing import NDArray
 import tensorflow as tf

 from usl_models.flood_ml import constants
 from usl_models.flood_ml import metastore
-from usl_models.flood_ml import model
+from usl_models.flood_ml.model import FloodModel


We've been sticking with the style guide, which says:

Use import statements for packages and modules only, not for individual types, classes, or functions.
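To illustrate the quoted rule without depending on this repository, here is a small sketch using the standard library's `collections` module as a stand-in; in the diff above, the same pattern would mean keeping the module import and referring to the class through it.

```python
# Preferred under the quoted rule: import the module and qualify names at use.
import collections

counts = collections.Counter(["a", "b", "a"])

# Discouraged under the same rule: importing the class directly.
# from collections import Counter
```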

waltaskew · 2024-07-18T20:52:34Z

usl_models/usl_models/flood_ml/dataset.py

@@ -102,11 +107,12 @@ def generator():


 def load_dataset_windowed(
-    sim_names: list[str],
+    sim_names: List[str],


We've consistently used the built-in list[str] in the code rather than importing List from typing, so let's stick with that. I believe the two are equivalent.
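A quick check of that equivalence, assuming Python 3.9 or later (where built-in generics such as `list[str]` are available). The function names are hypothetical.

```python
import typing
from typing import List


def count_builtin(sim_names: list[str]) -> int:
    return len(sim_names)


def count_typing(sim_names: List[str]) -> int:
    return len(sim_names)


# Both annotations resolve to the same runtime container and element type.
assert typing.get_origin(List[str]) is list
assert typing.get_origin(list[str]) is list
assert typing.get_args(List[str]) == typing.get_args(list[str]) == (str,)
```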

waltaskew · 2024-07-18T20:58:32Z

usl_models/usl_models/flood_ml/dataset.py

+                (sim_name, temporal_meta, geo_feature_meta, label_meta)
+            )
+
+    if shuffle:


Would we ever not want to shuffle? Could we just remove this as an option? Or at least make the default True since I think that's the preferred behavior?

I don't think the chunk lists that come out of the metastore are guaranteed to arrive in a consistent order, so there may already be an implicit shuffle. (I think we'd need to add a sort to get_spatial_feature_and_label_chunk_metadata if we wanted to guarantee order there.)
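One way to make the ordering explicit is to sort first and then shuffle deliberately. The sketch below is plain Python with a hypothetical helper name and plain strings standing in for chunk metadata; it is not the metastore API.

```python
import random


def ordered_then_shuffled(chunks, shuffle=True, seed=None):
    """Sorts chunks into a stable order, then optionally shuffles them.

    Sorting first removes any dependence on the order the store returns,
    so a fixed seed gives a reproducible shuffle.
    """
    ordered = sorted(chunks)
    if shuffle:
        random.Random(seed).shuffle(ordered)
    return ordered
```

With `shuffle=False` the result is deterministic, and with a fixed `seed` the shuffled order no longer depends on how the input happened to be ordered.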

Katsutoshii added 2 commits July 18, 2024 00:31

Construct truly random batches for windowed training instead of a sliding window of the same data in each batch.

e4e700f

Merge branch 'train-notebook' of https://github.com/UrbanSystemsLab/climateiq-cnn into train-sampling

5391d0e

waltaskew reviewed Jul 18, 2024

Base automatically changed from train-notebook to main July 19, 2024 16:38

waltaskew added 2 commits July 19, 2024 10:06

walt fixes

1ee855a

- consistent style fixes
- add rainfall_duration to mock metadata

walt fixes

4a3943e

- use for loop rather than while + pop
- default shuffle to true
- remove debug print call
