Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Train sampling #134

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Train sampling #134

wants to merge 4 commits into from

Conversation

Katsutoshii
Copy link
Collaborator

Instead of a sliding window, each batch now selects a random (simulation, chunk, timestep) per example to improve diversity per batch.

Copy link
Contributor

@waltaskew waltaskew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has train-notebook as its merge base rather than main, which I think it should be.

import numpy
from numpy.typing import NDArray
import tensorflow as tf

from usl_models.flood_ml import constants
from usl_models.flood_ml import metastore
from usl_models.flood_ml import model
from usl_models.flood_ml.model import FloodModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd been sticking with the style guide which sez

Use import statements for packages and modules only, not for individual types, classes, or functions.

@@ -102,11 +107,12 @@ def generator():


def load_dataset_windowed(
sim_names: list[str],
sim_names: List[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using the built-in list[str] rather than importing List[str] consistently in the code, so let's stick with that. I believe they're equivalent.

usl_models/usl_models/flood_ml/dataset.py Outdated Show resolved Hide resolved
usl_models/usl_models/flood_ml/dataset.py Outdated Show resolved Hide resolved
(sim_name, temporal_meta, geo_feature_meta, label_meta)
)

if shuffle:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever not want to shuffle? Could we just remove this as an option? Or at least make the default True since I think that's the preferred behavior?

The chunk lists that come out of the metastore I don't think will be guaranteed to come in a consistent order, so that could be the source of an implicit shuffle already (I think we'd need to add a sort to get_spatial_feature_and_label_chunk_metadata if we wanted to guarantee order there.)

Base automatically changed from train-notebook to main July 19, 2024 16:38
- consistent style fixes
- add rainfall_duration to mock metadata
- use for loop rather than while + pop
- default shuffle to true
- remove debug print call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants