diff --git a/esrun_data/sample/README.md b/esrun_data/sample/README.md
Lines changed: 96 additions & 14 deletions
diff --git a/esrun_data/sample/annotation_features.geojson b/esrun_data/sample/annotation_features.geojson
Lines changed: 120 additions & 0 deletions
diff --git a/esrun_data/sample/annotation_task_features.geojson b/esrun_data/sample/annotation_task_features.geojson
Lines changed: 44 additions & 0 deletions
diff --git a/esrun_data/sample/esrun.yaml b/esrun_data/sample/esrun.yaml
Lines changed: 38 additions & 0 deletions
diff --git a/esrun_data/sample/partition_strategies.yaml b/esrun_data/sample/partition_strategies.yaml
Lines changed: 0 additions & 9 deletions
diff --git a/esrun_data/sample/postprocessing_strategies.yaml b/esrun_data/sample/postprocessing_strategies.yaml
Lines changed: 0 additions & 8 deletions
@@ -2,7 +2,13 @@
 
 ## What is esrunner?
 
-ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py) (and perhaps eventually the EsFineTuneRunner) class which can be used to run predictions and fine-tuning pipelines outside of the esrun service.
+ESRunner provides:
+
+- the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py)
+- the [EsFineTuneRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/fine_tune_runner.py)
+
+classes, which can be used to run prediction and fine-tuning pipelines outside of the esrun service architecture.
+
 
 ## Setting up your environment
 
@@ -15,11 +21,87 @@ ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-
 ## Project Structure
 - `checkpoint.ckpt`:  This is the model checkpoint file. It is required for running inference. If you are only building datasets, this file is not required.  Note: You probably don't want to check this file into git repository.
 - `dataset.json`: This is the rslearn dataset definition file.
-- `esrun.yaml`: This file defines the behavior of the esrunner including partitioning, postprocessing, etc..
+- `esrun.yaml`: This file defines the behavior of the esrunner, including partitioning, postprocessing, training window preparation, etc.
 - `model.yaml`: This is the rslearn (pytorch) model definition file.
-- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period.  Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
+- `annotation_features.geojson`: Labeled annotation feature collection, exported from Studio. Only required for labeled window prep.
+- `annotation_task_features.geojson`: Studio tasks for the annotation features, also exported from Studio. Only required for labeled window prep.
+- `prediction/test-request1.geojson`: The `prediction/` directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period.  Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in the `partition_strategies` section of `esrun.yaml`.
+
+## Fine-Tuning
+
+Fine-tuning is encapsulated in the Fine Tuning Workflow, accessible through `EsFineTuneRunner`. It currently exposes only a method for preparing labeled rslearn windows from GeoJSON feature collections exported from Earth System Studio. Using it requires your `esrun.yaml` to define the following data processing pipeline:
+
+```yaml
+window_prep:
+  sampler:
+  labeled_window_preparer:
+  data_splitter:
+```
+
+### sampler
+
+Optional; defaults to `NoopSampler`. A sampler receives a `list[AnnotationTask]` and returns a `list[AnnotationTask]`, filtered down to whatever subset your application needs.
+
+### labeled_window_preparer
+
+Transforms individual `AnnotationTask` instances to `list[LabeledWindow[LabeledSTGeometry]]` or `list[LabeledWindow[ndarray]]` depending on whether vector or raster label output layers are desired.
+
+Available window preparers:
+  - `PointToPixelWindowPreparer` - Converts each annotation feature in a Studio task to a 1x1 pixel window with a vector class label
+  - `PolygonToRasterWindowPreparer` - Converts a Studio task and its polygon/multipolygon annotations into a 2D uint8 class matrix
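As a rough illustration of the point-to-pixel idea, the sketch below maps a point in a projected CRS (meters) to the bounds of the single pixel containing it at a given resolution, matching the `window_resolution: 10.0` setting in the sample `esrun.yaml`. The annotation's `es_label` would then be attached as the window's vector class label. The function name, the `(col_min, row_min, col_max, row_max)` bounds convention, and the north-up grid (rows grow as y decreases) are assumptions for illustration, not the actual `PointToPixelWindowPreparer` code.

```python
import math


def point_to_pixel_bounds(x: float, y: float, resolution: float) -> tuple[int, int, int, int]:
    """Return (col_min, row_min, col_max, row_max) of the 1x1 pixel containing (x, y).

    Hypothetical helper: assumes (x, y) are projected coordinates in meters and a
    north-up grid, so pixel rows increase as y decreases.
    """
    col = math.floor(x / resolution)
    row = math.floor(-y / resolution)
    return (col, row, col + 1, row + 1)


# A point in UTM-style coordinates at the sample's 10 m window_resolution.
print(point_to_pixel_bounds(385_123.4, 3_734_567.8, 10.0))  # (38512, -373457, 38513, -373456)
```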
+
+### data_splitter
+
+Assigns each `LabeledWindow` to the `train`, `val`, or `test` split.
+
+Available data splitters:
+  - `RandomDataSplitter` - weighted random assignment
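A minimal sketch of weighted random assignment, using the `train_prop`, `val_prop`, `test_prop`, and `seed` parameters that appear in the sample `esrun.yaml`. The function name and the use of plain window-name strings are hypothetical; the real `RandomDataSplitter` operates on `LabeledWindow` objects and may differ in detail.

```python
import math
import random


def random_split(
    window_names: list[str],
    train_prop: float = 0.8,
    val_prop: float = 0.2,
    test_prop: float = 0.0,
    seed: int = 42,
) -> dict[str, str]:
    """Assign each window name to "train", "val", or "test" by a weighted random draw."""
    total = train_prop + val_prop + test_prop
    if not math.isclose(total, 1.0):
        raise ValueError(f"proportions must sum to 1.0, got {total}")
    rng = random.Random(seed)  # seeded so the same inputs always give the same split
    splits = ["train", "val", "test"]
    weights = [train_prop, val_prop, test_prop]  # a zero weight is never chosen
    return {name: rng.choices(splits, weights=weights)[0] for name in window_names}
```

With `test_prop: 0.0`, as in the sample config, no window lands in `test`.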
+
+### Run a pipeline end-to-end
+
+A fully functional `esrun.yaml` and a set of `.geojson` files are available in `esrun_data/sample` as a reference example.
+Run it with:
 
-## Partitioning Strategies
+```
+python -m rslp.main esrun prepare_labeled_windows \
+    --project_path esrun_data/sample \
+    --scratch_path /tmp/scratch
+```
+
+to produce labeled training windows at:
+
+```
+/tmp/scratch/dataset
+```
+
+### Getting the geojson files
+
+Window labeling requires ES Studio Task and Annotation FeatureCollection files in the format Studio exports. The best way to get compliant
+data is to upload your raw data via the "Add Dataset" feature in Studio's Command Center, then export it via the "Export Annotations" tab.
+This writes the required data files to GCS, from which you can download them to your working location.
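In the sample files in this PR, the two exports are linked by the `es_annotations_task_id` property. A minimal sketch that loads both FeatureCollections with the standard library and groups annotation features under their task, useful for sanity-checking an export before running window prep; the function names are hypothetical and this is not how esrun itself parses the files.

```python
import json
from pathlib import Path


def load_feature_collection(path: Path) -> dict:
    """Read a GeoJSON FeatureCollection from disk."""
    with path.open() as f:
        return json.load(f)


def group_annotations_by_task(tasks_fc: dict, annotations_fc: dict) -> dict[str, list[dict]]:
    """Map each task's es_annotations_task_id to the annotation features that reference it."""
    grouped: dict[str, list[dict]] = {
        task["properties"]["es_annotations_task_id"]: [] for task in tasks_fc["features"]
    }
    for feature in annotations_fc["features"]:
        task_id = feature["properties"]["es_annotations_task_id"]
        if task_id not in grouped:
            # an annotation whose task was not exported is likely an export problem
            raise KeyError(f"annotation references unknown task {task_id}")
        grouped[task_id].append(feature)
    return grouped
```

Called with `annotation_task_features.geojson` and `annotation_features.geojson` from `esrun_data/sample`, this should yield one task, `164679b9-04ed-5b35-b438-9677104067fc`, with six point annotations.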
+
+### Writing Your Own Samplers
+
+You may supply your own data samplers by creating a new class that implements the `SamplerInterface` class in the `esrun.runner.tools.samplers.sampler_interface` module. You can then reference your custom sampler by class path under `window_prep.sampler` in `esrun.yaml`. This
+class must be importable from your PYTHONPATH. Include it as code in this repository or as a new implementation in earth-system-run.git.
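A minimal sketch of a custom sampler. Because the real `SamplerInterface` lives in earth-system-run, the example defines a stand-in abstract base class and a stand-in `AnnotationTask` with `task_id` and `annotations` fields; the `sample` method name, both stand-in classes, and their fields are hypothetical, so check the actual interface before porting this. The sampler drops tasks with fewer than `min_annotations` annotations, then optionally caps the count with a seeded random sample.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnotationTask:
    """Hypothetical stand-in for esrun's AnnotationTask."""

    task_id: str
    annotations: list[dict] = field(default_factory=list)


class SamplerInterface(ABC):
    """Hypothetical stand-in for esrun's SamplerInterface."""

    @abstractmethod
    def sample(self, tasks: list[AnnotationTask]) -> list[AnnotationTask]: ...


class MinAnnotationsSampler(SamplerInterface):
    """Drop sparsely labeled tasks, then optionally cap how many are kept."""

    def __init__(self, min_annotations: int = 1, max_tasks: Optional[int] = None, seed: int = 0):
        self.min_annotations = min_annotations
        self.max_tasks = max_tasks
        self.seed = seed

    def sample(self, tasks: list[AnnotationTask]) -> list[AnnotationTask]:
        kept = [t for t in tasks if len(t.annotations) >= self.min_annotations]
        if self.max_tasks is not None and len(kept) > self.max_tasks:
            # seeded so repeated runs keep the same subset
            kept = random.Random(self.seed).sample(kept, self.max_tasks)
        return kept
```

If importable, such a class would be referenced from `window_prep.sampler.class_path`, with `min_annotations` and `max_tasks` under `init_args`.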
+
+### Writing Your Own LabeledWindowPreparers
+
+You may supply new implementations for converting raw Studio Tasks and Annotations into LabeledWindows. To do so, implement
+either `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.RasterLabelsWindowPreparer` (for raster targets) or `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.VectorLabelsWindowPreparer` (for vector targets). As with samplers, these must be importable from your PYTHONPATH and are referenced by class path in `esrun.yaml`. Include them as code in this repository or contribute them directly to earth-system-run.git.
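For raster targets, the core step is burning polygon annotations into a 2D uint8 class matrix, which is what `PolygonToRasterWindowPreparer` is described as doing. A minimal sketch using pixel-center sampling with an even-odd (ray casting) point-in-polygon test: it works in pixel coordinates, handles exterior rings only (no holes), and uses `bytearray` rows to mimic uint8 storage instead of the `ndarray` the real preparer returns. Function names are hypothetical; a multipolygon would be handled by rasterizing each of its polygons.

```python
Ring = list[tuple[float, float]]


def point_in_ring(px: float, py: float, ring: Ring) -> bool:
    """Even-odd ray casting test of (px, py) against a polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray through py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize_polygons(shapes: list[tuple[Ring, int]], height: int, width: int) -> list[bytearray]:
    """Burn (ring, class_id) pairs into uint8-like rows; later shapes overwrite earlier ones."""
    labels = [bytearray(width) for _ in range(height)]  # 0 = background / unlabeled
    for ring, class_id in shapes:
        for row in range(height):
            for col in range(width):
                if point_in_ring(col + 0.5, row + 0.5, ring):  # sample the pixel center
                    labels[row][col] = class_id  # bytearray rejects values outside 0..255
    return labels
```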
+
+### Writing Your Own DataSplitters
+
+You may supply your own data splitters to determine the train/val/test split assignment for a LabeledWindow. To do so, implement `esrun.runner.tools.data_splitters.data_splitter_interface.DataSplitterInterface` and reference it under `window_prep.data_splitter` in `esrun.yaml`.
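As an alternative to weighted random assignment, a custom splitter might hash a stable window identifier, so each window keeps its split across re-runs even as the dataset grows. A minimal sketch; the stand-in `DataSplitterInterface`, the `choose_split` method name, and the string-ID input are hypothetical rather than the actual earth-system-run interface.

```python
import hashlib
from abc import ABC, abstractmethod


class DataSplitterInterface(ABC):
    """Hypothetical stand-in for esrun's DataSplitterInterface."""

    @abstractmethod
    def choose_split(self, window_id: str) -> str: ...


class HashDataSplitter(DataSplitterInterface):
    """Assign splits from a SHA-256 hash of the window id, so results are stable across runs."""

    def __init__(self, train_prop: float = 0.8, val_prop: float = 0.1):
        if not 0.0 <= train_prop + val_prop <= 1.0:
            raise ValueError("train_prop + val_prop must be within [0, 1]")
        self.train_prop = train_prop
        self.val_prop = val_prop  # whatever remains goes to "test"

    def choose_split(self, window_id: str) -> str:
        digest = hashlib.sha256(window_id.encode("utf-8")).digest()
        # 48 bits of the digest mapped to a float in [0, 1), exactly representable
        u = int.from_bytes(digest[:6], "big") / 2**48
        if u < self.train_prop:
            return "train"
        if u < self.train_prop + self.val_prop:
            return "val"
        return "test"
```

Unlike a seeded random draw, adding new windows never reshuffles the assignment of existing ones.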
+
+## Inference
+
+Inference is encapsulated in the Prediction Workflow, accessible through `EsPredictRunner`. It requires your `esrun.yaml` to define:
+
+- partitioning strategies (`partition_strategies`)
+- post-processing strategies (`postprocessing_strategies`)
+
+### Partitioning Strategies
 These stanzas defines how esrunner will break the inference request into multiple request geometries for compute parallelization (equivalent to rslearn window groups) and prediction window geometries.
 
 Partitioning strategies can be mixed and matched for flexible development.
@@ -43,15 +125,15 @@ prepare_window_geometries:
     window_size: 128 # intended to be a pixel value
 ```
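Conceptually, a fixed-window strategy like the `FixedWindowPartitioner` configured above tiles a region into square windows of `window_size` pixels. A minimal sketch over pixel bounds; the function name, the `(col_min, row_min, col_max, row_max)` convention, and clipping of edge windows are assumptions, and the real partitioner works on geometries and may pad or drop partial windows instead.

```python
def fixed_windows(
    bounds: tuple[int, int, int, int], window_size: int = 128
) -> list[tuple[int, int, int, int]]:
    """Tile (col_min, row_min, col_max, row_max) into window_size x window_size windows.

    Windows on the right and bottom edges are clipped to the input bounds.
    """
    col_min, row_min, col_max, row_max = bounds
    windows = []
    for row in range(row_min, row_max, window_size):
        for col in range(col_min, col_max, window_size):
            windows.append(
                (col, row, min(col + window_size, col_max), min(row + window_size, row_max))
            )
    return windows
```

Each resulting window is an independent unit of work, which is what lets partitioning drive compute parallelization.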
 
-## Post-Processing Strategies
+### Post-Processing Strategies
 There are 3 different stages to postprocessing:
   - `postprocess_window()` - This is the stage where individual model outputs are converted into a digestible artifact for the next stage.
   - `postprocess_partition()` - This is the stage where the outputs from the window postprocessors are combined into a single per-partition artifact.
   - `postprocess_dataset()` - This is the final stage of postprocessing where the partition level outputs are combined into a artifact.
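The three stages form a fan-in: many window outputs reduce to one artifact per partition, and the partition artifacts reduce to one dataset artifact. A minimal sketch of that control flow, with window outputs modeled as lists of detection dicts and each stage as a plain function; the score threshold, data shapes, and `run_postprocessing` driver are hypothetical and only mirror the stage names above, not the `PostprocessInterface` signatures.

```python
def postprocess_window(raw_output: list[dict]) -> list[dict]:
    """Stage 1: turn one window's raw output into an artifact (here: drop low scores)."""
    return [d for d in raw_output if d.get("score", 0.0) >= 0.5]


def postprocess_partition(window_artifacts: list[list[dict]]) -> list[dict]:
    """Stage 2: merge all window artifacts of one partition."""
    return [d for artifact in window_artifacts for d in artifact]


def postprocess_dataset(partition_artifacts: dict[str, list[dict]]) -> dict:
    """Stage 3: combine partition artifacts into one FeatureCollection-like artifact."""
    features = [d for pid in sorted(partition_artifacts) for d in partition_artifacts[pid]]
    return {"type": "FeatureCollection", "features": features}


def run_postprocessing(raw: dict[str, list[list[dict]]]) -> dict:
    """raw maps partition id -> list of raw window outputs for that partition."""
    per_partition = {
        pid: postprocess_partition([postprocess_window(w) for w in windows])
        for pid, windows in raw.items()
    }
    return postprocess_dataset(per_partition)
```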
 
-## Samples
+### Samples
 
-### Run a pipeline end-to-end
+#### Run a pipeline end-to-end
 
 The simplest way to run a pipeline is to use the `esrun-local-predict` CLI command.  This command will run the entire pipeline end-to-end including partitioning, dataset building, inference, post-processing, and combining the final outputs.
 ```
@@ -79,7 +161,7 @@ for partition_id in partitions:
 runner.combine(partitions)
 ```
 
-### Run dataset building for the entire prediction request.
+#### Run dataset building for the entire prediction request.
 ```python file=run_dataset_building.py
 from pathlib import Path
 from esrun.runner.local.predict_runner import EsPredictRunner
@@ -95,7 +177,7 @@ for partition_id in runner.partition():
     runner.build_dataset(partition_id)
 ```
 
-### Run inference for a single partition.
+#### Run inference for a single partition.
 (Assumes you have an existing materialized dataset for the partition.)
 ```python file=run_inference_single_partition.py
 from pathlib import Path
@@ -111,7 +193,7 @@ partition_id = 'my-existing-partition-id'  # Replace with the actual partition I
 runner.run_inference(partition_id)
 ```
 
-### Run inference for a single window.
+#### Run inference for a single window.
 Since we don't expose window-level inference via the runner API, you can configure your partitioners to produce limited sets of partitions and windows.
 
 ```yaml file=esrun.yaml
@@ -142,13 +224,13 @@ for partition_id in partitions:
     runner.run_inference(partition_id)
 ```
 
-## Writing Your Own Partitioners
-You may supply your own partitioners by creating a new class that implements the ` PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module.  You can then specify your custom partitioner in the `partition_strategies.yaml` file.  This class must exist on your PYTHONPATH and be importable by the esrunner.  As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Dockerimage artifact.
+### Writing Your Own Partitioners
+You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module.  You can then specify your custom partitioner under `partition_strategies` in the `esrun.yaml` file.  This class must exist on your PYTHONPATH and be importable by the esrunner.  As such, we recommend placing your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
-## Writing your own post-processing strategies
+### Writing your own post-processing strategies
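-You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module.  You can then specify your custom post-processing strategy in the `postprocessing_strategies.yaml` file.  This class must exist on your `PYTHONPATH` and be importable by the esrunner.  As such we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
+You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module.  You can then specify your custom post-processing strategy under `postprocessing_strategies` in the `esrun.yaml` file.  This class must exist on your `PYTHONPATH` and be importable by the esrunner.  As such, we recommend placing your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.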
 
-### Testing Partitioner & Post-Processing Implementations
+#### Testing Partitioner & Post-Processing Implementations
 See the [earth-system-run](https://github.com/allenai/earth-system-run) repository for tests covering existing [partitioner](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/partitioners) and [post-processor](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/postprocessors) implementations.
 
 ## Longer Term Vision / Model Development Workflow
 
@@ -0,0 +1,120 @@
+{
+  "bbox": null,
+  "features": [
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15229910783472,
+          33.7362279043863
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 1,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.16075404958488,
+          33.73583415874556
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 3,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15642511940884,
+          33.72936522166563
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 1,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.16548881696505,
+          33.72936522166563
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 4,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.17495835172541,
+          33.72919646025821
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 3,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15933361937081,
+          33.70708584678741
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 4,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    }
+  ],
+  "type": "FeatureCollection"
+}
@@ -0,0 +1,44 @@
+{
+  "bbox": null,
+  "features": [
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          [
+            [
+              -118.20492778,
+              33.74270278
+            ],
+            [
+              -118.14966389,
+              33.74323056
+            ],
+            [
+              -118.14904722,
+              33.69706111
+            ],
+            [
+              -118.20428333,
+              33.69653611
+            ],
+            [
+              -118.20492778,
+              33.74270278
+            ]
+          ]
+        ],
+        "type": "Polygon"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:04+00:00",
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    }
+  ],
+  "type": "FeatureCollection"
+}
@@ -0,0 +1,38 @@
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+    init_args:
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128 # intended to be a pixel value
+
+postprocessing_strategies:
+  process_dataset:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+  process_partition:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+  process_window:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+window_prep:
+  sampler:
+    class_path: esrun.runner.tools.samplers.noop_sampler.NoopSampler
+  labeled_window_preparer:
+    class_path: esrun.runner.tools.labeled_window_preparers.point_to_pixel_window_preparer.PointToPixelWindowPreparer
+    init_args:
+      window_resolution: 10.0
+  data_splitter:
+    class_path: esrun.runner.tools.data_splitters.random_data_splitter.RandomDataSplitter
+    init_args:
+      train_prop: 0.8
+      val_prop: 0.2
+      test_prop: 0.0
+      seed: 42
+  label_layer: "labels"
+  label_property: "category"
+  group_name: "post_random_split"
+  split_property: "split"