Commit 52b31d1

Adds support and documentation for esrun style window prep (#203)
* extends esrun_data/sample with runnable fixture data + config + updates esrun_data/sample/README.md with window prep specifics and up-to-date info on prediction + adds new rslp main.py entrypoint through rslp/esrun/esrun.py for window prep
1 parent cb78159 commit 52b31d1

9 files changed

+335
-32
lines changed

esrun_data/sample/README.md

Lines changed: 96 additions & 14 deletions
@@ -2,7 +2,13 @@
 
 ## What is esrunner?
 
-ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py) (and perhaps eventually the EsFineTuneRunner) class which can be used to run predictions and fine-tuning pipelines outside of the esrun service.
+ESRunner provides:
+
+- the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py)
+- the [EsFineTuneRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/fine_tune_runner.py)
+
+classes, which can be used to run prediction and fine-tuning pipelines outside of the esrun service architecture.
+
 
 ## Setting up your environment
 
@@ -15,11 +21,87 @@ ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-
 ## Project Structure
 - `checkpoint.ckpt`: This is the model checkpoint file. It is required for running inference. If you are only building datasets, this file is not required. Note: You probably don't want to check this file into a git repository.
 - `dataset.json`: This is the rslearn dataset definition file.
-- `esrun.yaml`: This file defines the behavior of the esrunner including partitioning, postprocessing, etc..
+- `esrun.yaml`: This file defines the behavior of the esrunner including partitioning, postprocessing, training window prep, etc.
 - `model.yaml`: This is the rslearn (pytorch) model definition file.
-- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
+- `annotation_features.geojson`: Labeled annotation feature collection, exported from Studio. Only required for labeled window prep.
+- `annotation_task_features.geojson`: Studio tasks for the annotation features, also exported from Studio. Only required for labeled window prep.
+- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `esrun.yaml#partition_strategies`.
+
+## Fine-Tuning
+
+Fine-tuning is encapsulated in the Fine Tuning Workflow, accessible through `EsFineTuneRunner`. It currently only exposes a method for preparing labeled RSLearn windows from GeoJSON feature collections exported through Earth System Studio. Using it requires your `esrun.yaml` to define the following data processing pipeline:
+
+```yaml
+window_prep:
+  sampler:
+  labeled_window_preparer:
+  data_splitter:
+```
+
+### sampler
+
+Optional; defaults to `NoopSampler`. Sampler classes receive a `list[AnnotationTask]` and are expected to return a `list[AnnotationTask]`, filtered down according to your application's needs.
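As an illustration of the filtering contract described above, here is a minimal sketch. It is not part of the committed README or of the esrun API: `AnnotationTaskStub` is a hypothetical stand-in for `AnnotationTask`, and the class name `MaxTasksSampler` and the method name `sample` are assumptions. The real `SamplerInterface` may use a different signature.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationTaskStub:
    """Hypothetical stand-in for esrun's AnnotationTask."""

    task_id: str


class MaxTasksSampler:
    """Hypothetical sampler: keeps at most `max_tasks` tasks, chosen at random."""

    def __init__(self, max_tasks: int, seed: int = 0) -> None:
        self.max_tasks = max_tasks
        self.seed = seed

    def sample(self, tasks: list[AnnotationTaskStub]) -> list[AnnotationTaskStub]:
        # Receives a list of tasks and returns a (possibly smaller) list of tasks.
        if len(tasks) <= self.max_tasks:
            return list(tasks)
        rng = random.Random(self.seed)  # seeded so the selection is reproducible
        return rng.sample(tasks, self.max_tasks)


tasks = [AnnotationTaskStub(task_id=f"task-{i}") for i in range(10)]
kept = MaxTasksSampler(max_tasks=3, seed=42).sample(tasks)
```

A sampler like this would let you cap the number of Studio tasks during quick experiments while leaving the rest of the `window_prep` pipeline unchanged.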
+
+### labeled_window_preparer
+
+Transforms individual `AnnotationTask` instances to `list[LabeledWindow[LabeledSTGeometry]]` or `list[LabeledWindow[ndarray]]` depending on whether vector or raster label output layers are desired.
+
+Available window preparers:
+- `PointToPixelWindowPreparer` - Converts each annotation feature in a Studio task to a 1x1 pixel window with a vector class label
+- `PolygonToRasterWindowPreparer` - Converts a Studio task + its (multi/)polygon annotations into a uint8 2D class matrix
+
+### data_splitter
+
+Assigns each `LabeledWindow` to `train`, `val`, or `test`.
+
+Available data splitters:
+- `RandomDataSplitter` - weighted random assignment
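To make "weighted random assignment" concrete, here is a minimal sketch using the proportions and seed from the sample `esrun.yaml` (`train_prop: 0.8`, `val_prop: 0.2`, `test_prop: 0.0`, `seed: 42`). It is not the `RandomDataSplitter` implementation and not part of the committed README; the helper name `make_splitter` is hypothetical.

```python
import random

SPLITS = ("train", "val", "test")


def make_splitter(train_prop: float, val_prop: float, test_prop: float, seed: int):
    """Returns a function that draws a split name with the given weights."""
    weights = (train_prop, val_prop, test_prop)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("split proportions must sum to 1.0")
    rng = random.Random(seed)  # seeded so assignments are reproducible

    def choose_split() -> str:
        # Draw one split name; a weight of 0.0 means that split is never chosen.
        return rng.choices(SPLITS, weights=weights, k=1)[0]

    return choose_split


choose_split = make_splitter(train_prop=0.8, val_prop=0.2, test_prop=0.0, seed=42)
assignments = [choose_split() for _ in range(1000)]
counts = {name: assignments.count(name) for name in SPLITS}
```

Because each window is assigned independently, the realized split sizes only approximate the configured proportions, and small datasets can deviate noticeably.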
+
+### Run a pipeline end-to-end
+
+A fully functional `esrun.yaml` and set of `.geojson` files are available in `esrun_data/sample` as a reference example.
+Exercise it via:
 
-## Partitioning Strategies
+```
+python -m rslp.main esrun prepare_labeled_windows \
+    --project_path esrun_data/sample \
+    --scratch_path /tmp/scratch
+```
+
+to produce labeled training windows at:
+
+```
+/tmp/scratch/dataset
+```
+
+### Getting the geojson files
+
+Window labeling requires ES Studio Task + Annotation-formatted FeatureCollection files. The best way to get compliant
+data is to upload your raw data via Studio's Command Center "Add Dataset" feature, and export to the desired
+format via the "Export Annotations" tab. This will create the required data files in GCS, which you can then download to your working location.
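After downloading the exports, a quick sanity check is to group annotation labels by the task they belong to. The sketch below is an illustration only and not part of the committed README: the property names `es_annotations_task_id` and `es_label` are taken from the sample fixture in this commit, while the helper name `labels_by_task`, the task ids, and the abbreviated inline data are hypothetical.

```python
import json
from collections import defaultdict

# Abbreviated, made-up stand-in for an exported annotation FeatureCollection.
ANNOTATIONS = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-118.152, 33.736]},
     "properties": {"es_annotations_task_id": "task-a", "es_label": 1}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-118.160, 33.735]},
     "properties": {"es_annotations_task_id": "task-a", "es_label": 3}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-118.165, 33.729]},
     "properties": {"es_annotations_task_id": "task-b", "es_label": 4}}
  ]
}
""")


def labels_by_task(collection: dict) -> dict[str, list[int]]:
    """Groups annotation labels by the Studio task id they reference."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for feature in collection["features"]:
        props = feature["properties"]
        grouped[props["es_annotations_task_id"]].append(props["es_label"])
    return dict(grouped)


summary = labels_by_task(ANNOTATIONS)
```

For a real export, the inline string would be replaced by the contents of the downloaded file, for example `json.loads(Path("annotation_features.geojson").read_text())`.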
+
+### Writing Your Own Samplers
+
+You may supply your own data samplers by creating a new class that implements the `SamplerInterface` class in the `esrun.runner.tools.samplers.sampler_interface` module. You can then specify your custom sampler in the `esrun.yaml` file. This
+class must be importable via your PYTHONPATH. Include it as code in this repository or as a new implementation in earth-system-run.git.
+
+### Writing Your Own LabeledWindowPreparers
+
+You may supply new implementations for converting raw Studio Tasks + Annotations into LabeledWindows. To do so, implement
+either `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.RasterLabelsWindowPreparer` (for rasterized targets) or `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.VectorLabelsWindowPreparer` (for vector targets). As with Samplers, these must be importable from your PYTHONPATH and can be referenced by class path in `esrun.yaml`. Include it as code in this repository or contribute directly to earth-system-run.git.
+
+### Writing Your Own DataSplitters
+
+You may supply your own data splitters to determine the train/val/test split assignment for a LabeledWindow. To do so, implement `esrun.runner.tools.data_splitter.data_splitter_interface.DataSplitterInterface`.
+
+## Inference
+
+Inference is encapsulated in the Prediction Workflow, accessible through `EsPredictRunner`. It requires your `esrun.yaml` to define:
+
+- partitioning strategy
+- post-processing strategy
+
+### Partitioning Strategies
 These stanzas define how esrunner will break the inference request into multiple request geometries for compute parallelization (equivalent to rslearn window groups) and prediction window geometries.
 
 Partitioning strategies can be mixed and matched for flexible development.
@@ -43,15 +125,15 @@ prepare_window_geometries:
     window_size: 128 # intended to be a pixel value
 ```
 
-## Post-Processing Strategies
+### Post-Processing Strategies
 There are 3 different stages to postprocessing:
 - `postprocess_window()` - This is the stage where individual model outputs are converted into a digestible artifact for the next stage.
 - `postprocess_partition()` - This is the stage where the outputs from the window postprocessors are combined into a single per-partition artifact.
 - `postprocess_dataset()` - This is the final stage of postprocessing where the partition level outputs are combined into a final artifact.
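The way the three stages feed into one another can be sketched with plain functions. This is an illustration only and not part of the committed README: the signatures and the class-histogram artifacts are hypothetical, and the real `PostprocessInterface` implementations may take different arguments and work with different kinds of artifacts.

```python
from collections import Counter


def postprocess_window(class_ids: list[int]) -> Counter:
    """Window stage: turn raw per-pixel class ids into a class histogram."""
    return Counter(class_ids)


def postprocess_partition(window_artifacts: list[Counter]) -> Counter:
    """Partition stage: merge all window histograms of one partition."""
    return sum(window_artifacts, Counter())


def postprocess_dataset(partition_artifacts: list[Counter]) -> Counter:
    """Dataset stage: merge the per-partition histograms into one artifact."""
    return sum(partition_artifacts, Counter())


# Made-up model outputs: partition id -> list of per-window class ids.
partitions = {
    "partition-0": [[1, 1, 3], [3, 4]],
    "partition-1": [[4, 4, 1]],
}
partition_artifacts = [
    postprocess_partition([postprocess_window(w) for w in windows])
    for windows in partitions.values()
]
final = postprocess_dataset(partition_artifacts)
```

Each stage only consumes the artifacts of the stage before it, which is what allows window and partition postprocessing to run independently per partition before the final combine.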
 
-## Samples
+### Samples
 
-### Run a pipeline end-to-end
+#### Run a pipeline end-to-end
 
 The simplest way to run a pipeline is to use the `esrun-local-predict` CLI command. This command will run the entire pipeline end-to-end including partitioning, dataset building, inference, post-processing, and combining the final outputs.
 ```
@@ -79,7 +161,7 @@ for partition_id in partitions:
 runner.combine(partitions)
 ```
 
-### Run dataset building for the entire prediction request.
+#### Run dataset building for the entire prediction request.
 ```python file=run_dataset_building.py
 from pathlib import Path
 from esrun.runner.local.predict_runner import EsPredictRunner
@@ -95,7 +177,7 @@ for partition_id in runner.partition():
     runner.build_dataset(partition_id)
 ```
 
-### Run inference for a single partition.
+#### Run inference for a single partition.
 (Assumes you have an existing materialized dataset for the partition.)
 ```python file=run_inference_single_partition.py
 from pathlib import Path
@@ -111,7 +193,7 @@ partition_id = 'my-existing-partition-id' # Replace with the actual partition I
 runner.run_inference(partition_id)
 ```
 
-### Run inference for a single window.
+#### Run inference for a single window.
 Since we don't expose window-level inference via the runner API, you can configure your partitioners to produce limited sets of partitions and windows.
 
 ```yaml file=esrun.yaml
@@ -142,13 +224,13 @@ for partition_id in partitions:
     runner.run_inference(partition_id)
 ```
 
-## Writing Your Own Partitioners
-You may supply your own partitioners by creating a new class that implements the ` PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `partition_strategies.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Dockerimage artifact.
+### Writing Your Own Partitioners
+You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `esrun.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
-## Writing your own post-processing strategies
+### Writing your own post-processing strategies
 You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `postprocessing_strategies.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
-### Testing Partitioner & Post-Processing Implementations
+#### Testing Partitioner & Post-Processing Implementations
 See the [earth-system-run](https://github.com/allenai/earth-system-run) repository for tests covering existing [partitioner](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/partitioners) and [post-processor](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/postprocessors) implementations.
 
 ## Longer Term Vision / Model Development Workflow
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+{
+  "bbox": null,
+  "features": [
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15229910783472,
+          33.7362279043863
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 1,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.16075404958488,
+          33.73583415874556
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 3,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15642511940884,
+          33.72936522166563
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 1,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.16548881696505,
+          33.72936522166563
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 4,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.17495835172541,
+          33.72919646025821
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 3,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    },
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          -118.15933361937081,
+          33.70708584678741
+        ],
+        "type": "Point"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:00+00:00",
+        "es_label": 4,
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    }
+  ],
+  "type": "FeatureCollection"
+}
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+{
+  "bbox": null,
+  "features": [
+    {
+      "bbox": null,
+      "geometry": {
+        "bbox": null,
+        "coordinates": [
+          [
+            [
+              -118.20492778,
+              33.74270278
+            ],
+            [
+              -118.14966389,
+              33.74323056
+            ],
+            [
+              -118.14904722,
+              33.69706111
+            ],
+            [
+              -118.20428333,
+              33.69653611
+            ],
+            [
+              -118.20492778,
+              33.74270278
+            ]
+          ]
+        ],
+        "type": "Polygon"
+      },
+      "id": null,
+      "properties": {
+        "es_annotations_task_id": "164679b9-04ed-5b35-b438-9677104067fc",
+        "es_end_time": "2024-02-24 05:57:04+00:00",
+        "es_start_time": "2024-02-24 05:57:00+00:00"
+      },
+      "type": "Feature"
+    }
+  ],
+  "type": "FeatureCollection"
+}

esrun_data/sample/esrun.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+    init_args:
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128 # intended to be a pixel value
+
+postprocessing_strategies:
+  process_dataset:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+  process_partition:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+  process_window:
+    class_path: esrun.runner.tools.postprocessors.noop_raster.NoopRaster
+
+window_prep:
+  sampler:
+    class_path: esrun.runner.tools.samplers.noop_sampler.NoopSampler
+  labeled_window_preparer:
+    class_path: esrun.runner.tools.labeled_window_preparers.point_to_pixel_window_preparer.PointToPixelWindowPreparer
+    init_args:
+      window_resolution: 10.0
+  data_splitter:
+    class_path: esrun.runner.tools.data_splitters.random_data_splitter.RandomDataSplitter
+    init_args:
+      train_prop: 0.8
+      val_prop: 0.2
+      test_prop: 0.0
+      seed: 42
+  label_layer: "labels"
+  label_property: "category"
+  group_name: "post_random_split"
+  split_property: "split"

esrun_data/sample/partition_strategies.yaml

Lines changed: 0 additions & 9 deletions
This file was deleted.

esrun_data/sample/postprocessing_strategies.yaml

Lines changed: 0 additions & 8 deletions
This file was deleted.
