Adds support and documentation for esrun style window prep (#203)
* extends esrun_data/sample with runnable fixture data + config
+ updates esrun_data/sample/README.md with window prep
specifics and up-to-date info on prediction
+ adds new rslp main.py entrypoint through rslp/esrun/esrun.py
for window prep
esrun_data/sample/README.md
96 additions & 14 deletions
@@ -2,7 +2,13 @@
 
 ## What is esrunner?
 
-ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py) (and perhaps eventually the EsFineTuneRunner) class which can be used to run predictions and fine-tuning pipelines outside of the esrun service.
+ESRunner provides:
+
+- the [EsPredictRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/predict_runner.py)
+- the [EsFineTuneRunner](https://github.com/allenai/earth-system-run/blob/josh/esrunner/src/esrun/runner/local/fine_tune_runner.py)
+
+classes, which can be used to run prediction and fine-tuning pipelines outside of the esrun service architecture.
+
 
 ## Setting up your environment
 
@@ -15,11 +21,87 @@ ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-
 ## Project Structure
 - `checkpoint.ckpt`: This is the model checkpoint file. It is required for running inference. If you are only building datasets, this file is not required. Note: You probably don't want to check this file into the git repository.
 - `dataset.json`: This is the rslearn dataset definition file.
-- `esrun.yaml`: This file defines the behavior of the esrunner including partitioning, postprocessing, etc..
+- `esrun.yaml`: This file defines the behavior of the esrunner, including partitioning, postprocessing, training window prep, etc.
 - `model.yaml`: This is the rslearn (PyTorch) model definition file.
-- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
+- `annotation_features.geojson`: Labeled annotation feature collection, exported from Studio. Only required for labeled window prep.
+- `annotation_task_features.geojson`: Studio tasks for the annotation features, also exported from Studio. Only required for labeled window prep.
+- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `esrun.yaml#partition_strategies`.
+
+## Fine-Tuning
+
+Fine-tuning is encapsulated in the Fine Tuning Workflow, accessible through `EsFineTuneRunner`. It currently only exposes a method for preparing labeled rslearn windows from GeoJSON feature collections exported through Earth System Studio. Using it requires your `esrun.yaml` to define the following data processing pipeline:
+
+```yaml
+window_prep:
+  sampler:
+  labeled_window_preparer:
+  data_splitter:
+```
+
+### sampler
+
+Optional; defaults to `NoopSampler`. Samplers receive a `list[AnnotationTask]` and are expected to return a `list[AnnotationTask]`, filtered down according to your application's needs.
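The sampler contract described above (a list of tasks in, a filtered list of tasks out) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `AnnotationTask` dataclass is a stand-in rather than the real esrun class, and the `EveryNthSampler` class and its `sample` method name are hypothetical. Check `SamplerInterface` for the actual method signature before adapting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationTask:
    """Stand-in for esrun's AnnotationTask (hypothetical fields)."""

    task_id: str
    region: str


class EveryNthSampler:
    """Hypothetical sampler: keeps every n-th task, preserving input order."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n

    def sample(self, tasks: list[AnnotationTask]) -> list[AnnotationTask]:
        # Slicing returns a new list; the input list is left untouched.
        return tasks[:: self.n]


tasks = [AnnotationTask(task_id=f"task-{i}", region="demo") for i in range(10)]
kept = EveryNthSampler(n=3).sample(tasks)
print([t.task_id for t in kept])
```

A real sampler would more likely filter on task attributes (for example, region or label balance) than on position in the list.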
+
+### labeled_window_preparer
+
+Transforms individual `AnnotationTask` instances to `list[LabeledWindow[LabeledSTGeometry]]` or `list[LabeledWindow[ndarray]]`, depending on whether vector or raster label output layers are desired.
+
+Available window preparers:
+- `PointToPixelWindowPreparer` - Converts each annotation feature in a Studio task to a 1x1 pixel window with a vector class label
+- `PolygonToRasterWindowPreparer` - Converts a Studio task + its (multi)polygon annotations into a uint8 2D class matrix
+
+### data_splitter
+
+Given a `LabeledWindow`, assign it to `train`, `val`, or `test`.
+
+Available data splitters:
+- `RandomDataSplitter` - weighted random assignment
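Weighted random assignment can be sketched with the standard library as below. This is not the `RandomDataSplitter` implementation; the `assign_split` function, the 0.8/0.1/0.1 weights, and the fixed seed are illustrative assumptions.

```python
import random
from collections import Counter

SPLITS = ("train", "val", "test")


def assign_split(rng: random.Random, weights: tuple[float, float, float]) -> str:
    """Draw one split name with probability proportional to its weight."""
    return rng.choices(SPLITS, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the assignment is reproducible
assignments = [assign_split(rng, (0.8, 0.1, 0.1)) for _ in range(1000)]
counts = Counter(assignments)
print(counts)
```

Because each window is drawn independently, the realized split sizes only approximate the weights, which matters most for small datasets.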
+
+### Run a pipeline end-to-end
+
+A fully functional `esrun.yaml` and set of `.geojson` files are available in `esrun_data/sample` as a reference example.
+Window labeling requires ES Studio Task + Annotation-formatted FeatureCollection files. The best way to get compliant
+data is to upload your raw data via Studio's Command Center "Add Dataset" feature, and export to the desired
+format via the "Export Annotations" tab. This will create the required data files in GCS, which you can then download to your working location.
+
+### Writing Your Own Samplers
+
+You may supply your own data samplers by creating a new class that implements the `SamplerInterface` class in the `esrun.runner.tools.samplers.sampler_interface` module. You can then specify your custom sampler in the `esrun.yaml` file. This
+class must be importable via your PYTHONPATH. Include it as code in this repository or as a new implementation in earth-system-run.git.
+
+### Writing Your Own LabeledWindowPreparers
+
+You may supply new implementations for converting raw Studio Tasks + Annotations into LabeledWindows. To do so, implement
+either `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.RasterLabelsWindowPreparer` (for rasterized targets) or `esrun.runner.tools.labeled_window_preparers.labeled_window_preparer.VectorLabelsWindowPreparer` (for vector targets). As with Samplers, these must be importable from your PYTHONPATH and can be referenced by class path in `esrun.yaml`. Include them as code in this repository or contribute directly to earth-system-run.git.
+
+### Writing Your Own DataSplitters
+
+You may supply your own data splitters to determine the train/val/test split assignment for a LabeledWindow. To do so, implement `esrun.runner.tools.data_splitter.data_splitter_interface.DataSplitterInterface`.
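One design a custom splitter might take is a deterministic, hash-based assignment, so that a window keeps the same split across reruns. The sketch below is a standalone illustration, not an implementation of `DataSplitterInterface`: the `HashDataSplitter` class, the `choose_split` method name, and the choice to key on a window name string (rather than a `LabeledWindow`) are all assumptions.

```python
import hashlib


class HashDataSplitter:
    """Hypothetical splitter: stable split assignment derived from a window name."""

    def __init__(self, train: float = 0.8, val: float = 0.1) -> None:
        if train < 0 or val < 0 or train + val > 1:
            raise ValueError("train and val must be non-negative and sum to <= 1")
        self.train = train
        self.val = val

    def choose_split(self, window_name: str) -> str:
        # Map the name to a stable fraction in [0, 1) using SHA-256.
        digest = hashlib.sha256(window_name.encode("utf-8")).digest()
        fraction = int.from_bytes(digest[:8], "big") / 2**64
        if fraction < self.train:
            return "train"
        if fraction < self.train + self.val:
            return "val"
        return "test"


splitter = HashDataSplitter()
names = [f"window-{i}" for i in range(100)]
splits = [splitter.choose_split(name) for name in names]
print(splits[:5])
```

Unlike a seeded random draw, this assignment does not depend on the order in which windows are processed.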
+
+## Inference
+
+Inference is encapsulated in the Prediction Workflow, accessible through `EsPredictRunner`. It requires your `esrun.yaml` to define:
+
+- a partitioning strategy
+- a post-processing strategy
+
+### Partitioning Strategies
 These stanzas define how esrunner will break the inference request into multiple request geometries for compute parallelization (equivalent to rslearn window groups) and prediction window geometries.
 
 Partitioning strategies can be mixed and matched for flexible development.
@@ -43,15 +125,15 @@ prepare_window_geometries:
 window_size: 128 # intended to be a pixel value
 ```
 
-## Post-Processing Strategies
+### Post-Processing Strategies
 There are 3 different stages to postprocessing:
 - `postprocess_window()` - This is the stage where individual model outputs are converted into a digestible artifact for the next stage.
 - `postprocess_partition()` - This is the stage where the outputs from the window postprocessors are combined into a single per-partition artifact.
 - `postprocess_dataset()` - This is the final stage of postprocessing where the partition-level outputs are combined into a single artifact.
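The way the three stages feed into each other can be sketched with a toy postprocessor that tallies class labels. Only the three method names come from the list above; the `CountingPostprocessor` class, the argument types, and the in-memory `Counter` artifacts are assumptions made for illustration and will differ from the real `PostprocessInterface`.

```python
from collections import Counter


class CountingPostprocessor:
    """Hypothetical postprocessor that tallies predicted class labels."""

    def postprocess_window(self, model_output: list[str]) -> Counter:
        # Stage 1: turn one window's raw model output into a small artifact.
        return Counter(model_output)

    def postprocess_partition(self, window_artifacts: list[Counter]) -> Counter:
        # Stage 2: merge all window artifacts belonging to one partition.
        return sum(window_artifacts, Counter())

    def postprocess_dataset(self, partition_artifacts: list[Counter]) -> Counter:
        # Stage 3: merge the per-partition artifacts into the final result.
        return sum(partition_artifacts, Counter())


pp = CountingPostprocessor()
partition_a = pp.postprocess_partition(
    [pp.postprocess_window(["water", "land"]), pp.postprocess_window(["water"])]
)
partition_b = pp.postprocess_partition([pp.postprocess_window(["land"])])
final = pp.postprocess_dataset([partition_a, partition_b])
print(dict(final))
```

Each stage only consumes the artifacts of the stage before it, which is what lets window and partition work run in parallel before the final combine.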
 
-## Samples
+### Samples
 
-### Run a pipeline end-to-end
+#### Run a pipeline end-to-end
 
 The simplest way to run a pipeline is to use the `esrun-local-predict` CLI command. This command will run the entire pipeline end-to-end including partitioning, dataset building, inference, post-processing, and combining the final outputs.
 ```
@@ -79,7 +161,7 @@ for partition_id in partitions:
 runner.combine(partitions)
 ```
 
-### Run dataset building for the entire prediction request.
+#### Run dataset building for the entire prediction request.
 ```python file=run_dataset_building.py
 from pathlib import Path
 from esrun.runner.local.predict_runner import EsPredictRunner
@@ -95,7 +177,7 @@ for partition_id in runner.partition():
     runner.build_dataset(partition_id)
 ```
 
-### Run inference for a single partition.
+#### Run inference for a single partition.
 (Assumes you have an existing materialized dataset for the partition.)
 ```python file=run_inference_single_partition.py
 from pathlib import Path
@@ -111,7 +193,7 @@ partition_id = 'my-existing-partition-id' # Replace with the actual partition I
 runner.run_inference(partition_id)
 ```
 
-### Run inference for a single window.
+#### Run inference for a single window.
 Since we don't expose window-level inference via the runner API, you can configure your partitioners to produce limited sets of partitions and windows.
 
 ```yaml file=esrun.yaml
@@ -142,13 +224,13 @@ for partition_id in partitions:
     runner.run_inference(partition_id)
 ```
 
-## Writing Your Own Partitioners
-You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `partition_strategies.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Dockerimage artifact.
+### Writing Your Own Partitioners
+You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `esrun.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such, we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
-## Writing your own post-processing strategies
+### Writing your own post-processing strategies
 You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `postprocessing_strategies.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such, we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
 See the [earth-system-run](https://github.com/allenai/earth-system-run) repository for tests covering existing [partitioner](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/partitioners) and [post-processor](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/runner/tools/postprocessors) implementations.
 
 ## Longer Term Vision / Model Development Workflow