diff --git a/_quarto.yml b/_quarto.yml
index be470f6..f161418 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -59,6 +59,11 @@ website:
           - href: propagation/intercomparison.md
     - section: Guides
       contents:
+        - section: Project blueprints
+          href: guides/blueprints/index.qmd
+          contents:
+            - href: guides/blueprints/openeo_ml_project.qmd
+              text: openEO ML Project Blueprint
         - section: Developer Guides
           contents:
             - section: Authentication
diff --git a/eo_service_usage/openeo_usage.md b/eo_service_usage/openeo_usage.md
index 47176f3..aa72761 100644
--- a/eo_service_usage/openeo_usage.md
+++ b/eo_service_usage/openeo_usage.md
@@ -124,33 +124,56 @@ The following example showcases how to use the OpenEO API to execute a synchrono
 ```curl
 POST /openeo/1.2/result HTTP/1.1
-Host: openeocloud.vito.be
+Host: openeofed.dataspace.copernicus.eu
 Content-Type: application/json
-Authorization: Bearer basic//basic.cHJvag==
-Content-Length: 4587
+Authorization: Bearer
 
 {
   "process": {
     "id": "biopar1",
     "process_graph": {
       "biopar1": {
         "process_id": "biopar",
-        "namespace": "vito",
+        "namespace": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/refs/heads/main/algorithm_catalog/vito/biopar/openeo_udp/biopar.json",
         "arguments": {
-          "bbox": {
-            "west": 5.15183687210083,
-            "east": 5.153381824493408,
-            "south": 51.18192559252128,
-            "north": 51.18469636040683,
-            "crs": "EPSG:4326"
-          },
-          "time_range": [
-            "2020-05-06",
-            "2020-05-30"
-          ]
+          "biopar_type": "LAI",
+          "temporal_extent": [
+            "2020-06-27",
+            "2020-07-27"
+          ],
+          "spatial_extent": {
+            "coordinates": [
+              [
+                [
+                  5.179324150085449,
+                  51.2498689148547
+                ],
+                [
+                  5.178744792938232,
+                  51.24672597710759
+                ],
+                [
+                  5.185289382934569,
+                  51.24504696935156
+                ],
+                [
+                  5.18676996231079,
+                  51.245342479161295
+                ],
+                [
+                  5.187370777130127,
+                  51.24918393390799
+                ],
+                [
+                  5.179324150085449,
+                  51.2498689148547
+                ]
+              ]
+            ],
+            "type": "Polygon"
+          }
         },
         "result": true
       }
-    }
   }
 }
 ```
diff --git a/guides/blueprints/index.qmd b/guides/blueprints/index.qmd
new file mode 100644
index 0000000..f2c1ad0
--- /dev/null
+++ b/guides/blueprints/index.qmd
@@ -0,0 +1,11 @@
+---
+title: EO Project Blueprints
+---
+
+The goal of APEx is to support ESA EO projects in general, and to assist in making their results more interoperable and reusable
+after the end of the project. This section provides blueprints for specific types of projects, outlining how to best leverage the
+platforms and standards recommended by APEx.
+
+The material is offered as guidance and inspiration, and is not intended to be prescriptive.
+
+
\ No newline at end of file
diff --git a/guides/blueprints/ml_training_bleuprint.svg b/guides/blueprints/ml_training_bleuprint.svg
new file mode 100644
index 0000000..a30bbe0
--- /dev/null
+++ b/guides/blueprints/ml_training_bleuprint.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Figure text of ml_training_bleuprint.svg; SVG markup not preserved]

Diagram labels, in source order: uses; store result; ML Training service; OGC API Process OR openEO process; Training CWL; STAC; Reference data Extracts; uses; Data Extraction; openEO UDP; uses; result storage; Extract Public/Private Training data; openEO export_workspace; ingest metadata; User/Project Object Storage; S3; Model training; Reference DataStore; STAC; ML Model Metadata Collection; Produce Map; openEO API; Inference UDP; Preprocessing UDP; uses.

Application Processes
Common ML application processes supported by the openEO API. Validation is not included, but is in fact similar to map production.

UDPs (user-defined processes)
These are the workflows defined and maintained by the project. They can be published in the APEx algorithm catalog. The 'training CWL' is also a project-specific application package to train models.

Platform components
These generic components are provided by (openEO) processing platforms or APEx, and can b
\ No newline at end of file
diff --git a/guides/blueprints/openeo_ml_project.qmd b/guides/blueprints/openeo_ml_project.qmd
new file mode 100644
index 0000000..0042b78
--- /dev/null
+++ b/guides/blueprints/openeo_ml_project.qmd
@@ -0,0 +1,41 @@
+---
+title: Machine learning-based EO mapping projects with openEO
+---
+
+# Relevant project categories
+
+This blueprint documents an example architecture for projects that:
+
+- Use machine learning models that involve a training phase and an inference phase.
+- Optionally allow the user to retrain the model with user-provided training data.
+- Produce EO data products such as land cover maps, crop type maps, or other thematic maps.
+- Use, or aim to use, an openEO processing backend that includes the input datasets relevant to the project.
+
+# Example projects based on this blueprint
+
+ESA [WorldCereal](https://esa-worldcereal.org) and [World Ecosystem Extent Dynamics](https://esa-worldecosystems.org) used a similar architecture as part of their project setup.
+
+# Overview of the architecture
+
+In this type of project, we typically identify these common processes:
+
+- **Collection of reference data**: Gathering in-situ measurements, expert annotations, or other reference data needed for training and validating the machine learning model.
+- **Extraction of (preprocessed) training data**: Extracting the preprocessed EO data that corresponds to the locations of the reference data.
+- **Model training**: Using the extracted training data and the reference data to train a machine learning model.
+- **Model validation**: Evaluating the quality of the trained model using a validation dataset.
+- **Map production**: Generating the map, either for a user-defined extent or at large scale (continental/global).
+
+![openEO ML architecture](./ml_training_bleuprint.svg)
+
+
+## Reference data collection
+
+While APEx does not provide specific services for reference data collection, we do provide recommendations for storing the
+collected data:
+
+- Use STAC collections to organise and describe the collected reference data.
+- Use cloud-native file formats (e.g. GeoParquet) to store tabular reference data.
+- Define a structure for organising the data files that avoids both an excessive number of files and excessively large files.
+
+## Extraction of training data
+