OGC Hackathon 2019
Author: Christophe Noël, Spacebel
Status: draft discussion paper
Date: 26th June 2019
Summary: this report mostly addresses guidelines on how to express a Process Description as an OpenAPI operation instead of the usual specific schema. The document also covers a set of other enhancements proposed by Spacebel during the OGC Hackathon 2019 in London.
Spacebel participated in the OGC Hackathon 2019 in London.
We provided an OpenAPI/REST OGC Processing API as described in the OGC Testbed 14 project (NextGen thread); the server implementation was integrated with multiple clients (e.g. Solenix). Spacebel also suggested some improvements to the OGC Processing API.
The present report mostly addresses how to express a Process Description as an OpenAPI operation instead of the usual specific schema. It also covers a set of other enhancements proposed by Spacebel during the hackathon.
A major discussion during the OGC Hackathon 2019 (London) was about describing the process interfaces (i.e. the Process Description, including input and output definitions) directly in JSON Schema and/or OpenAPI.
Indeed, as OpenAPI already supports descriptions of schemas/interfaces, this would remove the need for a specific Process Description schema, and would also ease the work of Web client developers.
Nevertheless, the proposal was not understood and envisioned in the same way by all participants. The following approaches have been identified:
In the first approach, only the input and output types are defined using JSON Schema. The schema thus replaces the literal data types (boolean, string, integer, etc.) and the complex data types (actually schema + encoding + MIME type) with any format supported by JSON Schema. See the example below.
{
  "processDescription": {
    "processVersion": "1.0.0",
    "process": {
      "id": "myProcessId",
      "inputs": [
        {
          "id": "files",
          "schema": {
            "type": "string",
            "format": "uri"
          }
        }
      ],
      "outputs": [...]
    }
  }
}
From Spacebel's point of view, this would be particularly relevant when the inputs are themselves JSON documents. Nevertheless, changing only the type definition would not bring much, as it is only a small part of the process description. Overall, we do not really see the purpose of such a minimal change.
In the second approach, the whole Process Description is defined in JSON Schema and, typically, processes are exposed as REST resources in the OpenAPI document of the OGC Processing service.
For each existing process, a resource with an HTTP path is defined in the API document. Processes are therefore discovered through the API document (no need for the discovery and describe-process operations), and the client directly knows how to invoke the execution. The API document is illustrated below:
openapi: 3.0.0
paths:
  # For each process the path and interface is provided as below
  '/processes/mySampleProcess/jobs':
    post:
      summary: Start mySampleProcess execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
              description: URL to check the status of the execution/job.
The example above shows the definition of the execution operation. The relevance of defining a specific (per process) result operation for reading outputs needs further discussion.
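To make the discovery argument concrete, the following Python sketch lists the processes exposed by such an API document together with the request schema each one expects. It assumes the document has already been parsed into a dictionary and that every process follows the `/processes/{id}/jobs` path layout of the example; the function name `discover_processes` is purely illustrative.

```python
import re

# Path layout taken from the example; other layouts would need another pattern.
PROCESS_PATH = re.compile(r"^/processes/(?P<process_id>[^/]+)/jobs$")


def discover_processes(api_document):
    """Map each process identifier to the $ref of its execute request schema."""
    processes = {}
    for path, operations in api_document.get("paths", {}).items():
        match = PROCESS_PATH.match(path)
        if match is None or "post" not in operations:
            continue  # not a process execution resource
        content = operations["post"].get("requestBody", {}).get("content", {})
        schema = content.get("application/json", {}).get("schema", {})
        processes[match.group("process_id")] = schema.get("$ref")
    return processes


# The API document of the example, already parsed (e.g. with a YAML library).
api_document = {
    "openapi": "3.0.0",
    "paths": {
        "/processes/mySampleProcess/jobs": {
            "post": {
                "summary": "Start mySampleProcess execution.",
                "requestBody": {"content": {"application/json": {"schema": {
                    "$ref": "#/components/schemas/mySampleProcessExecuteRequest"
                }}}},
            }
        }
    },
}

print(discover_processes(api_document))
```

A client built this way needs no separate "list processes" or "describe process" call: the API document alone tells it which processes exist and how to invoke them.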
Each process is defined as an exposed API operation (the process identifier is part of the operation path). The example below illustrates a typical schema for the execute request of a process. The mappings to the WPS specification are detailed in the next sections.
mySampleProcessExecuteRequest:
  allOf:
    # Execution parameters (mode, response) can simply be referred to by a link
    - $ref: "executionRequest.yaml"
    - type: object
      required:
        # indicates the inputs that are mandatory (optional by default)
        - input1
      properties:
        # Process inputs are defined below
        input1:
          type: array
          title: provided title
          items:
            $ref: 'genericInputType.yaml'
        # alternative with format
        input2:
          type: string
        input3:
          type: boolean
        # Output IDs list: the user chooses the transmission by value or by reference.
        outputs:
          type: object
          properties:
            imageOutput:
              $ref: 'transmissionMode.yaml'
As illustrated in the example, the execution request includes the following parts:
- A set of standard execution parameters supported by the OGC Processing API and always included in all process definitions. This may include the usual mode (sync, async) and response (raw, document) attributes.
- The list of inputs, defined as properties.
- The list of outputs, defined as properties of the outputs element. This is how the client learns the output identifiers, and the client may also include or exclude individual outputs.
The usual Title and Abstract properties (for each input, each output, and the process itself) can easily be mapped to their JSON Schema equivalents listed below:
- Title → title
- Abstract → description
The Keywords property requires a specification extension (https://swagger.io/specification/#specificationExtensions) such as x-keywords (see the section 'JSON Schema extension for Processing').
input1:
  type: integer
  title: 'my input title'
  description: 'my input abstract'
  x-keywords:
    - keyword1
    - keyword2
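The Title/Abstract/Keywords mapping can be sketched as a small Python helper. The function name `describe` and its parameters are hypothetical; only the target keywords (`title`, `description`, `x-keywords`) come from the mapping described here.

```python
def describe(json_type, title=None, abstract=None, keywords=None):
    """Build a JSON Schema fragment from WPS-style descriptive properties."""
    schema = {"type": json_type}
    if title is not None:
        schema["title"] = title                # Title -> title
    if abstract is not None:
        schema["description"] = abstract       # Abstract -> description
    if keywords:
        schema["x-keywords"] = list(keywords)  # Keywords -> x-keywords extension
    return schema


input1 = describe(
    "integer",
    title="my input title",
    abstract="my input abstract",
    keywords=["keyword1", "keyword2"],
)
print(input1)
```

Properties that are not supplied are simply left out, so a bare `describe("string")` yields only the type.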
Metadata providing additional information about the process and its individual inputs and outputs is a fundamental part of the process description and is mandatory in most projects (OGC Testbeds, SSEGrid, ESE, etc.). The metadata fields serve different purposes, e.g.:
- Additional (geo) constraints on a particular input (spatial, temporal), or any additional information (e.g. the unit of measure originally embedded in ows:UOM).
- An indication of how to retrieve the input data from another (OGC) service: this is typically handled by an OWS Context document (e.g. the WFS endpoint with a list of restricted collections).
- Transactional-related information: such process descriptions typically include memory/CPU requirements, backend execution customization (e.g. a CWL document), or target execution cluster parameters.
The usual generic <ows:Metadata> element (which includes the xlink:role and xlink:href attributes) can easily be mapped to an OpenAPI External Documentation Object for providing a single reference.
input1:
  type: integer
  externalDocs:
    description: "myExternalDoc"
    url: "https://example.com"
The very common <ows:additionalProperties> element, which carries key-value pairs, can be provided using a standard Processing API extension such as "x-properties" (see the later section on Specification Extensions).
input1:
  type: integer
  x-properties:
    - uom: "km/s"
    - eoSource: "Landsat-8"
Finally, the Processing API may define some Specification Extension properties for embedding very common geospatial information. For example, any OWS Context information could be provided through an "x-context" property, simply defined as a JSON object.
Below is an example of an input that can be retrieved from an OGC WMS service:
input1:
  type: integer
  x-context:
    - offering:
        code: "http://www.opengis.net/spec/owc-atom/1.0/req/wms"
        operation:
          - code: "GetMap"
            method: "GET"
            href: "http://acme.org/wms?REQUEST=GetMap&SRS=EPSG:4326&BBOX=3.5,-1.3,34,-1.8"
We believe the OGC API for Processes specification should define the following properties as extensions:
- x-properties: list of key-value string pairs
- x-context: JSON object (intended for embedding a JSON OWS Context)
- optionally, x-links: list of links with a role attribute
Of course, this does not prevent any implementation from defining additional project-specific extensions.
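As an illustration of how a client might consume these extensions, the Python sketch below separates the "x-" fields from the standard keywords of an input definition and flattens "x-properties" into a single dictionary. The helper names are hypothetical, and the list-of-single-pair layout of "x-properties" is taken from the example given earlier.

```python
def extensions(schema):
    """Return only the specification extensions ('x-' prefixed fields)."""
    return {key: value for key, value in schema.items() if key.startswith("x-")}


def flatten_properties(schema):
    """Merge the key-value pairs listed under 'x-properties' into one dict."""
    merged = {}
    for pair in schema.get("x-properties", []):
        merged.update(pair)
    return merged


input1 = {
    "type": "integer",
    "x-properties": [{"uom": "km/s"}, {"eoSource": "Landsat-8"}],
}

print(sorted(extensions(input1)))
print(flatten_properties(input1))
```

A client that does not know a given extension can ignore it safely, which is what makes project-specific extensions harmless for generic tooling.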
The following JSON Schema keywords shall be used to map the minOccurs and maxOccurs definitions:
- the 'required' keyword when the property is mandatory (minOccurs >= 1)
- the 'array' type when the property may hold multiple values (maxOccurs > 1)
- for arrays, 'minItems' and 'maxItems' to express any specific cardinality
- 'maxItems' shall be omitted if the number of items is unbounded (JSON Schema only accepts a non-negative integer there)
input1:
  type: array
  minItems: 1
  maxItems: 10
  items:
    $ref: 'myItemType'
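The cardinality mapping can be sketched in Python as follows. The function name and its return convention are hypothetical. The sketch makes two assumptions: an unbounded maxOccurs is passed as `None`, and in that case `maxItems` is left out, since JSON Schema expects a non-negative integer for that keyword.

```python
def apply_cardinality(item_schema, min_occurs=1, max_occurs=1):
    """Return (schema, required) for an input with the given occurrences.

    max_occurs=None stands for 'unbounded' (an assumption of this sketch).
    """
    required = min_occurs >= 1
    if max_occurs is not None and max_occurs <= 1:
        return dict(item_schema), required  # single value, no array needed
    schema = {"type": "array", "items": dict(item_schema)}
    if min_occurs > 0:
        schema["minItems"] = min_occurs
    if max_occurs is not None:
        schema["maxItems"] = max_occurs     # left out when unbounded
    return schema, required


schema, required = apply_cardinality({"$ref": "myItemType"}, min_occurs=1, max_occurs=10)
print(schema, required)
```

The `required` flag is returned separately because, in JSON Schema, it belongs to the enclosing object (its `required` list) rather than to the property itself.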
When defining simple parameters, formally known as "literal data", the available data types can easily be mapped to JSON Schema simple types. The following table shows the obvious mappings:
WPS Schema Type | JSON Schema Type |
---|---|
string | string |
integer | integer |
decimal | number |
boolean | boolean |
double | number (with format double) |
float | number (with format float) |
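The type table translates directly into a lookup, sketched here in Python. The names `LITERAL_TYPES` and `literal_schema` are hypothetical, and the sketch assumes that a WPS boolean maps to the JSON Schema `boolean` type, as the input examples elsewhere in this document do.

```python
# WPS literal data type -> JSON Schema fragment
LITERAL_TYPES = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "decimal": {"type": "number"},
    "boolean": {"type": "boolean"},  # assumption of this sketch
    "double": {"type": "number", "format": "double"},
    "float": {"type": "number", "format": "float"},
}


def literal_schema(wps_type):
    """Return a fresh JSON Schema fragment for a WPS literal data type."""
    try:
        return dict(LITERAL_TYPES[wps_type])
    except KeyError:
        raise ValueError(f"no JSON Schema mapping for WPS type {wps_type!r}") from None


print(literal_schema("double"))
```

Returning a copy keeps callers from accidentally modifying the shared table when they add constraints such as `minimum` or `default`.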
Note that JSON Schema, as used by OpenAPI, also allows additional string formats that would be relevant for the Processing API:
- Date (string with format date)
- Date time (string with format date-time)
- Password (string with format password)
- URI/URL (string with format uri)
The WPS specification allows constraints to be placed on the inputs; the following table shows the equivalents in JSON Schema:
WPS Constraint | JSON Schema Constraint |
---|---|
minimum value | minimum and exclusiveMinimum |
maximum value | maximum and exclusiveMaximum |
ranges | only a single minimum and maximum value is allowed |
spacing | none |
allowed values | enum |
default value | default |
values reference | externalDocs |
JSON Schema also provides additional keywords for expressing constraints:
- multipleOf: numbers can be restricted to multiples of a given number
- minLength/maxLength: strings can be restricted to a particular length range
- pattern: ensures that a string matches a regular expression
- uniqueItems: validates only if all array items are unique
Note that LiteralDataDomains is not covered; further discussion may be required to decide whether porting it to JSON Schema is relevant.
Finally, an example of a set of literal input definitions is shown below.
input2:
  type: string
  enum:
    - allowedValue1
    - allowedValue2
input3:
  type: boolean
  default: true
input4:
  type: integer
  minimum: 1
  maximum: 100
  default: 20
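To show how a server might apply these literal definitions, here is a deliberately small Python sketch that handles only `enum`, `minimum`, `maximum` and `default`. It is not a JSON Schema validator, and both function names are hypothetical; a real implementation would more likely rely on an existing validation library.

```python
def check_literal(value, schema):
    """Return the constraint violations of value (enum, minimum, maximum only)."""
    problems = []
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{value!r} is not one of {schema['enum']}")
    if "minimum" in schema and value < schema["minimum"]:
        problems.append(f"{value!r} is below the minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        problems.append(f"{value!r} is above the maximum {schema['maximum']}")
    return problems


def with_defaults(inputs, schemas):
    """Fill in the 'default' of every input the client left out."""
    completed = dict(inputs)
    for name, schema in schemas.items():
        if name not in completed and "default" in schema:
            completed[name] = schema["default"]
    return completed


# The literal input definitions of the example, as parsed dictionaries.
schemas = {
    "input2": {"type": "string", "enum": ["allowedValue1", "allowedValue2"]},
    "input3": {"type": "boolean", "default": True},
    "input4": {"type": "integer", "minimum": 1, "maximum": 100, "default": 20},
}

print(with_defaults({"input2": "allowedValue1"}, schemas))
print(check_literal(150, schemas["input4"]))
```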
When defining input files and documents, formally known as "complex data", the WPS specification can be mapped to JSON Schema depending on the nature of the file.
The most obvious mapping is when the input document is itself in JSON format. In that case, the input is simply defined as a JSON object (either generic or with an explicit definition) as shown below:
inputJSON:
  type: object
  properties:
    propertyA:
      type: string
    propertyB:
      type: integer
When the input is a file, the WPS specification usually allows the file to be provided in one of several allowed formats (with a default format). We recommend that the supported formats be indicated in a 'mimeType' property (using an enum).
The specification also allows inputs to be provided either raw in the request or by reference (URL). This is illustrated in the example below, although we strongly believe that files should always be provided by reference.
inputJSON:
  type: object
  properties:
    mimeType:
      type: string
      enum:
        - 'image/tiff'
        - 'image/png'
      default: 'image/tiff'
    # File provided by reference
    href:
      type: string
      format: uri
    # File provided raw in the request document
    data:
      type: string
      # 'binary' denotes a raw octet sequence; use 'byte' for base64-encoded content
      format: binary
Note that exactly one of the href and data properties must be provided, never both. We prefer this simple definition, but the constraint may be expressed explicitly using the 'oneOf' construct combined with a discriminator property.
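When the either-or rule is left out of the schema, the server has to enforce it itself. The Python sketch below shows one way to do so; the function name and the error messages are hypothetical, and the list of allowed MIME types is passed in by the caller rather than read from the schema.

```python
def check_file_input(value, allowed_mime_types, default_mime_type):
    """Validate a file input and return the MIME type that applies to it."""
    has_href = "href" in value
    has_data = "data" in value
    if has_href == has_data:  # both present, or both missing
        raise ValueError("exactly one of 'href' and 'data' must be provided")
    mime_type = value.get("mimeType", default_mime_type)
    if mime_type not in allowed_mime_types:
        raise ValueError(f"unsupported mimeType {mime_type!r}")
    return mime_type


ALLOWED = ["image/tiff", "image/png"]

by_reference = {"mimeType": "image/png", "href": "https://example.com/scene.png"}
print(check_file_input(by_reference, ALLOWED, "image/tiff"))
```

Expressing the same rule with 'oneOf' moves the check into the schema, at the cost of a more verbose definition for every file input.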
Finally, recall that the mimeType property tells the client the input format (MIME type), which is critical for matching process inputs with process outputs in the context of workflows and processing chains.
The WPS specification allows the client to indicate the list of desired outputs. This is typically the means by which the output identifiers, formats and usual properties are made known to the client. By analogy with the inputs, the format of the outputs may be provided in a mimeType property.
If needed, the definition may also include the possible transmission modes of the individual outputs (value or reference), although we believe that outputs should always be provided by reference.
outputs:
  type: object
  properties:
    output1:
      type: object
      properties:
        mimeType:
          type: string
          enum:
            - 'image/tiff'
            - 'image/png'
          default: 'image/tiff'
        transmission:
          type: string
          enum:
            - value
            - reference
Note that we assume there is no need to indicate whether the individual inputs are single files or arrays of files, as this depends entirely on the particular execution.
As detailed in the OpenAPI specification (https://swagger.io/specification/#specificationExtensions), additional data can be added to extend the specification at certain points. The extension properties are implemented as patterned fields that are always prefixed by "x-".
WPS/Geospatial extensions may be defined to support domain-specific information. This is typically the preferred way of embedding metadata about a process input, an output or the process itself, as discussed earlier in the document.
Here is a complete example of a 'classical' process description. The example is an OGC Processing API document containing only process execution operations (it does not include any other operation of the Processing API such as GetResult, GetStatus, etc.).
openapi: 3.0.0
info:
  title: "Hackaton OGC Processing API Example of Process Description part"
  version: "Hackaton.final"
  description: 'GeomatysNDVIMultiSensor Process (example taken from Testbed 14)'
  contact:
    name: Open Geospatial Consortium
    email: [email protected]
    url: 'http://www.opengeospatial.org'
  license:
    name: CC-BY 4.0 license
    url: 'https://creativecommons.org/licenses/by/4.0/'
paths:
  # For each process the path and interface is provided as below
  '/processes/mySampleProcess':
    post:
      summary: Start mySampleProcess execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
              description: URL to check the status of the execution/job.
  '/processes/mySampleProcess2':
    post:
      summary: Start mySampleProcess2 execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest2'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
              description: URL to check the status of the execution/job.
components:
  schemas:
    mySampleProcessExecuteRequest:
      allOf:
        # Execution parameters (mode, response) are simply declared by referring to the standard schema
        - $ref: "executionParameters.yaml"
        # Then the definitions of inputs and outputs
        - type: object
          required:
            # indicates the inputs that are mandatory (optional by default)
            - inputImage
          properties:
            inputImage:
              title: 'my input image'
              description: 'abstract of my input image'
              type: object
              properties:
                # Indicates the single supported file format
                mimeType:
                  type: string
                  enum:
                    - 'image/png'
                  default: 'image/png'
                # File can only be provided by reference (by value is not supported by the implementation)
                href:
                  type: string
                  format: uri
              # Indication about how to retrieve the input from an OGC WMS service (metadata)
              x-context:
                - offering:
                    code: "http://www.opengis.net/spec/owc-atom/1.0/req/wms"
                    operation:
                      - code: "GetMap"
                        method: "GET"
                        href: "http://acme.org/wms?REQUEST=GetMap&SRS=EPSG:4326&BBOX=3.5,-1.3,34,-1.8"
            inputCustom:
              type: integer
              minimum: 1
              maximum: 100
              default: 20
            outputs:
              type: object
              properties:
                output1:
                  type: object
                  title: 'my output title'
                  description: 'my output description'
                  properties:
                    mimeType:
                      type: string
                      enum:
                        - 'image/tiff'
                        - 'image/png'
                      default: 'image/tiff'
                    transmission:
                      type: string
                      enum:
                        - reference
tags:
  - name: Capabilities
    description: Essential characteristics of this API including information about the data.
  - name: Processes
    description: Access to processes.
servers:
  - description: SwaggerHub API Auto Mocking
    url: 'https://virtserver.swaggerhub.com/geoprocessing/WPS/0.1'
Regarding the posting of a new process (transactional specification of the Processing API), the client should only provide the schema definition of the submitted process.
We assume that some parts may be optional and automatically added or generated by the server: the execution parameters, the optional 'raw' input file property, etc.
The submitted OpenAPI schema of the process would look as follows:
mySampleProcessExecuteRequest:
  type: object
  required:
    - inputImage
  properties:
    inputImage: [...]
    outputs: [...]
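A server accepting such a submission could complete it along the following lines. This Python sketch only covers the execution parameters: it wraps the submitted schema in an `allOf` together with a reference to the standard parameters. The function name is hypothetical, the file name `executionParameters.yaml` is borrowed from the complete example, and the input and output definitions used here are placeholders standing in for the elided ones.

```python
import copy


def complete_submitted_schema(submitted, execution_ref="executionParameters.yaml"):
    """Wrap a client-submitted process schema with the standard execution parameters."""
    return {"allOf": [{"$ref": execution_ref}, copy.deepcopy(submitted)]}


# Placeholder definitions; the real inputs and outputs are elided in the text.
submitted = {
    "type": "object",
    "required": ["inputImage"],
    "properties": {
        "inputImage": {"type": "object"},
        "outputs": {"type": "object"},
    },
}

published = complete_submitted_schema(submitted)
print(published["allOf"][0])
```

Copying the submitted schema keeps the client's original document untouched, which is convenient if the server also has to return it unchanged later.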
Since the early WPS Transactional Extension draft in 2008, Spacebel has released multiple implementations of the extension. Most recently, the extension was revised during the OGC Testbed 14 and defined in cooperation with the EOC and NextGen threads.
The resulting testbed implementation has been submitted to the official GitHub repository at https://github.com/opengeospatial/wps-rest-binding/commit/7bdee9df5652ad3a7ad13075c535b84f9b264d0b.
An extension supporting callbacks would allow a more modern pattern for monitoring job status. Instead of repeatedly polling, the client can be notified of the following events:
- onSuccess
- onFailure
- onUpdate
Spacebel has proposed a draft specification for such a notification feature (relying on OpenAPI callbacks): https://github.com/opengeospatial/wps-rest-binding/commit/8cc2bdaf6483af7d8c5a85c7ad0a45b8e68cc4b6
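The following Python sketch illustrates the idea on the server side: given the callback URLs a client registered and the new status of a job, it selects the event to raise and the URL to notify. Everything except the three event names is an assumption made for illustration, in particular the status values "succeeded" and "failed" and the shape of the `callbacks` dictionary; the draft specification linked above remains the reference for the actual interface.

```python
def event_for_status(status):
    """Map a job status to one of the three notification events."""
    if status == "succeeded":   # hypothetical status value
        return "onSuccess"
    if status == "failed":      # hypothetical status value
        return "onFailure"
    return "onUpdate"           # any intermediate status change


def select_callback(callbacks, status):
    """Return (event, url) to notify, or None if the client did not subscribe."""
    event = event_for_status(status)
    url = callbacks.get(event)
    return (event, url) if url else None


# Callback URLs a client might have provided when starting the job.
callbacks = {
    "onSuccess": "https://client.example.com/jobs/done",
    "onFailure": "https://client.example.com/jobs/failed",
}

print(select_callback(callbacks, "succeeded"))
print(select_callback(callbacks, "running"))
```

Because the client here did not register an onUpdate URL, intermediate status changes are simply not notified.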
Spacebel also led a discussion on a very classical issue: the possibility of reporting a list of files as outputs.
Lists of outputs are indeed disallowed by the specification (http://www.opengis.net/spec/WPS/2.0/req/conceptual-model/process/output-value-cardinality: "Each process output shall have a value cardinality of one").
The following workarounds are generally used:
- Return a list of outputs (although it is forbidden)
- Return an archive (zip), which is very resource-consuming for large files
- Return a Metalink (which also allows multiple formats in the response)
- Return multipart/mixed (not supported by browsers, difficult to implement)
The conclusion of the discussion was that the new Processing API should support arrays of output files.
Some fixes listed at https://github.com/opengeospatial/wps-rest-binding/issues were proposed during the Hackathon.