OGC Hackathon 2019

Spacebel sa (Christophe Noel) edited this page Nov 19, 2019 · 52 revisions

OGC Hackathon 2019 Report

Author: Christophe Noël, Spacebel
Status: draft discussion paper
Date: 26th June 2019
Summary: this report mainly provides guidelines on how to express a Process Description as an OpenAPI operation instead of the usual dedicated schema. It also covers a set of other enhancements proposed by Spacebel during the OGC Hackathon 2019 in London.

Introduction

Spacebel participated in the OGC Hackathon 2019 in London.

We provided an OpenAPI/REST OGC Processing API as described in the OGC Testbed 14 project (NextGen thread). The server implementation was integrated with multiple clients (e.g. Solenix). Spacebel also suggested several improvements to the OGC Processing API.

The present report mainly describes how to express a Process Description as an OpenAPI operation instead of the usual dedicated schema. It also covers a set of other enhancements proposed by Spacebel during the hackathon.

Using JSON Schema for Process Descriptions

A major discussion during the OGC Hackathon 2019 (London) was about describing the process interfaces (i.e. the Process Description, including input and output definitions) directly in JSON Schema and/or OpenAPI.

Since OpenAPI already supports the description of schemas and interfaces, this would remove the need for a dedicated Process Description schema and would also ease the work of Web client developers.

Two understandings, two approaches

Nevertheless, the participants did not all understand and envision the proposal in the same way. The following approaches have been identified:

1. Input and Output Type

Only the input and output types need to be defined using JSON Schema. The schema thus replaces the literal data types (boolean, string, integer, etc.) and the complex data types (actually schema + encoding + mime type) with any format supported by JSON Schema. See the example below.

Example of type definitions in JSON Schema
{
  "processDescription": {
    "processVersion": "1.0.0",
    "process": {
      "id": "myProcessId",
      "inputs": [
        {
          "id": "files",
          "schema": {
            "type": "string",
            "format": "uri"
          }
        }
      ],
      "outputs": [...]
    }
  }
}

From Spacebel's point of view, this would be particularly relevant when the inputs are themselves JSON documents. Nevertheless, changing only the type definitions would bring little benefit, as they are only a small part of the process description. Overall, we do not really see the purpose of such a minimal change.

2. Whole Process Description in OpenAPI

In the second approach, the whole Process Description is defined in JSON Schema and, typically, processes are exposed as REST resources in the OpenAPI document of the OGC Processing service.

For each existing process, a resource with an HTTP path is defined in the API document. Processes are therefore discovered through the API document (no need for dedicated discovery and describe-process operations) and the client directly knows how to invoke the execution. The API document is illustrated below:

Example of processes exposed as OpenAPI operations
openapi: 3.0.0
paths:
 # For each process the path and interface is provided as below
 '/processes/mySampleProcess/jobs':
   post:
      summary: Start mySampleProcess execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
              description: URL to check the status of the execution/job.

The example above shows the definition of the execution operation. Whether a specific (per-process) result operation should be defined for reading outputs needs further discussion.
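
As a minimal sketch of how a client could discover processes from such an API document, the Python snippet below scans the paths of an already parsed OpenAPI document (a plain dict). The '/processes/{id}/jobs' path pattern follows the example above; the function name discover_processes and the extra '/conformance' path are hypothetical and only serve the illustration.

```python
import re

# Path pattern taken from the example above: /processes/{processId}/jobs
PROCESS_PATH = re.compile(r"^/processes/([^/]+)/jobs$")


def discover_processes(api_doc):
    """Return {process id: execute request schema reference} for each POST operation."""
    found = {}
    for path, item in api_doc.get("paths", {}).items():
        match = PROCESS_PATH.match(path)
        if not match or "post" not in item:
            continue
        schema = (item["post"].get("requestBody", {})
                  .get("content", {})
                  .get("application/json", {})
                  .get("schema", {}))
        found[match.group(1)] = schema.get("$ref")
    return found


api_doc = {
    "openapi": "3.0.0",
    "paths": {
        "/processes/mySampleProcess/jobs": {
            "post": {
                "requestBody": {"content": {"application/json": {"schema": {
                    "$ref": "#/components/schemas/mySampleProcessExecuteRequest"}}}}
            }
        },
        # Hypothetical non-process path, ignored by the discovery
        "/conformance": {"get": {}},
    },
}
print(discover_processes(api_doc))
```

No separate discovery or describe-process request is involved: the process identifier and the reference to its execute request schema both come from the API document itself.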

Mappings of Process Description to JSON Schema

Each process is defined as an exposed API operation (the process identifier is provided in the operation path). The example below illustrates a typical schema for the execute request of a process. The mappings from the WPS specification are detailed in the next sections.

Basic Example
    mySampleProcessExecuteRequest:
      allOf:
      # Execution parameters (mode, response) can simply be referred to by a link
      - $ref: "executionRequest.yaml"
      - type: object
        required:
          # indicates the inputs that are mandatory (optional by default)
          - input1
        properties:
          # Process inputs are defined below
          input1:
            type: array
            title: provided title
            items:
              $ref: 'genericInputType.yaml'
          # alternative with format
          input2:
            type: string
          input3:
            type: boolean
          # Output IDs list: the user chooses the transmission by value or by reference.
          outputs:
            type: object
            properties:
              imageOutput:
                $ref: 'transmissionMode.yaml'

As illustrated in the example, the execution request includes the following parts:

  • A set of standard execution parameters supported by the OGC Processing API and always included in all process definitions. This may include the usual mode (sync, async) and response (raw, document) attributes.

  • The list of inputs, defined as properties.

  • The list of outputs, defined as properties of the outputs element. This is how the client learns the output identifiers; the client may also include or exclude individual outputs.
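
The three parts listed above can be sketched from the client side as follows. This is only an illustration under assumptions: the schema is a simplified copy of the basic example (the referenced files are left out), "mode" is assumed to be a top-level request property, "reference" is assumed to be a valid transmission value, and build_execute_request is a hypothetical helper.

```python
# Simplified copy of the request schema above; the $ref parts are left out.
SCHEMA = {
    "required": ["input1"],
    "properties": {
        "input1": {"type": "array"},
        "input2": {"type": "string"},
        "input3": {"type": "boolean"},
        "outputs": {"type": "object", "properties": {"imageOutput": {}}},
    },
}


def build_execute_request(schema, inputs, outputs, mode="async"):
    """Assemble an execute request body after basic checks against the schema."""
    properties = schema["properties"]
    known_inputs = set(properties) - {"outputs"}
    known_outputs = set(properties["outputs"]["properties"])
    missing = set(schema.get("required", [])) - set(inputs)
    unknown = (set(inputs) - known_inputs) | (set(outputs) - known_outputs)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    # Standard execution parameter first, then the inputs, then the selected outputs.
    return {"mode": mode, **inputs, "outputs": dict(outputs)}


request = build_execute_request(
    SCHEMA,
    inputs={"input1": ["https://example.com/a.tif"], "input3": True},
    outputs={"imageOutput": "reference"},
)
print(request)
```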

Mapping Title, Abstract and Keywords to JSON Schema

The usual Title and Abstract properties (for each input, each output, and the process itself) can easily be mapped to their JSON Schema equivalents listed below:

  • Title → title

  • Abstract → description

The Keywords property requires a specification extension (https://swagger.io/specification/#specificationExtensions) such as x-keywords (see the section 'OpenAPI Specification Extensions').

Example of Title, Description and Keywords part
input1:
  type: integer
  title: 'my input title'
  description: 'my input abstract'
  x-keywords:
   - keyword1
   - keyword2

Mapping Metadata to JSON Schema

Metadata providing additional information about the process and its individual inputs and outputs is a fundamental part of the process description and is mandatory in most projects (OGC Testbeds, SSEGrid, ESE, etc.). The metadata fields serve different purposes, e.g.:

  • Additional (geo) constraints on a particular input (spatial, temporal), or any additional information (e.g. the unit of measure originally embedded in ows:UOM).

  • Indication of how to retrieve the input data from another (OGC) service: this is typically handled by an OWS Context document (e.g. the WFS endpoint with a list of restricted collections).

  • Transactional-related information: such process descriptions typically include memory/CPU requirements, backend execution customization (e.g. a CWL document), or target execution cluster parameters.

The usual generic <ows:Metadata> element (which includes the xlink:role and xlink:href attributes) can easily be mapped to an OpenAPI External Documentation Object for providing a single reference.

input1:
  type: integer
  externalDocs:
    description: "myExternalDoc"
    url: "https://example.com"

The very common <ows:additionalProperties> element, which carries key-value pairs, can be expressed using a standard Processing API extension such as "x-properties" (see the later section on Specification Extensions).

input1:
  type: integer
  x-properties:
    - uom: "km/s"
    - eoSource: "Landsat-8"
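
On the client side, such a list of single-key objects is easily flattened into one lookup table. The sketch below assumes the x-properties layout shown above; read_properties is a hypothetical helper name.

```python
def read_properties(schema):
    """Flatten the x-properties list of single-key objects into one dict."""
    merged = {}
    for pair in schema.get("x-properties", []):
        merged.update(pair)
    return merged


input1 = {
    "type": "integer",
    "x-properties": [{"uom": "km/s"}, {"eoSource": "Landsat-8"}],
}
properties = read_properties(input1)
print(properties.get("uom"))
```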

Finally, the Processing API may define some Specification Extension properties for embedding very common geospatial information. For example, any OWS Context information could be provided through an "x-context" property simply defined as a JSON object.

Below is an example of an input that can be retrieved from an OGC WMS service:

input1:
  type: integer
  x-context:
    - offering:
        code: "http://www.opengis.net/spec/owc-atom/1.0/req/wms"
        operation:
          - code: "GetMap"
            method: "GET"
            href: "http://acme.org/wms?REQUEST=GetMap&SRS=EPSG:4326&BBOX=3.5,-1.3,34,-1.8"

We believe the OGC API for Processes specification should define the following properties as extensions:

  • x-properties: list of key-value string pairs

  • x-context: JSON object (intended for embedding a JSON OWS Context)

  • Optionally, x-links: list of links with a role attribute

Of course, this does not prevent an implementation from defining additional project-specific extensions.

Mapping minOccurs and maxOccurs

The following JSON Schema keywords shall be used to map the minOccurs and maxOccurs definitions:

  • 'required' in case the property is mandatory (minOccurs >= 1)

  • 'array' type in case the property may include multiple values (maxOccurs > 1)

  • For arrays, 'minItems' and 'maxItems' for providing any specific cardinality

  • 'maxItems' shall be omitted if the number of items is unbounded (JSON Schema only accepts a non-negative integer for this keyword)

input1:
  type: array
  minItems: 1
  maxItems: 10
  items:
    $ref: 'myItemType'
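
The cardinality rules can be sketched as a small conversion helper. This is an illustration only: apply_occurrences is a hypothetical name, an array is produced whenever maxOccurs differs from 1, and an unbounded maxOccurs (represented here by None) simply leaves 'maxItems' out, since JSON Schema expects a non-negative integer for that keyword.

```python
def apply_occurrences(item_schema, min_occurs=1, max_occurs=1):
    """Return (property schema, required flag) for a WPS minOccurs/maxOccurs pair.

    max_occurs=None stands for "unbounded".
    """
    required = min_occurs >= 1
    if max_occurs == 1:
        # Single value: the item schema is used as is.
        return dict(item_schema), required
    schema = {"type": "array", "items": dict(item_schema)}
    if min_occurs > 0:
        schema["minItems"] = min_occurs
    if max_occurs is not None:
        schema["maxItems"] = max_occurs
    return schema, required


bounded, bounded_required = apply_occurrences({"$ref": "myItemType"}, 1, 10)
unbounded, unbounded_required = apply_occurrences({"$ref": "myItemType"}, 0, None)
print(bounded, bounded_required)
print(unbounded, unbounded_required)
```

The required flag is returned separately because 'required' is declared on the enclosing object, not on the property itself.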

Mapping LiteralDataType to JSON Schema

When defining simple parameters, formally known as "literal data", the available data types can easily be mapped to JSON Schema simple types. The following table shows the obvious mappings:

WPS Schema Type    JSON Schema Type
string             string
integer            integer
decimal            number
boolean            boolean
double             number (with format double)
float              number (with format float)

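Note that JSON Schema (as used in OpenAPI) also allows additional types which would be relevant for the Processing API:

  • Date (string with format date)

  • Date time (string with format date-time)

  • Password (string with format password)

  • URI/URL (string with format uri)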

The WPS specification allows constraints to be defined on the inputs. The following table shows the JSON Schema equivalents:

WPS Constraint      JSON Schema Constraint
minimum value       minimum and exclusiveMinimum
maximum value       maximum and exclusiveMaximum
ranges              only a single minimum and maximum value is allowed
spacing             none
allowed values      enum
default value       default
values reference    externalDocs

JSON Schema also provides additional keywords for expressing constraints:

  • multipleOf: numbers can be restricted to a multiple of a given number

  • minLength/maxLength: strings can be restricted to a particular length range

  • pattern: ensures that a string matches a regular expression

  • uniqueItems: validates only if all array items are unique

Note that LiteralDataDomains is not covered; discussion may be required to decide on the relevance of porting it to JSON Schema.

Finally, an example of a set of literal input definitions is shown below.

input2:
  type: string
  enum:
   - allowedValue1
   - allowedValue2
input3:
  type: boolean
  default: true
input4:
  type: integer
  minimum: 1
  maximum: 100
  default: 20
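
The type and constraint mappings of this section can be combined in a small conversion routine. The sketch below is an assumption-laden illustration: literal_to_schema is a hypothetical helper, only the constraints with a direct JSON Schema equivalent are handled, and boolean is mapped to the JSON Schema boolean type, as in the input3 example.

```python
# WPS literal type -> (JSON Schema type, optional format)
LITERAL_TYPES = {
    "string": ("string", None),
    "integer": ("integer", None),
    "decimal": ("number", None),
    "boolean": ("boolean", None),
    "double": ("number", "double"),
    "float": ("number", "float"),
}


def literal_to_schema(wps_type, minimum=None, maximum=None,
                      allowed_values=None, default=None):
    """Convert a WPS literal data definition into a JSON Schema property."""
    json_type, json_format = LITERAL_TYPES[wps_type]
    schema = {"type": json_type}
    if json_format:
        schema["format"] = json_format
    constraints = {"minimum": minimum, "maximum": maximum,
                   "enum": allowed_values, "default": default}
    # Keep only the constraints that were actually provided.
    schema.update({key: value for key, value in constraints.items()
                   if value is not None})
    return schema


input4 = literal_to_schema("integer", minimum=1, maximum=100, default=20)
print(input4)
```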

Mapping ComplexDataType to JSON Schema

When defining input files and documents, formally known as "complex data", the WPS specification can be mapped to JSON Schema depending on the nature of the file.

The most obvious mapping is when the input document is itself in JSON format. In such a case, the input is simply defined as a JSON object (either generic or with an explicit definition) as shown below:

inputJSON:
  type: object
  properties:
    propertyA:
      type: string
    propertyB:
      type: integer

In case the input is a file, the WPS specification usually allows the file to be provided in one of several allowed formats (with a default format). We recommend indicating the supported formats in a 'mimeType' property (using an enum).

Also, the specification allows inputs to be provided either raw in the request or by reference (URL). Both are illustrated in the example below, although we strongly believe that files should always be provided by reference.

Example of complex data by reference
inputJSON:
  type: object
  properties:
    mimeType:
      type: string
      enum:
        - 'image/tiff'
        - 'image/png'
      default: 'image/tiff'
    # File provided by reference
    href:
      type: string
      format: uri
    # File provided raw in the request document
    data:
      type: string
      # !use byte instead in case of a text file
      format: binary

Note that exactly one of the href and data properties must be provided, not both. We prefer this simple definition, but the constraint may be expressed explicitly using the 'oneOf' construct combined with a discriminator property.

Finally, recall that the mimeType property tells the client the input format (mime type), which is critical for matching process inputs with process outputs in the context of workflows and processing chains.
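
When the schema keeps the simple definition, the "href xor data" rule has to be enforced by the implementation. A minimal server-side check might look as follows; check_complex_input is a hypothetical helper, and the "reference"/"value" labels it returns are only illustrative.

```python
def check_complex_input(value, allowed_mime_types, default_mime_type):
    """Validate a complex input: supported mimeType and exactly one of href/data."""
    mime_type = value.get("mimeType", default_mime_type)
    if mime_type not in allowed_mime_types:
        raise ValueError(f"unsupported mimeType: {mime_type}")
    # "href xor data": the two membership tests must differ.
    if ("href" in value) == ("data" in value):
        raise ValueError("provide exactly one of 'href' and 'data'")
    return mime_type, ("reference" if "href" in value else "value")


allowed = ["image/tiff", "image/png"]
result = check_complex_input(
    {"mimeType": "image/png", "href": "https://example.com/in.png"},
    allowed, "image/tiff",
)
print(result)
```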

Mapping Output Definitions to JSON Schema

The WPS specification allows the client to indicate the list of desired outputs. The output definitions are typically the means of communicating the output identifiers, formats and usual properties to the client. By analogy with the inputs, the format of the outputs may be provided in a mimeType property.

If needed, the definition may also include the possible transmission modes of the individual outputs (value or reference), although we believe that outputs should always be provided by reference.

Example of output definitions
outputs:
  type: object
  properties:
    output1:
      type: object
      properties:
        mimeType:
          type: string
          enum:
            - 'image/tiff'
            - 'image/png'
          default: 'image/tiff'
        transmission:
          type: string
          enum:
            - value
            - reference

Note that we assume there is no need to indicate whether the individual inputs are single files or arrays of files, as this depends entirely on the particular execution.

OpenAPI Specification Extensions

As detailed in the OpenAPI specification (https://swagger.io/specification/#specificationExtensions), additional data can be added to extend the specification at certain points. The extension properties are implemented as patterned fields that are always prefixed by "x-".

WPS/Geospatial extensions may be defined to support domain-specific information. This is typically the preferred way of embedding metadata about a process input, output or the process itself, as discussed earlier in the document.

Complete Example

Here is a complete example of a 'classical' process description. The example is an OGC Processing API with only a single process execution operation (it does not include any other operation of the Processing API such as GetResult, GetStatus, etc.)

Full Example
openapi: 3.0.0
info:
  title: "Hackaton OGC Processing API Example of Process Description part"
  version: "Hackaton.final"
  description: 'GeomatysNDVIMultiSensor Process (example taken from Testbed 14)'
  contact:
    name: Open Geospatial Consortium
    email: [email protected]
    url: 'http://www.opengeospatial.org'
  license:
    name: CC-BY 4.0 license
    url: 'https://creativecommons.org/licenses/by/4.0/'
paths:
 # For each process the path and interface is provided as below
 '/processes/mySampleProcess':
   post:
      summary: Start mySampleProcess execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
          description: URL to check the status of the execution/job.
 '/processes/mySampleProcess2':
   post:
      summary: Start mySampleProcess2 execution.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/mySampleProcessExecuteRequest2'
      responses:
        '201':
          headers:
            Location:
              schema:
                type: string
          description: URL to check the status of the execution/job.
components:
  schemas:
    mySampleProcessExecuteRequest:
      allOf:
      # Execution parameters (mode, response) are simply declared by referring to the standard schema
      - $ref: "executionParameters.yaml"
      # Then the definitions of inputs and outputs
      - type: object
        required:
          # indicates the inputs that are mandatory (optional by default)
          - inputImage
        properties:
          inputImage:
            title: 'my input image'
            description: 'abstract of my input image'
            type: object
            properties:
              # Indicates the single supported file format
              mimeType:
                type: string
                enum:
                 - 'image/png'
                default: 'image/png'
              # File can only be provided by reference (means that by value is not supported by the implementation)
              href:
                type: string
                format: uri
            # Indication about how to retrieve the input from an OGC WMS service (metadata)
            x-context:
              - offering:
                  code: "http://www.opengis.net/spec/owc-atom/1.0/req/wms"
                  operation:
                    - code: "GetMap"
                      method: "GET"
                      href: "http://acme.org/wms?REQUEST=GetMap&SRS=EPSG:4326&BBOX=3.5,-1.3,34,-1.8"
          inputCustom:
            type: integer
            minimum: 1
            maximum: 100
            default: 20
          outputs:
            type: object
            properties:
              output1:
                type: object
                title: 'my output title'
                description: 'my output description'
                properties:
                  mimeType:
                    type: string
                    enum:
                     - 'image/tiff'
                     - 'image/png'
                    default: 'image/tiff'
                  transmission:
                    type: string
                    enum:
                     - reference
tags:
  - name: Capabilities
    description: Essential characteristics of this API including information about the data.
  - name: Processes
    description: Access to processes.
servers:
  - description: SwaggerHub API Auto Mocking
    url: 'https://virtserver.swaggerhub.com/geoprocessing/WPS/0.1'

Transactional Considerations

Regarding the posting of a new process (transactional specification of the Processing API), the client should only provide the schema definition of the submitted process.

We assume that some parts may be optional and automatically added or generated by the server: the execution parameters, the optional 'raw' input file property, etc.

The submitted OpenAPI schema of the process would look as follows:

mySampleProcessExecuteRequest:
  type: object
  required:
    - input1
  properties:
    inputImage: [...]
    outputs:    [...]
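
To illustrate how a server might complete such a submitted schema, the sketch below wraps it with the standard execution parameters and adds an optional raw 'data' property next to each 'href' input. This is only one possible reading of the idea: complete_submitted_schema is a hypothetical helper, and the 'executionParameters.yaml' reference is the one used in the complete example.

```python
import copy


def complete_submitted_schema(submitted, parameters_ref="executionParameters.yaml"):
    """Return the deployed request schema derived from a client-submitted one."""
    schema = copy.deepcopy(submitted)  # never modify the client's document
    for name, prop in schema.get("properties", {}).items():
        fields = prop.get("properties", {})
        # Offer raw transmission wherever only a reference was declared.
        if name != "outputs" and "href" in fields and "data" not in fields:
            fields["data"] = {"type": "string", "format": "binary"}
    return {"allOf": [{"$ref": parameters_ref}, schema]}


submitted = {
    "type": "object",
    "required": ["inputImage"],
    "properties": {
        "inputImage": {
            "type": "object",
            "properties": {"href": {"type": "string", "format": "uri"}},
        },
    },
}
completed = complete_submitted_schema(submitted)
print(completed)
```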

Conclusion

In conclusion, we believe that mapping a Process Description to JSON Schema is quite straightforward. Most of the classical specification elements can easily be mapped to JSON Schema properties, and additional extensions may be defined for embedding any other required element.

Adopting Transactional Extension

Since the early WPS Transactional Extension draft in 2008, Spacebel has released multiple implementations of the extension. Most recently, the extension was revised during the OGC Testbed 14 and defined in cooperation with the EOC and NextGen threads.

The result of the testbed implementation has been submitted to the official GitHub repository at https://github.com/opengeospatial/wps-rest-binding/commit/7bdee9df5652ad3a7ad13075c535b84f9b264d0b.

Callback Support

An extension supporting callbacks would allow a more modern pattern for monitoring job status. Instead of repeatedly polling, the client can be notified of the following events:

  • onSuccess

  • onFailure

  • onUpdate

Spacebel has proposed a draft specification for such notification feature (relying on OpenAPI callbacks): https://github.com/opengeospatial/wps-rest-binding/commit/8cc2bdaf6483af7d8c5a85c7ad0a45b8e68cc4b6
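
As a purely illustrative sketch (not the content of the draft), a server could select the callback to notify from the job status as shown below. The status values "succeeded" and "failed", the dictionary layout of the registered callbacks and the example URLs are assumptions; only the three event names come from the list above.

```python
# Assumed job status values; the event names are those listed above.
EVENT_BY_STATUS = {"succeeded": "onSuccess", "failed": "onFailure"}


def select_callback(callbacks, status):
    """Return (event name, callback URL or None) for a job status change."""
    event = EVENT_BY_STATUS.get(status, "onUpdate")
    return event, callbacks.get(event)


# Hypothetical callback URLs registered by the client with its execute request
callbacks = {
    "onSuccess": "https://client.example.com/jobs/done",
    "onFailure": "https://client.example.com/jobs/failed",
}
print(select_callback(callbacks, "succeeded"))
print(select_callback(callbacks, "running"))
```

A client that does not register a URL for an event is simply not notified of it and can still fall back to polling.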

Support of multiple outputs

Spacebel also led a discussion on a classical issue: the possibility of reporting lists of files as outputs.

Lists of outputs are indeed disallowed by the specification: "Each process output shall have a value cardinality of one" (http://www.opengis.net/spec/WPS/2.0/req/conceptual-model/process/output-value-cardinality).

The following workarounds are generally used:

  • Return a list of outputs (although this is forbidden)

  • Return an archive (zip), which is very resource-consuming for large files

  • Return a Metalink (which also allows multiple formats in the response)

  • Return multipart/mixed (not supported by browsers, difficult to implement).

The conclusion of the discussion was that the new Processing API should support the presence of output file arrays.

Fixes of the Processing API

Some fixes listed at https://github.com/opengeospatial/wps-rest-binding/issues were proposed during the Hackathon.