Version 1.0 | Heliophysics Data and Model Consortium (HDMC) | October 11, 2016
Table of Contents
- Introduction
- Endpoints
- HTTP Status Codes
- Representation of Time
- Additional Keyword / Value Pairs
- More About
- Security Notes
- References
- Contact
This document describes the Heliophysics Application Programmer’s Interface (HAPI) specification, which is a standard set of services for delivering digital Heliophysics time series data. The intent of HAPI is to enhance interoperability among time series data providers. Many providers offer data through a custom API, and therefore getting data from multiple providers involves dealing with the details of each provider's API. The HAPI specification describes the lowest common denominator of services that any provider of time series data could implement. The hope is that if many providers make data available through HAPI compliant servers, it becomes possible to obtain time series science data content seamlessly from many sources.
This document is intended to be used by two groups of people: first, data providers who want to make time series data available through a HAPI server, and second, data users who want to understand how data is made available from a HAPI server, or perhaps to write client software to obtain data from an existing HAPI server.
HAPI constitutes a minimum but complete set of capabilities needed for a server to allow access to the time series data values within one or more data collections. Because of this focus on access to data content, HAPI is very light on metadata and data discovery. Within the metadata offered by HAPI are optional ways to indicate where further descriptive details for any dataset could be found.
The API itself is built using REST principles that emphasize URLs as stable endpoints through which clients can request data. Because it is based on well-established HTTP request and response rules, a wide range of HTTP clients can be used to interact with HAPI servers.
These definitions are provided first to ensure clarity in ensuing descriptions.
parameter – a measured science quantity or a related ancillary quantity at one instant in time; may be scalar as a function of time, or a 1-D array at each time step; can have units and can have a fill value that represents no measurement or absent information
dataset – a collection with a conceptually uniform set of parameters; one instance of all the parameters together with a time value constitutes a data record. Although a dataset is a logical entity, it often physically resides in one or more files that are accessible online. A HAPI service presents a dataset as a seamless collection of time ordered records, offering a way to retrieve the parameters while hiding the actual storage details
request parameter – keywords that appear after the ‘?’ in a URL with a GET request.
Consider this example GET request:
http://example.com/hapi/data?id=alpha&time.min=2016-07-13
The two request parameters are id and time.min. They have values of alpha and 2016-07-13, respectively. This document will always use the full phrase "request parameter" to refer to these URL elements to draw a clear distinction from a parameter in a dataset.
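As a purely illustrative aside (nothing here is mandated by the specification), the example request above could be constructed in Python like this, using the server URL and values from the example:

```python
# Minimal sketch: building a HAPI request URL from request parameters
# using only the Python standard library.
from urllib.parse import urlencode

base = "http://example.com/hapi/data"
request_parameters = {"id": "alpha", "time.min": "2016-07-13"}
print(base + "?" + urlencode(request_parameters))
# -> http://example.com/hapi/data?id=alpha&time.min=2016-07-13
```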
The HAPI specification consists of four required endpoints that give clients a precise way to first determine the data holdings of the server and then to request data from the server. The functionality of each endpoint is as follows:
- capabilities – describe the capabilities of the server; lists the output formats the server can emit (CSV, binary, or JSON)
- catalog – list the catalog of datasets that are available; each dataset is associated with a unique id and may optionally have a title
- info – show information about a dataset with a given id; the description defines the parameters in every dataset record
- data – stream data content for a dataset of a given id; the streaming request must have time bounds (specified by request parameters time.min and time.max) and may indicate a subset of parameters (default is all parameters)
There is also an optional landing page endpoint that returns human-readable HTML. Although there is recommended content for this landing page, it is not essential to the functioning of the server.
The four required endpoints behave like REST-style services, in that the resulting HTTP response is the complete response for each endpoint. In particular, the fourth endpoint does not just give URLs or links to the data, but rather streams the data content in the HTTP response. The full specification for each endpoint is discussed below.
All endpoints must be directly below a hapi path element in the URL:
http://example.com/hapi/capabilities
http://example.com/hapi/catalog
http://example.com/hapi/info
http://example.com/hapi/data
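As an illustration (not part of the specification), the following minimal Python sketch exercises the four required endpoints of the example server above; the dataset id ACE_MAG and the time range are taken from the examples later in this document and are assumptions for illustration only.

```python
# Minimal sketch: querying the four required HAPI endpoints with the
# Python standard library. Assumes the example server and dataset id
# used throughout this document.
import json
from urllib.request import urlopen

SERVER = "http://example.com/hapi"

capabilities = json.load(urlopen(SERVER + "/capabilities"))
catalog = json.load(urlopen(SERVER + "/catalog"))
info = json.load(urlopen(SERVER + "/info?id=ACE_MAG"))

# The data endpoint streams records (CSV by default); here the whole
# response is read into memory for simplicity.
with urlopen(SERVER + "/data?id=ACE_MAG&time.min=2016-01-01&time.max=2016-02-01") as resp:
    csv_text = resp.read().decode("utf-8")
```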
The input specification for each endpoint (the request parameters and their allowed values) must be strictly enforced by the server. Only the request parameters described below are allowed, and no extensions are permitted. If a HAPI server sees a request parameter that it does not recognize, it is required to throw an error indicating that the request is invalid (via HTTP 400 error – see below). A server that ignored an unknown request parameter would falsely indicate to clients that the request parameter was understood and was taken into account when creating the output.
All requests to a HAPI server are for retrieving resources and must not change the server state. Therefore, all HAPI endpoints must respond only to HTTP GET requests. POST requests should result in an error. This represents a RESTful approach in which GET requests are restricted to read-only operations on the server. The HAPI specification does not allow any input to the server (which for RESTful services is often implemented using POST requests).
The responses from the catalog, capabilities, and info endpoints are JSON structures, the formats of which are described below in the sections detailing each endpoint. The data endpoint must be able to deliver Comma Separated Value (CSV) data, but may optionally deliver data content in binary format or JSON.
The following is the detailed specification for the four main HAPI endpoints described above and an additional optional endpoint.
This root endpoint is optional and serves as a human-readable landing page for the server. Unlike the other endpoints, there is no strict definition for the output, but if present, it should include a brief description of the other endpoints, and links to documentation on how to use the server. An example landing page that can be easily customized for a new server is available here: https://github.com/hapi-server/data-specification/example_hapi_landing_page.html
Sample Invocation
http://example.com/hapi
Request Parameters
None
Response
The response is in HTML format and is not strictly defined, but should look something like the example below.
Example
Retrieve a listing of resources shared by this server.
http://example.com/hapi
Example Response:
<html>
<head> </head>
<body>
<h2> HAPI Server</h2>
<p> This server supports the HAPI 1.0 specification for delivery of time series
data. The server consists of the following 4 REST-like endpoints that will
respond to HTTP GET requests.
</p>
<ol>
<li> <a href="capabilities">capabilities</a> describe the capabilities of the server; this lists the output formats the server can emit (CSV and binary)</li>
<li><a href="catalog">catalog</a> list the datasets that are available; each dataset is associated with a unique id</li>
<li><a href="info">info</a> obtain a description for dataset of a given id; the description defines the parameters in every dataset record</li>
<li><a href="data">data</a> stream data content for a dataset of a given id; the streaming request must have time bounds (specified by request parameters time.min and time.max) and may indicate a subset of parameters (default is all parameters)</li>
</ol>
<p> For more information, see <a href="http://spase-group.org/hapi">this HAPI description</a> at the SPASE web site. </p>
</body>
</html>
This endpoint describes relevant implementation capabilities for this server. Currently, the only possible variability from server to server is the list of output formats that are supported.
A server must support CSV output format, but binary output format and JSON output may optionally be supported. The details for all output formats are described below.
Sample Invocation
http://example.com/hapi/capabilities
Request Parameters
None
Response
Response is in JSON format [3] as defined by RFC-7159 and has a mime type of application/json. Any capabilities are described using keyword/value pairs, with "outputFormats" being the only keyword currently in use.
Capabilities
Name | Type | Description |
---|---|---|
HAPI | string | Required The version number of the HAPI specification this description complies with. |
outputFormats | string array | Required The list of output formats the server can emit. The allowed values in the list are csv , binary , and json . All HAPI servers must support at least csv output format, but binary and json output formats are optional. |
Example
Retrieve a listing of resources shared by this server.
http://example.com/hapi/capabilities
Example Response:
{
"HAPI": "1.0",
"outputFormats": [ "csv", "binary", "json" ]
}
If a server only reports an output format of CSV, then requesting data in binary form should cause the server to issue an HTTP return code of 400 (bad request).
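A client can avoid that error by consulting the capabilities response before choosing a format. A minimal Python sketch, assuming the example server and dataset id used elsewhere in this document:

```python
# Minimal sketch: check the advertised output formats before asking for
# binary data, so a CSV-only server is never asked for binary.
import json
from urllib.request import urlopen

SERVER = "http://example.com/hapi"
caps = json.load(urlopen(SERVER + "/capabilities"))

# fall back to csv, which every HAPI server must support
fmt = "binary" if "binary" in caps["outputFormats"] else "csv"
url = (SERVER + "/data?id=ACE_MAG&time.min=2016-01-01&time.max=2016-02-01"
       "&format=" + fmt)
```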
Servers may support their own custom output formats, which would be advertised here.
This endpoint provides a list of datasets available via this server.
Sample Invocation
http://example.com/hapi/catalog
Request Parameters
None
Response
The response is in JSON format [3] as defined by RFC-7159 and has a mime type of application/json. An example is given here, and the content of the JSON response is fully defined in the section "HAPI JSON Content." The catalog is a simple listing of identifiers for the datasets available through the server providing the catalog. Additional metadata about each dataset is available through the info endpoint (described below). The catalog takes no query parameters and always lists the full catalog.
Catalog
Name | Type | Description |
---|---|---|
HAPI | string | Required The version number of the HAPI specification this description complies with. |
catalog | array(dataset) | Required A list of datasets available from this server. |
Dataset Object
Name | Type | Description |
---|---|---|
id | string | Required The computer friendly identifier that the host system uses to locate the dataset. Each identifier must be unique within the HAPI server where it is provided. |
title | string | Optional A short human readable name for the dataset. If none is given, it defaults to the id. The suggested maximum length is 40 characters. |
Example
Retrieve a listing of datasets shared by this server.
http://example.com/hapi/catalog
Example Response:
{
"HAPI" : "1.0",
"catalog" :
[
{"id": "ACE_MAG", title:"ACE Magnetometer data"},
{"id": "data/IBEX/ENA/AVG5MIN"},
{"id": "data/CRUISE/PLS"},
{"id": "any_identifier_here"}
]
}
The identifiers must be unique within a single HAPI server. Also, dataset identifiers in the catalog should be stable over time. Including version numbers or other changing elements (dates, processing IDs, etc.) in the dataset identifiers should be avoided. The intent of the HAPI specification is to allow data to be referenced using RESTful URLs that have a reasonable lifetime.
Also, note that the identifiers can have slashes in them.
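A minimal Python sketch of a client walking the catalog response shown above (the server URL is the example one used throughout this document):

```python
# Minimal sketch: list dataset ids and optional titles from the catalog.
import json
from urllib.request import urlopen

catalog = json.load(urlopen("http://example.com/hapi/catalog"))
for dataset in catalog["catalog"]:
    # "title" is optional and defaults to the id when absent
    print(dataset["id"], "-", dataset.get("title", dataset["id"]))
```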
This endpoint provides a data header for a given dataset, including a descriptive list of the parameters in the dataset.
By default, all the parameters are included in the header. If you already know what the parameters are and want to obtain a header for just a subset of them, you can specify the subset of interest as a comma separated list via the request parameter called parameters. This reduced header is potentially useful because it is also possible to request a subset of parameters when asking for data (see the data endpoint), and a reduced header can be requested that would then match the subset of parameters in the data.
Sample Invocation
http://example.com/hapi/info?id=ACE_MAG
Request Parameters
Name | Description |
---|---|
id | Required The identifier for the resource. |
parameters | Optional A subset of the parameters to include in the header. |
Response
The response is in JSON format [3] and provides metadata about one dataset. The main job of this metadata is to list and describe the parameters in the dataset. There are several required and many optional descriptive keywords that can be included.
The server may also put custom (server-specific) keywords or keyword/value pairs in the header, but any non-standard keywords must begin with the prefix x_.
NOTE: The first parameter in the data must be a time column (type of isotime -- see the table below describing the Parameter items and their allowed types). The time column must be the independent variable for the dataset.
Info
Dataset Attribute | Type | Description |
---|---|---|
HAPI | string | Required The version number of the HAPI specification with which this description complies. |
format | string | Required (when header is prefixed to data stream) Format of the data as csv or binary or json . |
parameters | array(Parameter) | Required Description of the parameters in the data. |
startDate | string | Required ISO 8601 date of first record of data. |
stopDate | string | Optional ISO 8601 date for the last record of data. For actively growing datasets, the end date can be approximate, but should be kept up to date. |
sampleStartDate | string | Optional The start time of a sample time period for a dataset, where the time period must contain a manageable, representative example of valid, non-fill data. |
sampleStopDate | string | Optional The end time of a sample time period for a dataset, where the time period must contain a manageable, representative example of valid, non-fill data. |
description | string | Optional A brief description of the resource. |
resourceURL | string | Optional URL linking to more detailed information about this data. |
resourceID | string | Optional An identifier by which this data is known in another setting, for example, the SPASE ID. |
creationDate | string | Optional ISO 8601 Date and Time of the dataset creation. |
modificationDate | string | Optional Last modification time of the data content in the dataset as an ISO 8601 date. |
deltaTime | string | Optional Time difference between records as an ISO 8601 duration. This is meant as a guide to the nominal cadence of the data and not a precise statement about the time between measurements. |
contact | string | Optional Relevant contact person and possibly contact information. |
contactID | string | Optional The identifier in the discovery system for information about the contact. For example, the SPASE ID of the person. |
Parameter
Parameter Attribute | Type | Description |
---|---|---|
name | string | Required |
type | string | Required One of string , double , integer , isotime . Content for double is always 8 bytes in IEEE 754 format, integer is 4 bytes little-endian. There is no default length for string and isotime types. See below for more information on data types. |
length | integer | Required for type string and isotime; not allowed for others. The number of bytes or characters that contain the value. Valid only if data is streamed in binary format. |
units | string | Optional The units for the data values represented by this parameter. Default is ‘dimensionless’ for everything but ‘isotime’ types. |
size | array of integers | Required for array parameters; not allowed for others Must be a 1-D array whose first and only value is the number of array elements in this parameter. For example, "size"=[7] indicates an array of length 7. For the csv and binary output, there must be 7 columns for this parameter -- one column for each array element, effectively unwinding this array. The json output for this data parameter must contain an actual JSON array (whose elements would be enclosed by [ ] ). See below for more about array sizes. |
fill | string | Optional A fill value indicates no valid data is present. See below for issues related to specifying fill values as strings. |
description | string | Optional A brief description of the parameter. |
bins | object | Optional For array parameters, the bins object describes the values associated with each element in the array. If the parameter represents a frequency spectrum, the bins object captures the frequency values for each frequency bin. The center value for each bin is required and the min and max values are optional. If min or max is present, the other is also required. The bins object has an optional units keyword (any string value is allowed) and a required values keyword that holds an array of objects containing the min , center , and max for each bin. See below for an example showing a parameter that holds a proton energy spectrum. |
Example
http://example.com/hapi/info?id=ACE_MAG
Example Response:
{ "HAPI": "1.0",
"creationDate”: "2016-06-15T12:34"
"parameters": [
{ "name": "Time",
"type": "isotime",
"length": 24 },
{ "name": "radial_position",
"type": "double",
"units": "km",
"description": "radial position of the spacecraft" },
{ "name": "quality flag",
"type": "integer",
"units ": "none ",
"description ": "0=OK and 1=bad " },
{ "name": "mag_GSE",
"type": "double",
"units": "nT",
"size" : [3],
"description": "hourly average Cartesian magnetic field in nT in GSE" }
]
}
There is an interaction between the info
endpoint and the data
endpoint, because the header from the info
endpoint describes the record structure of data emitted by the data
endpoint. Thus after a single call to the info
endpoint, a client could make multiple calls to the data
endpoint (for multiple time ranges, for example) with the expectation that each data response would contain records described by the single call to the info
endpoint. The data
endpoint can optionally prefix the data stream with header information, potentially obviating the need for the info
endpoint. But the info
endpoint is useful in that allows clients to learn about a dataset without having to make a data request.
Both the info
and data
endpoints take an optional request parameter (recall the definition of request parameter in the introduction) called parameters
that allows users to restrict the dataset parameters listed in the header and data stream, respectively. This enables clients (that already have a list of dataset parameters from a previous info or data request) to request a header for a subset of parameters that will match a data stream for the same subset of parameters. Consider the following dataset header for a fictional dataset with the identifier MY_MAG_DATA.
{ "HAPI": "1.0",
"creationDate”: "2016-06-15T12:34"
"parameters": [
{ "name": "Time",
"type": "isotime",
"length": 24 },
{ "name": "Bx", "type": "double", "units": "nT" },
{ "name": "By", "type": "double", "units": "nT" },
{ "name": "Bz", "type": "double", "units": "nT" },
]
}
An info request like this:
http://example.com/hapi/info?id=MY_MAG_DATA&parameters=Bx
would result in a header listing only the one dataset parameter:
{ "HAPI": "1.0",
"creationDate”: "2016-06-15T12:34"
"parameters": [
{ "name": "Time",
"type": "isotime",
"length": 24 },
{ "name": "Bx", "type": "double", "units": "nT" },
]
}
Note that the time parameter is still included as the first dataset parameter, even if not requested.
Here is a summary of the effect of asking for a subset of dataset parameters:
- do not ask for any specific parameters (i.e., there is no request parameter called ‘parameters’): all columns
- ask for just the time parameter: just the time column
- ask for a single, non-time dataset parameter (like ‘parameters=Bx’): a time column and one data column
The data endpoint also takes the parameters option, and so behaves the same way as the info endpoint in terms of which columns are included in the response.
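The following minimal Python sketch illustrates this subsetting behavior using the fictional MY_MAG_DATA dataset from above; the time range is an arbitrary assumption for illustration only.

```python
# Minimal sketch: pass the same "parameters" request parameter to /info
# and /data; the time column comes back first even though only Bx was asked for.
import json
from urllib.request import urlopen

SERVER = "http://example.com/hapi"
query = "?id=MY_MAG_DATA&parameters=Bx"

header = json.load(urlopen(SERVER + "/info" + query))
print([p["name"] for p in header["parameters"]])   # expected: ['Time', 'Bx']

# the matching data request returns a time column followed by a Bx column
data_url = SERVER + "/data" + query + "&time.min=2016-01-01&time.max=2016-01-02"
print(urlopen(data_url).readline().decode("utf-8"))
```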
Provides access to a data resource and allows for selecting time ranges and fields to return. Data is returned as a stream in CSV[2], binary, or JSON format. The “Data Stream Formats” section describes the stream contents.
The resulting data stream can be thought of as a stream of records, where each record contains one value for each of the dataset parameters. Each data record must contain a data value or a fill value (of the same data type) for each parameter.
Request Parameters
Name | Description |
---|---|
id | Required The identifier for the resource |
time.min | Required The smallest value of time to include in the response |
time.max | Required The largest value of time to include in the response |
parameters | Optional A comma separated list of parameters to include in the response. Default is all parameters. |
include | Optional Has one possible value of "header" to indicate that the info header should precede the data. The header lines will be prefixed with the "#" character. |
format | Optional The desired format for the data stream. Possible values are "csv", "binary", and "json". |
Response
Response is in one of three formats: CSV format as defined by RFC-4180 with a mime type of "text/csv"; binary format where floating point numbers are in IEEE 754 [5] format and byte order is LSB; or JSON format with the structure described below. The default data format is CSV. See the section on Data Stream Content for more details.
If the header is requested, then for binary and CSV formats, each line of the header must begin with a hash (#) character. For JSON output, no prefix character should be used, because the data object will just be another JSON element within the response. Other than the possible prefix character, the contents of the header should be the same as returned from the info endpoint. When a data stream has an attached header, the header must contain an additional "format" attribute to indicate if the content after the header is "csv", "binary", or "json". Note that when a header is included in a CSV response, the data stream is not strictly in CSV format.
The first parameter in the data must be a time column (type of "isotime") and this must be the independent variable for the dataset. If a subset of parameters is requested, the time column is always provided, even if it is not requested.
The three possible output formats are "csv", "binary", and "json". A HAPI server must support CSV, while binary and JSON are optional.
In the CSV stream, each record is one line of text, with commas between the values for each dataset parameter. Array parameters are unwound in the sense that each element in the array goes into its own column. Currently, only 1-D arrays are supported, so the ordering of the unwound columns is just the index ordering of the array elements. It is up to the server to decide how much precision to include in the ASCII values for the dataset parameters.
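As an illustration of this unwinding rule, here is a minimal Python sketch that regroups unwound CSV columns back into arrays using the size attribute from the info header; the header and csv_lines names are assumptions standing in for content obtained from the info and data endpoints.

```python
# Minimal sketch: rebuild array parameters from their unwound CSV columns.
import csv

def parse_record(row, parameters):
    """Regroup one CSV row into values, one entry per dataset parameter."""
    record, col = [], 0
    for p in parameters:
        width = p["size"][0] if "size" in p else 1       # only 1-D arrays exist
        cells = row[col:col + width]
        if p["type"] == "double":
            cells = [float(c) for c in cells]
        elif p["type"] == "integer":
            cells = [int(c) for c in cells]
        record.append(cells if width > 1 else cells[0])
        col += width
    return record

# usage: records = [parse_record(r, header["parameters"]) for r in csv.reader(csv_lines)]
```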
The binary data output is essentially a binary version of the CSV stream. Recall that the dataset header provides type information for each dataset parameter, and this definitively indicates the number of bytes and the byte structure of each parameter, and thus of each binary record in the stream. All numeric values are little endian, integers are always four byte, and floating point values are always IEEE 754 double precision values.
Dataset parameters of type ‘string’ and ‘isotime’ (which are just strings of ISO 8601 dates) must have in their header a length element. All strings in the binary stream should be null terminated, and so the length element in the header should include the null terminator as part of the length for that string parameter.
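A minimal Python sketch of how a client might derive the binary record layout from the info header under these rules (little-endian byte order, 8-byte IEEE 754 doubles, 4-byte integers, fixed-length null-terminated strings); the header variable is assumed to come from the info endpoint.

```python
# Minimal sketch: build a struct format string for one binary record.
import struct

def record_format(parameters):
    """Translate the parameter list from the info header into a struct format."""
    fmt = "<"                                     # all values are little endian
    for p in parameters:
        count = p["size"][0] if "size" in p else 1
        if p["type"] == "double":
            fmt += "d" * count                    # 8-byte IEEE 754 doubles
        elif p["type"] == "integer":
            fmt += "i" * count                    # 4-byte integers
        else:                                     # string / isotime: fixed length,
            fmt += ("%ds" % p["length"]) * count  #   null terminator included in length
    return fmt

# usage (raw = bytes returned by the data endpoint with format=binary):
# rec = struct.Struct(record_format(header["parameters"]))
# time_bytes, *values = rec.unpack(raw[:rec.size])
```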
For the JSON output, an additional "data" element in the header contains an array of data records. These records are very similar to the CSV output, except that strings must be quoted and arrays must be delimited with array brackets in standard JSON fashion. An example helps illustrate what the JSON format looks like. Consider a dataset with four parameters: time, a scalar value, a 1-D array value with array length of 3, and a string value. The header might look like this:
{ "HAPI": "1.0",
"creationDate”: "2016-06-15T12:34"
"parameters": [
{ "name": "Time", "type": "isotime", "length": 24 },
{ "name": "quality_flag", "type": "integer", "description": "0=ok; 1=bad" },
{ "name": "mag_GSE", "type": "double", "units": "nT", "size" : [3],
"description": "hourly average Cartesian magnetic field in nT in GSE" },
{ "name": "region", "type": "string", "length": 20}
]
}
The data rows with records from this dataset should look like this:
"data" : [
["2010-001T12:01:00",0,[0.44302,0.398,-8.49],"sheath"],
["2010-001T12:02:00",0,[0.44177,0.393,-9.45],"sheath"],
["2010-001T12:03:00",0,[0.44003,0.397,-9.38],"sheath"],
["2010-001T12:04:00",1,[0.43904,0.399,-9.16],"sheath"]
]
The overall data element is a JSON array of records. Each record is itself an array of parameters. The time and string values are in quotes, and any data parameter in the record that is an array must be inside square brackets. This overall data element appears as the last JSON element in the header.
The record-oriented arrangement of the JSON format is designed to allow client readers to begin reading (and processing) the JSON data stream before it is complete. Note also that servers can start streaming the data as soon as records are available. In other words, the JSON format can be read and written without first having to hold all the records in memory. This may require some custom elements in the JSON parser, but preserving this streaming capability is important for keeping the HAPI spec scalable. If pulling all the data content into memory is not a problem, then ordinary JSON parsers will have no trouble with this JSON arrangement.
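For clients that do not need streaming, a minimal Python sketch using an ordinary JSON parser might look like the following; the server URL, dataset id, and time range are the examples used in this document, and the header is requested so that parameter names accompany the data.

```python
# Minimal sketch: read a JSON data response all at once and walk its records.
import json
from urllib.request import urlopen

url = ("http://example.com/hapi/data?id=ACE_MAG"
       "&time.min=2016-01-01&time.max=2016-02-01&format=json&include=header")
response = json.load(urlopen(url))

names = [p["name"] for p in response["parameters"]]
for record in response["data"]:
    # each record is an array with one entry per parameter; array-valued
    # parameters (e.g. mag_GSE) appear as nested JSON arrays
    row = dict(zip(names, record))
    print(row["Time"])
```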
Examples
Two examples of data requests and responses are given – one with the header and one without.
Data with Header
Note that in this request, the header is requested, so the same header from the info request will be prepended to the data, but with a ‘#’ character as a prefix for every header line.
http://example.com/hapi/data?id=path/to/ACE_MAG&time.min=2016-01-01&time.max=2016-02-01&include=header
Example Response: Data with Header
#{
# "HAPI": "1.0",
# "creationDate”: "2016-06-15T12:34"
# "format": "csv",
# "parameters": [
# { "name": "Time",
# "type": "isotime",
# "length": 24
# },
# { "name": "radial_position",
# "type": "double",
# "units": "km",
# "description": "radial position of the spacecraft"
# },
# { "name": "quality flag",
# "type": "integer",
# "units ": "none ",
# "description ": "0=OK and 1=bad "
# },
# { "name": "mag_GSE",
# "type": "double",
# "units": "nT",
# "size" : [3],
# "description": "hourly average Cartesian magnetic field in nT in GSE"
# }
# ]
#}
Time,radial_position,quality_flag,mag_GSE_0,mag_GSE_1,mag_GSE_2
2016-01-01T00:00:00.000,6.848351,0,0.05,0.08,-50.98
2016-01-01T01:00:00.000,6.890149,0,0.04,0.07,-45.26
...
2016-01-01T02:00:00.000,8.142253,0,2.74,0.17,-28.62
Data Only
The following example is the same, except it lacks the request to include the header.
http://example.com/hapi/data?id=path/to/ACE_MAG&time.min=2016-01-01&time.max=2016-02-01
Example Response: Data Only
If the data resource contains a time field plus 3 data fields, the response will be something like:
Time,radial_position,quality_flag,mag_GSE_0,mag_GSE_1,mag_GSE_2
2016-01-01T00:00:00.000,6.848351,0,0.05,0.08,-50.98
2016-01-01T01:00:00.000,6.890149,0,0.04,0.07,-45.26
...
2016-01-01T02:00:00.000,8.142253,0,2.74,0.17,-28.62
Error reporting by the server should be done using standard HTTP status codes. In an error response, the HTTP header should include the status code, and the body of the response should include relevant details about the error. It is then up to clients how to ingest and/or display this error information. Servers are encouraged but not required to limit their return codes to the following suggested set:
Status Code | Description |
---|---|
200 | OK – the user request was fully met with no errors |
400 | Bad request – something was wrong with the request, such as an unknown request parameter (misspelled or incorrect capitalization perhaps?), an unknown data parameter, an invalid time range, unknown dataset id |
500 | Internal error – the server had some internal fault or failure due to an internal software or hardware problem |
Because servers are not required to limit HTTP return codes to those in the above table, clients should be able to handle the full range of HTTP responses. Also, a HAPI server may not be the only software to interact with a URL-based request from a HAPI server (there may be a load balancer or upstream request routing or caching mechanism in place), and thus it is just good client-side practice to be able to handle any HTTP errors.
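A minimal Python sketch of client-side handling for these status codes (the URL is the example used elsewhere in this document):

```python
# Minimal sketch: urllib raises HTTPError for non-2xx responses, so 400, 500,
# and any other code an upstream proxy might return all surface as exceptions.
from urllib.error import HTTPError
from urllib.request import urlopen

try:
    body = urlopen("http://example.com/hapi/data?id=ACE_MAG"
                   "&time.min=2016-01-01&time.max=2016-02-01").read()
except HTTPError as err:
    # err.code is the HTTP status; the response body may carry further details
    print("HAPI request failed:", err.code, err.read().decode("utf-8", "replace"))
```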
The HAPI specification is focused on access to time series data, so understanding how the server parses and emits time values is important.
When making a request to the server, the time range (time.min and time.max) values must each be valid time strings according to the ISO 8601 standard. Only two flavors of ISO 8601 time strings are allowed, namely those formatted as year-month-day (yyyy-mm-ddThh:mm:ss.sss) or day-of-year (yyyy-dddThh:mm:ss.sss). Servers should be able to handle either of these time string formats, but do not need to handle some of the more esoteric ISO 8601 formats, such as year + week-of-year. Any date or time elements missing from the string are assumed to take on their smallest possible value. For example, the string 2017-01-10T12 is the same as 2017-01-10T12:00:00.000. Servers should be able to parse and properly interpret these types of truncated time strings.
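A minimal Python sketch of this "missing elements take their smallest value" rule for the year-month-day flavor (the helper name is illustrative, not part of the specification; a day-of-year version would need a similar template):

```python
# Minimal sketch: pad a truncated yyyy-mm-ddThh:mm:ss.sss string with the
# smallest possible value for each missing date/time element.
def expand_isotime(s):
    template = "0000-01-01T00:00:00.000"
    return s + template[len(s):]

print(expand_isotime("2017-01-10T12"))  # -> 2017-01-10T12:00:00.000
```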
Time values in the outgoing data stream must be ISO 8601 strings. A server may use either the yyyy-mm-ddThh:mm:ss or the yyyy-dddThh:mm:ss form, but should use just one format within any given dataset. Emitting truncated time strings is allowed, and again any missing date or time elements are assumed to have the lowest possible value. Therefore, clients must be able to transparently handle truncated ISO strings of both flavors. For binary and csv data, a truncated time string is indicated by setting the length attribute for the time parameter.
Note that a fill value can be provided for time parameters. If no fill value for time is specified, the string "0001-01-01T00:00:00.000" is used as the default.
Servers should strictly obey the time bounds requested by clients. No data values outside the requested time range should be returned.
Note that the ISO 8601 time format allows arbitrary precision on the time values. HAPI servers should therefore also accept time values with high precision. As a practical limit, servers should at least handle time values down to the nanosecond or picosecond level.
While the HAPI server strictly checks all request parameters (servers must return an error code given any unrecognized request parameter as described earlier), the JSON content output by a HAPI server may contain additional, user-defined metadata elements. All non-standard metadata keywords must begin with the prefix “x_” to indicate to HAPI clients that these are extensions. Custom clients could make use of the additional keywords, but standard clients would ignore the extensions. By using the standard prefix, the custom keywords will never conflict with any future keywords added to the HAPI standard.
Note that there are only a few supported data types: isotime, string, integer, and double. This is intended to keep the client code simple in terms of dealing with the data stream. However, the spec may be expanded in the future to include other types, such as 4 byte floating point values (which would be called float), or 2 byte integers (which would be called short).
The 'size' attribute is required for array parameters and not allowed for others. Currently, data parameters can only be up to 1-D arrays. The size attribute must be a 1-D JSON array of length one, with the one element in the JSON array indicating the number of elements in the data array. For a spectrum, this number of elements is the number of wavelengths or energies in the spectrum. Thus "size":[9] refers to a data parameter that is a 1-D array of length 9. In the csv and binary output formats, there will be 9 columns for this data parameter. In the json output for this data parameter, each record will contain a JSON array of 9 elements (enclosed in brackets [ ]).
(Note: Since the data arrays are at most 1-D, the size attribute could be a scalar, but having it as an array allows for easier future expansion of the spec to include 2-D or higher data arrays.)
Because the fill values for all types must be specified as a string, care must be taken to accurately represent any desired numeric fill value. For integer parameters, string fill values that are small enough will parse correctly to the right internal binary representation. For ‘double’ parameters, the fill string must parse to an exact IEEE 754 double representation. The string ‘NaN’ is allowed, in which case ‘csv’ output should contain the string ‘NaN’ for fill values. For double NaN values, the bit pattern for quiet NaN should be used, as opposed to the signaling NaN, which should not be used (see reference [6]). For ‘string’ and ‘isotime’ parameters, the fill string is used at face value.
The following two examples illustrate two different ways to represent a magnetic field dataset. The first lists a time column and three data columns, Bx, By, and Bz for the Cartesian components.
{
"HAPI": "1.0",
"startDate": "2016-01-01T00:00:00.000",
"stopDate": "2016-01-31T24:00:00.000",
"parameters": [
{ "name": "Time", "type": "isotime", "length": 24 },
{ "name": "Bx", "type": "double", "units": "nT" },
{ "name": "By", "type": "double", "units": "nT" },
{ "name": "Bz", "type": "double", "units": "nT" }
]
}
This example shows a header for the same conceptual data (time and three magnetic field components), but with the three components grouped into a one-dimensional array of size 3.
{
"HAPI": "1.0",
"startDate": "2016-01-01T00:00:00.000",
"stopDate": "2016-01-31T24:00:00.000",
"parameters": [
{ "name": "Time", "type": "isotime", "length": 24 },
{ "name": "b_field", "type": "double", "units": "nT", "size": [3] }
]
}
These two different representations affect how a subset of parameters could be requested from a server. The first example, by listing Bx, By, and Bz as separate parameters, allows clients to request individual components:
http://example.com/hapi/data?id=MY_MAG_DATA&time.min=2001&time.max=2010&parameters=Bx
This request would just return a time column (always included as the first column) and a Bx column. But in the second example, the components are all inside a single parameter named ‘b_field’ and so a request for this parameter must always return all the components of the parameter. There is no way to request individual elements of an array parameter.
The following example shows a proton energy spectrum and illustrates the use of the ‘bins’ element. Note also that the uncertainty of the values associated with the proton spectrum are a separate variable. There is currently no way in the HAPI spec to explicitly link a variable to its uncertainties.
{"HAPI": "1.0",
"creationDate": "2016-06-15T12:34",
"parameters": [
{ "name": "Time",
"type": "isotime",
"length": 24
},
{ "name": "qual_flag",
"type": "int"
},
{ "name": "maglat",
"type": "double",
"units": "degrees",
"description": "magnetic latitude"
},
{ "name": "MLT",
"type": "string",
"length": 5,
"units": "hours:minutes",
"description": "magnetic local time in HH:MM"
},
{ "name": "proton_spectrum",
"type": "double",
"size": [3],
"units": "particles/(sec ster cm^2 keV)"
"bins": {
"units": "keV",
"values": [
{ "min": 10.1, "center": 15, "max": 20 },
{ "min": 20, "center": 25, "max": 30.3 },
{ "min": 30.3, "center": 35, "max": 40 }
]
}
},
{ "name": "proton_spectrum_uncerts",
"type": "double",
"size": [3],
"units": "particles/(sec ster cm^2 keV)"
"bins": {
"units": "keV",
"values": [
{ "min": 10.1, "center": 15, "max": 20 },
{ "min": 20, "center": 25, "max": 30.3 },
{ "min": 30.3, "center": 35, "max": 40 }
]
}
}
]
}
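A minimal Python sketch (the function name is illustrative, not part of the specification) showing how a client might pair one record's proton_spectrum values with the bin centers from this header; the header and record arguments are assumed to come from the info and data (JSON) responses for this dataset.

```python
# Minimal sketch: associate an array parameter's values with its bin centers.
def spectrum_with_bins(header, record, name="proton_spectrum"):
    """Return [(bin_center, value), ...] for one array parameter in one record."""
    names = [p["name"] for p in header["parameters"]]
    param = header["parameters"][names.index(name)]
    centers = [b["center"] for b in param["bins"]["values"]]
    return list(zip(centers, record[names.index(name)]))  # e.g. [(15, flux), ...]
```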
When the server sees a request parameter that it does not recognize, it should throw an error.
So given this query
http://example.com/hapi/data?id=DATA&time.min=T1&time.max=T2&fields=mag_GSE&avg=5s
the server should throw an error with an HTTP 400 (Bad Request) status code to indicate that an invalid request parameter was provided.
Following general security practices, HAPI servers should carefully screen incoming request parameter names and values. Unknown request parameters and values should not be echoed in the error response.
In terms of adopting HAPI as a data delivery mechanism, data providers will likely not want to change existing services, so a HAPI compliant access mechanism could be added alongside existing services. Several demonstration servers exist, but there are not yet any libraries or tools available for providers to use or adapt. These will be made available as they are created. The goal is to create a reference implementation as a full-fledged example that providers could adapt. On the client side, there are also demonstration level capabilities, and Autoplot currently can access HAPI compliant servers. Eventually, libraries in several languages will be made available to assist in writing clients that extract data from HAPI servers. However, even without example code, the HAPI specification is designed to be simple enough so that even small data providers could add HAPI compliant access to their holdings.
[1] ISO 8601:2004, http://dotat.at/tmp/ISO_8601-2004_E.pdf
[2] CSV format, https://tools.ietf.org/html/rfc4180
[3] JSON Format, https://tools.ietf.org/html/rfc7159
[4] "JSON Schema", http://json-schema.org/
[5] IEEE Computer Society (August 29, 2008). "IEEE Standard for Floating-Point Arithmetic". IEEE. doi:10.1109/IEEESTD.2008.4610935. ISBN 978-0-7381-5753-5. IEEE Std 754-2008
[6] IEEE Standard 754 Floating Point Numbers, http://steve.hollasch.net/cgindex/coding/ieeefloat.html
Todd King ([email protected])
Jon Vandegriff ([email protected])
Robert Weigel ([email protected])
Robert Candey ([email protected])
Aaron Roberts ([email protected])
Bernard Harris ([email protected])
Nand Lal ([email protected])
Jeremy Faden ([email protected])