diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/404.html b/404.html new file mode 100644 index 000000000..62279c42a --- /dev/null +++ b/404.html @@ -0,0 +1,1066 @@ + + + +
+ + + + + + + + + + + + + + + + + +Citation
+Beacon v2 and Beacon Networks: a "lingua franca" for federated data discovery in biomedical genomics, and beyond. +Jordi Rambla, Michael Baudis, Tim Beck, Lauren A. Fromont, Arcadi Navarro, Manuel Rueda, Gary Saunders, Babita Singh, J.Dylan Spalding, Juha Tornroos, Claudia Vasallo, Colin D.Veal, Anthony J.Brookes. Human Mutation (2022) DOI.
+The Beacon Framework describes the overall structure of the API +requests, responses, parameters etc. One can implement e.g. a Boolean beacon (cf. the +original protocol) without any use of the model, just by providing a well-formed +JSON response upon a request very similar to the (pre-)v1 allele request.
+This example is for a minimal SNV-type variant query.
+/beacon/g_variants/?referenceName=refseq:NC_000017.11&start=7577120&referenceBases=G&alternateBases=A
+
In this minimal response to the query above the beacon indicates that its default
+response is Boolean and that it could interpret the query against the genomicVariant
entity and in the context of the same Beacon version.
In principle one could launch a Beacon instance using the example response document as a template
+in whatever server environment one has at hand. However, a proper Beacon v2
+installation also has to provide informational endpoints (/info
, /map
...)
+to allow its integration through aggregators.
{
+ "meta": {
+ "apiVersion": "v2.0.0",
+ "beaconId": "org.progenetix.beacon",
+ "receivedRequestSummary": {
+ "apiVersion": "v2.0.0",
+ "pagination": {
+ "limit": 2000,
+ "skip": 0
+ },
+ "requestedGranularity": "boolean",
+ "requestedSchemas": [
+ {
+ "entityType": "genomicVariant",
+ "schema": "https://progenetix.org/services/schemas/genomicVariant/"
+ }
+ ],
+ "requestParameters": {
+ "alternateBases": "A",
+ "referenceBases": "G",
+ "referenceName": "refseq:NC_000017.11",
+ "start": [
+ 7577120
+ ]
+ }
+ },
+ "returnedGranularity": "boolean",
+ "returnedSchemas": [
+ {
+ "entityType": "genomicVariant",
+ "schema": "https://progenetix.org/services/schemas/genomicVariant/"
+ }
+ ]
+ },
+ "responseSummary": {
+ "exists": true
+ }
+}
+
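The request and response above can be tied together in a short sketch. The helper names `build_variant_query` and `beacon_exists` are illustrative assumptions and not part of the Beacon specification; only the path, the parameter names and the `responseSummary.exists` field are taken from the example.

```python
import json
from urllib.parse import urlencode


def build_variant_query(reference_name, start, reference_bases, alternate_bases,
                        base_path="/beacon/g_variants/"):
    """Assemble the GET query for a minimal SNV-type variant request."""
    params = {
        "referenceName": reference_name,
        "start": start,
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
    }
    # safe=":" leaves the prefix separator in "refseq:NC_..." unescaped.
    return base_path + "?" + urlencode(params, safe=":")


def beacon_exists(response_text):
    """Read the Boolean answer from a Beacon v2 response document."""
    document = json.loads(response_text)
    return bool(document["responseSummary"]["exists"])


query = build_variant_query("refseq:NC_000017.11", 7577120, "G", "A")
answer = beacon_exists('{"meta": {}, "responseSummary": {"exists": true}}')
```

Sending the request itself is left out on purpose, since the HTTP client and host depend on the environment.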
Beacon
or beacon
?The uppercase Beacon
is used to label API, framework or protocol and their
+components - while lower case beacons
are instances of these, i.e. individual
+resources using the protocol.
Beacon v2.0 does not provide a mechanism to detect what types of genomic variant +queries are supported by a given instance.
+Beacon had been originally designed to handle the "simplest" type of genomic
+variant queries in which a position
, alternateBases
(i.e. one or more base
+sequence of the variant at the position) and - sometimes optional - the reference
+sequence at this position (necessary e.g. for small deletions) are specified.
Beacon v1.1 in principle supported "bracketed" queries and a variantType
parameter
+(pointing to the VCF use) - see the current documentation for details. However, the support & interpretation was - and still is (2022-12-13) -
+left to implementers. Similar for Beacon Range Queries.
However, the Beacon documentation
+provides information about use and expected interpretation of variantType
values, specifically
+for copy number variations.
Ages are queried as ISO8601 durations
+such as P65Y
(i.e. 65 years) with a comparator (=
, <=
, >
...). However,
+the value needs an indication of what the duration refers to and resources
+may provide different ways to indicate this (as then shown in their /filtering_terms
)
+endpoint).
We recommend that all Beacon instances that support age queries support at
+minimum the syntax of age:<=P65Y
and map such values to the internal datapoint
+most relevant for the resource's context (in most cases probably corresponding
+to "age at diagnosis").
However, different scenarios may be supported (e.g. EFO_0005056:<=P1Y6M
for
+an "age at death" scenario).
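A minimal sketch of how a beacon might take apart such an age filter, assuming the GET-style `id:comparatorvalue` syntax shown above. The function name, the accepted comparator set and the restriction to the year/month/day subset of ISO8601 durations are assumptions of this sketch.

```python
import re

# Assumed comparator set; longer operators are listed first so "<=" wins over "<".
FILTER_RE = re.compile(r"^(?P<id>[^:]+):(?P<op><=|>=|=|<|>)(?P<value>.+)$")
# Only the PnYnMnD subset of ISO8601 durations is handled in this sketch.
DURATION_RE = re.compile(r"^P(?:(?P<y>\d+)Y)?(?:(?P<m>\d+)M)?(?:(?P<d>\d+)D)?$")


def parse_age_filter(text):
    """Split e.g. 'age:<=P65Y' into id, operator and a duration in months/days."""
    match = FILTER_RE.match(text)
    if match is None:
        raise ValueError(f"not an age filter: {text!r}")
    duration = DURATION_RE.match(match["value"])
    if duration is None or not any(duration.groups()):
        raise ValueError(f"unsupported duration: {match['value']!r}")
    months = int(duration["y"] or 0) * 12 + int(duration["m"] or 0)
    return {
        "id": match["id"],
        "operator": match["op"],
        "months": months,
        "days": int(duration["d"] or 0),
    }
```

Mapping the parsed `id` (e.g. `age` or `EFO_0005056`) to the relevant internal datapoint stays a decision of the individual resource.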
The Beacon framework currently (v2.0 and earlier) considers genomic
+variants to be allelic and does not support the query for multiple alleles
+or "haplotype shorthand expressions" (e.g. C,T
).
Workarounds In case of a specific need for haplotype queries, implementers +of a given beacon with control of its data content can in principle extend their +query model to support shorthand haplotype expressions, as long as they support +the standard format, too. However, such an approach may be superseded by or be in conflict +with future direct protocol support.
+An approach in line with the current protocol would be to query for one allelic
+variant with a record-level genomicVariation
response, and then query the
+retrieved variants individually by their id
in combination with the second
+allele.
As with queries the Beacon "legacy" format does not support haplotype representation +but would represent each allelic variation separately. The same is true for the +VRSified variant representation which for v2.0 corresponds to VRS v1.2. +However, draft versions of the VRS standard (will) address haplotype and genotype +representations and will be adopted by Beacon v2.n after reaching a release state.
+About UI
+Most of the information that you will find here is related to the Beacon v2 specification. For that reason, the examples are shown as REST API requests/responses in the form of JSON. If you are only interested in using beacon with a graphical interface please visit the implementations page.
+While the original Beacon v1 only provided Boolean (i.e. YES/NO) responses +on queries for the existence of specific genomic variants, Beacon v2 is a flexible +protocol that supports different usage scenarios - also called "flavours", since +they are more a representation of usage types w/o prescribing their specific details.
+Importantly, the Beacon framework separates query options from the response side. In that way +a privacy-protecting1 Boolean Beacon still may offer more query features - and therefore better +usability - compared to the first Beacon concept implementations.
+Technical Notes
+For detailed information about the technical implementation of the different logical +scopes please see the Framework documentation.
+A Boolean Response Beacon is in its response similar to Beacon v1 - i.e. responding +with a true or false value when queried for the existence of some data in a resource. Similarly +a Count Response Beacon only returns aggregate information, i.e. the number of matched +entries (e.g. genomic variants), a feature also part of the Beacon v1 protocol.
+However, in contrast to earlier versions, in Beacon v2 in principle a beaconized resource +may implement all types of query options (e.g. combinations of various filters and +genomic query parameters) but still only offer a Boolean and optionally Count response.
+Also, all Beacons should implement the Boolean Response format as fallback option and +handle extended options depending on the user's authentication status.
+{
+ "meta": {
+ "apiVersion": "v2.0.0",
+ "__other_meta_parameters__": "...",
+ "receivedRequestSummary": {
+ "requestedGranularity": "boolean",
+ "__other_request_parameters__": "..."
+ },
+ "returnedGranularity": "boolean"
+ },
+ "responseSummary": {
+ "exists": true
+ }
+}
+
{
+ "meta": {
+ "apiVersion": "v2.0.0",
+ "__other_meta_parameters__": "...",
+ "receivedRequestSummary": {
+ "requestedGranularity": "count",
+ "__other_request_parameters__": "..."
+ },
+ "returnedGranularity": "count"
+ },
+ "responseSummary": {
+ "exists": true,
+ "numTotalResults": 42
+ }
+}
+
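The fallback behaviour described above can be sketched as follows. This is an illustration only: the rule that counts are released solely to authenticated users is an assumption of this sketch, and the `meta` part is abbreviated to the returned granularity.

```python
def build_response(match_count, requested="boolean", authenticated=False):
    """Return an abbreviated Beacon response with Boolean fallback.

    Assumption: a count is only returned when it was requested and the
    user is authenticated; in every other case the answer is Boolean.
    """
    granularity = "count" if requested == "count" and authenticated else "boolean"
    summary = {"exists": match_count > 0}
    if granularity == "count":
        summary["numTotalResults"] = match_count
    return {
        "meta": {"returnedGranularity": granularity},
        "responseSummary": summary,
    }
```

Keeping the decision in one place makes it easy to see that the query itself may be arbitrarily rich while the response stays at the lowest permitted granularity.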
Technical Notes
+For detailed information about the technical implementation of the different logical +scopes please see the Models documentation.
+Information about the different data delivery options can be found here:
+Privacy protecting as in "reasonably protecting by design but not immune to complex +re-identification attacks". ↩
+This page only lists changes w/ regard to the documentation and general organization +of the Beacon project site(s) as well as with overarching repository organization.
+GET
contextHTTPS
issue (by brute-forcing all links on site to https://
)website-docs
branch¶To protect the code branches we are now using a separate website-docs
branch in beacon-v2
for documentation
+website updates. Please make sure all documentation edits happen there!
beacon-framework-v2
and beacon-v2-Models
repos¶archived
w/ pointers to this one here and
+archived (i.e. set to read only)implementations-v2
repository (part of documentation)filters.md
from section Beacon Components to Implement...._rest-api.md
and _tips-for-implementers.md
).bin
files that parse JSON schemasdocs/*.md
¶mermaid
to mermaid2
plugin.networks.md and
roles.md`security.md
under Beacon Types.implementations-v2
repository to the Beacon v2 Documentation - web access here.beacon-v2
¶beacon-v2-unity-testing
+to beacon-v2
.implementations-and-networks
to other-implementations
and left only the "Networks" Part.mkdocs-mermaid2-plugin
both to mkdocs.yaml
and to github workflows.Beacon Compoments/Models
implement-and-deploy.md
The mkdocs-macros-plugin
has been activated, allowing the use of site-wide variables:
repo_model_url: https://github.com/ga4gh-beacon/beacon-v2/tree/main/models/src
{{ no such element: mkdocs.config.defaults.MkDocsConfig object['repo_model_url'] }}
Implementations and Networks
and Standards Integration
As of today the new/emerging Beacon v2 documentation is maintained in this repository. We're testing rendered versions (same text/code base) through GitHub actions (here) and ReadTheDocs.
+material
themed buildyaml
export version¶Since moving to source in YAML the existence of a separate yaml
export seems unnecessary & maybe confusing. Removed.
The structure of the models
directory has now been changed to have the default model as one of possibly multiple
+options as per the discussions in #1.
+The current structure (below) might not be final (e.g. placing of the beaconConfiguration.yaml
, beaconMap.yaml
, endpoints.yaml
files?).
beacon
+ |
+ |-- framework ...
+ |-- models
+ | |-- src
+ | | |-- beacon-v2-default-model
+ | | |-- analyses ...
+ | | |-- biosamples ...
+ | | |-- genomicVariations ...
+ | | |-- ...
+ | | |-- endpoints.yaml
+ | |
+ | |-- json
+ | |-- beacon-v2-default-model
+ | |-- analyses ...
+ | |-- biosamples ...
+ | |-- genomicVariations ...
+ | |-- ...
+ | |-- endpoints.yaml
+ |
+ |-- bin ...
+ |-- docs ...
+...
+
git -C $BEACONMODELPATH pull
+git -C $BEACONFRAMEWORKPATH pull
+
yamler.py
with a dedicated beaconYamler.py
The development of Beacon code and documentation happens in the beacon-v2
repository.
main
¶The main
branch is used for production. It reflects the latest version that Beacon v2 has reached by meeting the milestones GA4GH has set for a new Beacon version. It can only be changed through a PR from the develop branch and, exceptionally, through hotfixes that correct errors spotted after the official deployment.
develop
¶The develop
branch is used for development and reflects the current state of progress. It is modified through PRs from finished feature branches (which must include all merges from their subfeature branches), and each PR must reach consensus before it is accepted.
website-docs
¶This branch is used to maintain the website at docs.genomebeacons.org. The relevant files consist of anything under /docs
as well as the configuration file (/mkdocs.yaml
) and the workflow file for processing the pages under /.github/workflows/mk-beacon-docs.yaml
.
Changes to the Markdown files in the /docs
directory (and its children) will initiate the processing of the workflow file; updating the website may then take some minutes.
gh-pages
¶The gh-pages
branch is generated from the /docs
directory through its mkdocs
workflow and contains the website itself. Do not edit
TBD
+ + + + + + +The Beacon API & standard is a driver project of the Global Alliance for Genomics +and Health GA4GH. Since 2016 Beacon development has been +organized through projects supported by ELIXIR with additional contributions from +outside organizations and individual developers and implementers.
+TBD
+ + + + + + +Filters represent a powerful addition to the Beacon query API. They are rules for selecting records based upon the field values those records contain. The rules can refer to bio-ontology or custom terms, numerical or alphanumerical values, and employ wildcards, standard operators or other principles of selection. This empowers such options as queries for phenotypes, disease codes or technical parameters associated with observed genomic variants.
+Using Filters
+Please see Using Filters in Queries for +more information on how to use filters in Beacon requests.
+A Beacon can support three general types of Filters.
+OntologyFilters
are identified using the full term/class identifier
+ as CURIE, e.g. “HP:0100526”.HP:0032443
Past medical history), a
+ comparator and a numerical, pseudo-numerical (e.g. ISO8601 period) or string
+ value
The /filtering_terms endpoint returns a list of all data fields whose values may be subjected to filtering, plus the data type(s) for those fields, and/or the list of extant values for each of those data fields in the current dataset. In addition, for each bio-ontology used by a Beacon, the endpoint response includes a description of the bio-ontology in Phenopackets Resource format.
+The endpoint's filteringTerms
response identifies the Filter types.
Bio-ontology and custom term Filter types contain:
+type = resource name (required)
id = term id (required)
label = term label (optional)
"response":{
+ "resources":[
+ {
+ "id":"hp",
+ "name":"Human Phenotype Ontology",
+ "url":"https://purl.obolibrary.org/obo/hp.owl",
+ "version":"27-03-2020",
+ "namespacePrefix":"HP",
+ "iriPrefix":"https://purl.obolibrary.org/obo/HP_"
+ },
+ ...
+ ],
+ "filteringTerms": [
+ {
+ "type": "ontologyTerm",
+ "id": "HP:0008773",
+ "label": "neoplasm of the lung"
+ },
+ ...
+ ]
+}
+
Alphanumerical value Filter types contain:
+type = data type as 'alphanumeric' (required)
id = field id (required)
label = field label (optional)
"filteringTerms": [
+ {
+ "type": "alphanumeric",
+ "id": "PATO:0000011",
+ "label": "age"
+ },
+ ...
+]
+
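A client can use the `type` field to sort the advertised terms before building a query form. The following sketch groups a `filteringTerms` list by type; the function name is hypothetical and the input is assumed to follow the `type`/`id`/`label` structure shown in the examples above.

```python
from collections import defaultdict


def terms_by_type(filtering_terms):
    """Group filtering term ids by their declared filter type."""
    grouped = defaultdict(list)
    for term in filtering_terms:
        grouped[term["type"]].append(term["id"])
    return dict(grouped)


example_terms = [
    {"type": "ontologyTerm", "id": "HP:0008773", "label": "neoplasm of the lung"},
    {"type": "alphanumeric", "id": "PATO:0000011", "label": "age"},
]
grouped_terms = terms_by_type(example_terms)
```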
For all query types, the logical AND
is implied between Filters. The Filter id
is required for all query types.
Filters in GET
Requests
GET
requests use a filters
parameter for one or more (comma-separated) filter id
values.
+In this case general filter defaults apply (e.g. { "includeDescendantTerms": true }
). Generally,
+use of filters other than CURIE values for filter ids is discouraged.
List Parameters in GET Requests
+Since the direct interpretation of list parameters in queries is not supported by
+some server environments (e.g. PHP, GO…), list parameters such as start
and end
+should be provided as comma-concatenated strings when using them in GET requests.
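As a small illustration of this convention, list values can be joined on the client and split again on the server. Both helper names are assumptions of this sketch.

```python
def encode_list_param(values):
    """Join list values into the comma-concatenated GET form."""
    return ",".join(str(value) for value in values)


def decode_int_list_param(text):
    """Split a comma-concatenated parameter back into integers."""
    return [int(part) for part in text.split(",") if part]
```

A bracketed query with two start positions would thus be sent as a single `start` parameter holding both values.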
Hierarchical term expansion
+It is recommended that the use of terms from hierarchical ontologies/classifications +uses an internal term expansion mechanism - i.e. records with parameters containing +a child term are matched when the parent term is being queried. +This default behaviour can be modified (see below).
+The following query retrieves (or filters retrieved...) data matching the diagnosis of +Papillary Renal Cell Carcinoma (NCIT:C6975) from a publication identified through its PubMed id (22824167):
+/biosamples?filters=PMID:22824167,NCIT:C6975
+
"filters": [
+ {
+ "id": "PMID:22824167"
+ },
+ {
+ "id": "NCIT:C6975"
+ }
+]
+
A Beacon will query for entities associated with the submitted bio-ontology term(s), and by default, all descendent terms.
+The optional includeDescendantTerms
parameter can be set to either true
or false
. The default and assumed value
+of includeDescendantTerms
is true
, thus if the parameter is not set, then the use of bio-ontology terms in a Beacon
+request implies that a hierarchical ontology search is requested.
Request example of two filters, where one filter excludes matches with descendent terms:
+"filters": [
+ {
+ "id": "HP:0100526",
+ "includeDescendantTerms": false
+ },
+ {
+ "id": "HP:0005978"
+ }
+]
+
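How a beacon expands terms internally is not prescribed. One possible approach is sketched below, assuming the resource holds a parent-to-children map for its ontologies; the map, the placeholder term ids and the function name are all hypothetical.

```python
def expand_filter(filter_obj, children):
    """Return the set of term ids a single ontology filter should match.

    `children` maps a term id to its direct child term ids (assumed to be
    derived from the ontology by the resource itself).
    """
    term = filter_obj["id"]
    # The default of includeDescendantTerms is true when the key is absent.
    if not filter_obj.get("includeDescendantTerms", True):
        return {term}
    expanded, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in expanded:
            expanded.add(current)
            stack.extend(children.get(current, []))
    return expanded


# Placeholder hierarchy, not real ontology content.
toy_children = {"X:parent": ["X:child1", "X:child2"], "X:child1": ["X:grandchild"]}
```

Since the logical AND is implied between filters, a record would then have to match at least one term from the expanded set of every filter.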
A Beacon will query for entities that are associated with bio-ontology terms that are similar to the submitted terms. The Beacon API is agnostic to the semantic similarity model implemented by a Beacon and how a Beacon applies the relative thresholds of similarity. A semantic similarity query request contains the required similarity
parameter with a value set to define the relative threshold level of high
, medium
or low
.
POST request example of two Filters using differing relative similarity thresholds:
+"filters": [
+ {
+ "id": "HP:0100526",
+ "similarity": "high"
+ },
+ {
+ "id": "HP:0005978",
+ "similarity": "medium"
+ }
+]
+
A Beacon will query for quantitative properties when the required operator
and
+numerical value
parameters are set in the filters request. The id
parameter
+identifies the logical scope (with the exact field depending on the internal data
+model at the resource), the operator
parameter defines the operator to use,
+and the value
parameter provides the field query value. Equality and relational
+operators (= < >) can be used between field name and field value pairs, and field
+values can be associated with units if applicable.
filters=age:>P70Y
filters=PATO_0000011:>P70Y ("age")
filters=EFO_0004847:>P70Y ("age at onset")
"filters": [
+ {
+ "id": "PATO:0000011",
+ "operator": ">",
+ "value": "P70Y"
+ }
+]
+
We recommend that implementers provide term expansions for equivalent terms, +depending on the context. Also, it is up to the implementers to provide the +correct tooling for e.g. transformation of input values (e.g. numerical age in +years and comparator) to the standardized wire format (e.g. ages/durations are +always transmitted as ISO8601 periods) as well as the correct deparsing and +use (e.g. the ISO values probably will be converted to some numerical format for +database matches).
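The transformation of user input to the wire format mentioned above might look like the following sketch. The function name, the default term id and the accepted comparator set are assumptions; only the resulting `id`/`operator`/`value` structure follows the POST example.

```python
ALLOWED_COMPARATORS = {"=", "<", ">", "<=", ">="}  # assumed set for this sketch


def age_filter(years, comparator, months=0, term_id="PATO:0000011"):
    """Turn a numerical age plus comparator into a Beacon filter object."""
    if comparator not in ALLOWED_COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    if years < 0 or months < 0:
        raise ValueError("age components must not be negative")
    # Ages are transmitted as ISO8601 periods, e.g. P70Y or P1Y6M.
    duration = f"P{years}Y" + (f"{months}M" if months else "")
    return {"id": term_id, "operator": comparator, "value": duration}
```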
+A Beacon will query free-text values within fields when the required operator
+and alphanumerical value
parameters are set in the filters request. Queries can
+be for exact alphanumerical values, used to exclude alphanumerical values, or employ
+wildcards to match patterns within alphanumerical values. In all query classes,
+the id
parameter identifies the field name, the operator
parameter defines the
+operator to use, and the value
parameter provides the field query value.
The operator
parameter is set to the equality (=) operator.
POST request example of using free-text to filter medical history (past medical history = HP:0032443):
+"filters": [
+ {
+ "id": "HP:0032443",
+ "operator": "=",
+ "value": "unknown medical history"
+ }
+]
+
'LIKE' value query
+The inclusion of a percent sign (%) wildcard character within the value
parameter represents zero or more characters within a LIKE style string match. The wildcard character can lead the query string, end the string, or surround the string.
POST request example to filter medical history free-text for any reference to cancer:
+"filters": [
+ {
+ "id": "HP:0032443",
+ "operator": "=",
+ "value": "%cancer%"
+ }
+]
+
The operator
parameter is set to the logical not (!) operator. Records match when the value
parameter is not present in the field value. The wildcard character can be used if required. The following example shows how to filter medical history free-text for records that do not include the query string:
filters=HP_0032443:!unknown+medical+history
"filters": [
+ {
+ "id": "HP:0032443",
+ "operator": "!",
+ "value": "unknown medical history"
+ }
+]
+
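The three free-text query classes can be evaluated with one small matcher. This is a sketch under assumptions: the percent sign is translated to a regular expression, matching is case-sensitive, and the function names are illustrative. A real beacon would more likely delegate this to its database.

```python
import re


def like_to_regex(pattern):
    """Translate a '%' wildcard pattern into a compiled regular expression."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(parts), re.DOTALL)


def matches(filter_obj, field_value):
    """Apply an alphanumerical filter ('=' or '!') to a single field value."""
    hit = like_to_regex(filter_obj["value"]).fullmatch(field_value) is not None
    if filter_obj["operator"] == "=":
        return hit
    if filter_obj["operator"] == "!":
        return not hit
    raise ValueError(f"unsupported operator: {filter_obj['operator']!r}")
```

Without a wildcard the pattern degenerates to an exact comparison, so the same code path covers the exact, LIKE and NOT cases.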
For historical reasons, in the names of entities, parameters and URLs we are following these conventions:
+PascalCase
camelCase
snake_case
The only exception is: service-info
which is a required GA4GH standard and has a different word separation convention.
The Beacon v2 API follows the OpenAPI 3.0.2 specification for the endpoints, in conjunction with JSON Schema (2020-12) to define the Framework and the Models components. The specification uses JSON references ($ref
) to reference internal (e.g., definitions) or external concepts/terms (e.g., VRS).
The Beacon v2 specification is written in YAML. The original files are located under the src
directory (see below). For technical purposes, we also provide a copy of the original YAML in JSON format (see json
directory below). Changes in the specification must be performed in the YAML version and are then rewritten to the JSON version.
framework
+|-- json
+| |-- common
+| | `-- examples
+| |-- configuration
+| | `-- examples
+| |-- requests
+| | |-- examples-fullDocuments
+| | `-- examples-sections
+| `-- responses
+| |-- sections
+| |-- examples-fullDocuments
+| `-- examples-sections
+`-- src
+ |-- common
+ | `-- examples
+ |-- configuration
+ | `-- examples
+ |-- requests
+ | |-- examples-fullDocuments
+ | `-- examples-sections
+ `-- responses
+ |-- sections
+ |-- examples-fullDocuments
+ `-- examples-sections
+
+models
+|-- json
+| `-- beacon-v2-default-model
+| |-- analyses
+| | `-- examples
+| |-- biosamples
+| | `-- examples
+| |-- cohorts
+| | `-- examples
+| |-- common
+| |-- datasets
+| | `-- examples
+| |-- genomicVariations
+| | `-- examples
+| |-- individuals
+| | `-- examples
+| `-- runs
+| `-- examples
+`-- src
+ `-- beacon-v2-default-model
+ |-- analyses
+ | `-- examples
+ |-- biosamples
+ | `-- examples
+ |-- cohorts
+ | `-- examples
+ |-- common
+ |-- datasets
+ | `-- examples
+ |-- genomicVariations
+ | `-- examples
+ |-- individuals
+ | `-- examples
+ `-- runs
+ `-- examples
+
+GA4GH Genome Coordinate Use Recommendation1
+Date and time formats are specified as ISO8601 +compatible strings, both for time points as well as for durations. Some of the ISO8601 +compatible formats have not (yet) been used in the Beacon v2 default model.
+"type": "string", "format": "date-time"
The development of the Beacon v2 framework and default model closely follows +and widely adopts concepts and schemas from approved GA4GH products such as +Phenopackets and the Variant Representation Standard (VRS).
+The GA4GH Variant Representation Standard (VRS) constitutes the reference one should use +when implementing representations of genomic variations. The current version 1.2 +has been approved and covers a set of use cases and requirements, especially with respect +to genomic (including cytogenetic or feature based) locations. However, it is not yet +suitable for a number of practical use cases, especially the representation of some structural variations.
+The Beacon v2 default model for GenomicVariation
makes use of the VRS standard to represent
+the variation
part, i.e. the location and sequence or copy number changes of the
+genomic variation. While a "legacy" alternative is still allowed this one too has been adjusted
+to make use of the VRS Location
format.
The examples are for different forms of the location
property inside a genomicVariation
.
"variation": {
+ "type": "Allele",
+ "state": {
+ "sequence": "G",
+ "type": "LiteralSequenceExpression"
+ },
+ "location": {
+ "type": "SequenceLocation",
+ "sequence_id": "refseq:NC_000017.11",
+ "interval": {
+ "type": "SequenceInterval",
+ "start": {
+ "type": "Number",
+ "value": 7577120
+ },
+ "end": {
+ "type": "Number",
+ "value": 7577121
+ }
+ }
+ }
+}
+
"variation": {
+ "type": "RelativeCopyNumber",
+ "relative_copy_class": "partial loss",
+ "location": {
+ "type": "SequenceLocation",
+ "sequence_id": "refseq:NC_000018.10",
+ "interval": {
+ "start": {
+ "type": "Number",
+ "value": 23029501
+ },
+ "end": {
+ "type": "Number",
+ "value": 62947165
+ }
+ }
+ }
+}
+
"variation": {
+ "variantType": "SNP",
+ "referenceBases": "C",
+ "alternateBases": "G",
+ "location": {
+ "type": "SequenceLocation",
+ "sequence_id": "refseq:NC_000017.11",
+ "interval": {
+ "type": "SequenceInterval",
+ "start": {
+ "type": "Number",
+ "value": 7577120
+ },
+ "end": {
+ "type": "Number",
+ "value": 7577121
+ }
+ }
+ }
+}
+
"variation": {
+ "variantType": "DEL",
+ "location": {
+ "type": "SequenceLocation",
+ "sequence_id": "refseq:NC_000018.10",
+ "interval": {
+ "start": {
+ "type": "Number",
+ "value": 23029501
+ },
+ "end": {
+ "type": "Number",
+ "value": 62947165
+ }
+ }
+ }
+}
+
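All four examples share the same `location` structure, which could be produced by a single helper. The sketch below assumes that coordinates are given as a half-open interval (a single base at 7577120 ends at 7577121, as in the examples) and always emits the `SequenceInterval` type; the function name is hypothetical.

```python
def sequence_location(sequence_id, start, end):
    """Build a VRS v1.2 style SequenceLocation as used in the examples."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return {
        "type": "SequenceLocation",
        "sequence_id": sequence_id,
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": start},
            "end": {"type": "Number", "value": end},
        },
    }


# Legacy-format SNV using the shared location structure.
legacy_snv = {
    "variantType": "SNP",
    "referenceBases": "C",
    "alternateBases": "G",
    "location": sequence_location("refseq:NC_000017.11", 7577120, 7577121),
}
```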
In the Beacon v2 default data model, many schemas are either directly compatible to +Phenopackets v2 building blocks +or at least reflect them but with some adjustments. While the Beacon v2 default model's schemas do not per se have to reflect +PXF schemas, we target an as-close-as-possible alignment to promote/leverage GA4GH-wide +standardization.
+The Phenopackets model is centered around the Phenopacket
, which is the collector
+and integrator of all sub-schemas (with the addition of the external Family
and
+Cohort
schemas). While Phenopacket
usually describes information related to a
+subject
- which is defined in an Individual
- and the top level elements in
+Phenopacket
relate to a specific proband
(measurements
as "Measurements performed
+in the proband"), the phenopacket itself does not explicitly represent an individual.
In contrast, the Beacon v2 default model uses a hierarchy in which biosamples
+reference individuals directly (if existing). For most purposes one can equate Beacon's
+Individual
with a merge of Phenopacket's core Phenopacket
and Individual
parameters.
==
PXF v2¶Age
¶AgeRange
¶Evidence
¶KaryotypicSex
¶ReferenceRange
¶While unit
in Beacon points to a Unit
definition, this is itself an OntologyTerm
i.e. structurally the same.
Value
¶=~
PXF v2 (e.g. renamed or additional parameters)¶ComplexValue
¶Renamed ComplexValue.TypedQuantity.quantityType
compared to GA4GH Phenopackets v2 ComplexValue.TypedQuantity.type
due to problematic use of type
as parameter
ExternalReference
¶Renamed ExternalReference.notes
compared to GA4GH Phenopackets v2 ExternalReference.description
due to problematic use of description
as parameter
Measurement
¶Added notes
and date
.
PhenotypicFeature
¶
Beacon | Phenopackets |
---|---|
featureType | type |
severity (re-used definition reflecting an ontology term) | severity (ontology class) |
notes | |
Procedure
¶
Beacon | Phenopackets |
---|---|
procedureCode | code |
ageAtProcedure (TimeElement) | performed (TimeElement) |
dateOfProcedure (ISO date) | |
TimeElement
¶The specific parameters have been aligned w/ minimal differences in naming or use of general parameters.
+Beacon | Phenopackets |
---|---|
ontologyTerm | ontology_class |
age | age (Age) |
ageRange | age_range (AgeRange) |
gestationalAge | gestational_age (GestationalAge) |
...Timestamp | timestamp (TimeStamp) |
timeInterval | interval (TimeInterval) |
Treatment
¶Beacon still has an ageOfOnset
parameter (?). Also, PXF agent
has been renamed to a more general treatmentCode
.
~
PXF v2 (e.g. multiple/complex differences)¶Disease
¶Pedigree
¶While the Beacon & Phenopackets schemas for "pedigree" representation are not aligned, they may become superseded by the GA4GH pedigree standard currently under development.
+Sex
¶Beacon directly uses the (IMO preferable) representation through an ontology term, while PXF uses an ordinal mapping
+Source: @andrewyatz at SchemaBlocks {S}[B] ↩
+The GA4GH Beacon specification is composed of two parts:
+ +The Beacon Framework is the part that describes the overall structure of the API requests, responses, parameters, the common components, etc. This document may also refer to it simply as the Framework.
+A Beacon Model describes the set of concepts included in a Beacon version (e.g. Beacon v2), like individual or biosample. This document may also refer to it simply as the Model.
+The Framework could be considered the syntax and the Model as the semantics.
+Refer to the Models for further information about the default model and how to use it.
+The Framework doesn't include anything related to specific entities but only the mechanisms for querying them and parsing the responses. +The BF is, therefore, independent from/agnostic to any specific Model. It can be leveraged to describe models from other domains like proteomics, imaging, biobanking, etc.
+A Beacon instance is just an implementation of a Beacon Model that follows the rules stated by the Beacon Framework.
+If you are a Beacon implementer you don't need to clone this (Framework) repo; you only need to copy (or clone) the Beacon Model and modify it for your specific instance. You will find plenty of references to the Framework in the Model copy, and you will use the JSON schemas in the Framework to validate that the structure of both your requests and your responses is compliant with the Beacon Framework. The Beacon verifier tool can help with such validation.
+The Framework repo includes the elements that are common to all Beacons:
+Please visit the Standards Page
+The above listed elements are organized in several folders (in alphabetical order):
+The root folder only contains the endpoints.json document, an OpenAPI 3.0.2 description of the endpoints that every Beacon instance MUST implement.
+The endpoints are:
+* the root (/
) and /info
that MUST return information (metadata) about the Beacon service and the organization supporting it.
+* the /service-info
endpoint that returns the Beacon metadata in the GA4GH Service Info schema.
+* the /configuration
endpoint that returns some configuration aspects and the definition of the entry types (e.g. genomic variants, biosamples, cohorts) implemented in that specific Beacon server or instance.
+* the /entry_types
endpoints that only return the section of the configuration that describes the entry types in that Beacon.
+* the /map
endpoint that returns a map (like a web sitemap) of the different endpoints implemented in that Beacon instance.
+* the /filtering_terms
endpoint that returns a list of the filtering terms accepted by that Beacon instance.
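An implementer could keep the list above as a simple checklist. The sketch below compares a set of implemented paths against it; the function name is hypothetical, and how the implemented paths are collected (e.g. from a router or from the `/map` response) is left open.

```python
# Informational endpoints listed above.
REQUIRED_ENDPOINTS = (
    "/",
    "/info",
    "/service-info",
    "/configuration",
    "/entry_types",
    "/map",
    "/filtering_terms",
)


def missing_endpoints(implemented_paths):
    """Return the required informational endpoints that are not implemented."""
    implemented = set(implemented_paths)
    return [path for path in REQUIRED_ENDPOINTS if path not in implemented]
```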
Most of these endpoints simply return the configuration files that are in the Beacon configuration folder. Of course, every Beacon instance would have their particular instance of such documents, including the configuration of such instance.
+Note: It could be argued that the Beacon configuration files are different for every Beacon instance and, hence, they should be part of the Model. However, the configuration files MUST be used, exactly with the same schema, by any model, independently if that Beacon follows the Beacon v2 Model or any other. Additionally, these endpoints and configuration files are critical for a Beacon client to be able to understand and use a Beacon instance. Therefore, we have considered it to be an essential part of the Framework and belonging to it.
+Contains the JSON schema files that describe the Beacon configuration. Its contents are described in the section above, as they have an almost 1-to-1 relationship with such endpoints. Further details about the specific content of each file can be found in the corresponding sections below.
+Contains the following JSON schemas:
+RequestBody
to keep the same nomenclature used by OpenAPI v3, but it actually contains the definition of the whole HTTP POST request payload.MIN
in the name shows the minimal required attributes for the request to be compliant. The example labelled with MAX
in the name includes a richer case with all the sections filled in.
Both the filters (filteringTerms) and the parameters (requestParameters) are used to refine the query. The availability of two mechanisms to refine the queries could initially sound confusing, but that separation is tailored to facilitate the interpretation of the request by the Beacon server.
+A basic difference is that, in HTTP GET requests, each parameter is named (e.g. 'id', 'skip', 'limit') while filters go under the same named parameter 'filters'. For HTTP POST requests, the difference lies in each parameter having a separate definition (e.g. id
is a string
, while skip
is an integer
), while all filters follow the schema described in /requests/filteringTerms.json
.
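As a rough illustration of this distinction, the following Python sketch builds the same refinement once as a GET query string and once as the query part of a POST payload. The helper names `get_query` and `post_query` are hypothetical, the filter id is borrowed from the examples elsewhere in this documentation, and placing `skip` and `limit` directly under `requestParameters` is a simplifying assumption rather than a statement about the schema.

```python
from urllib.parse import urlencode


def get_query(parameters, filter_ids):
    """GET form: every parameter has its own name, all filters share 'filters'."""
    pairs = dict(parameters)
    if filter_ids:
        pairs["filters"] = ",".join(filter_ids)
    # safe=":," keeps CURIE prefixes and list commas readable in the URL
    return urlencode(pairs, safe=":,")


def post_query(parameters, filter_ids):
    """POST form: typed parameters, and filters as a list of objects."""
    return {
        "requestParameters": dict(parameters),  # assumption: simplified layout
        "filters": [{"id": filter_id} for filter_id in filter_ids],
    }


print(get_query({"skip": 0, "limit": 10}, ["NCIT:C3058"]))
print(post_query({"skip": 0, "limit": 10}, ["NCIT:C3058"]))
```

In both forms the parameters keep their individual names and types, while the filters are a homogeneous list whose items all follow one schema.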
An unrestricted query like /datasets
should return the list of all datasets in a Beacon instance. That query could be refined by adding a generic condition like: "return only datasets which could be used for 'general research'" or "return only the first 10 datasets". The former belongs to the filter category, the latter to the parameters. If you are a beacon implementer, a rule of thumb could be:
The Beacon concept includes several types of responses: some informative or informational and some with actual data payloads, and the error one.
+A Beacon is able to return information, details, about itself. Many of the schema responses included in the responses
folder have a 1-to-1 relationship with the corresponding configuration documents and their equivalent root endpoints, e.g. the beaconEntryTypeResponse.json
is the schema of a response that wraps the beaconConfiguration.json
document, and is then used as the payload of the /entry_types
root endpoint. Schematically:
+* configuration/an_schema.json: describes the schema of the configuration file itself.
+* responses/an_schema_response.json: describes the format of the response that returns these configuration information.
+* root/endpoints.json: describes the API endpoints to be called and parameters to be used to retrieve such responses.
The following schemas refer to informational responses: beaconConfigurationResponse, beaconEntryTypeResponse, beaconFilteringTermsResponse, and beaconMapResponse.
+A Beacon could return responses at different granularity levels:
+exists: true
('Yes') or exists: false
('No') to a given query.Yes
/No
and the number of matching results.Yes
/No
, the number of matching results and all documents
+ corresponding to the requested entities. Documents are wrapped in "result set"
+ objects for every collection (e.g. every dataset or cohort). Even for record
+ level responses each beacon can control the details of data exposed in record
+ besides the minimal requirements of the entry type's schema.Each of these granularity levels has an equivalent response schema:
+beaconBooleanResponse
beaconCountResponse
beaconResultSetsResponse
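The relation between the granularity levels can be sketched in a few lines of Python. This is only an illustration: the function name `response_summary` is hypothetical, and the field names `exists` and `numTotalResults` are taken from the response examples in this documentation, not from a complete reading of the response schemas.

```python
GRANULARITIES = ("boolean", "count", "record")


def response_summary(num_matches, granularity):
    """Build the summary part of a response for the given granularity level."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    # Every level answers the Yes/No question
    summary = {"exists": num_matches > 0}
    # 'count' and 'record' additionally report the number of matches
    if granularity in ("count", "record"):
        summary["numTotalResults"] = num_matches
    return summary


print(response_summary(0, "boolean"))
print(response_summary(3, "count"))
```

A record level response would carry the matching documents in addition to this summary.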
An additional schema, beaconCollectionsResponse, describes responses that return details about the collections in a Beacon, but not the contents of the collections themselves. In other words, the response describes a dataset but does not return the contents of any dataset.
+Some elements are transversal to the Framework and to any model, e.g. the schema +for describing an ontology term or the reference to an external schema (like the +reference to GA4GH Phenopackets or GA4GH Service Info schemas).
+skip
and limit
¶Record level responses potentially may return many (i.e. thousands and beyond)
+documents which usually would be "paginated", i.e. split into many chunks ("pages").
+Beacon handles pagination through the skip
and limit
parameters as part of the
+request:
limit
in the request tells the server the maximum number of records that should
+ be returned in a single response (i.e. the "page size")skip
indicates how many of those pages should be skipped over when delivering
+ the resultsTherefore, skip: 2
and limit: 8
will return records 17-24 (if those exist).
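The arithmetic behind this example can be checked with a short Python sketch. It assumes, as described above, that `skip` counts pages rather than individual records; the function names are hypothetical and the in-memory list merely stands in for a real data store.

```python
def page_bounds(skip, limit):
    """Return the 1-based (first, last) record numbers of the requested page."""
    first = skip * limit + 1
    last = (skip + 1) * limit
    return first, last


def paginate(records, skip, limit):
    """Return one page of records; the page may be short or empty at the end."""
    start = skip * limit
    return records[start:start + limit]


records = list(range(1, 31))           # 30 stand-in records numbered 1..30
print(page_bounds(skip=2, limit=8))    # (17, 24)
print(paginate(records, skip=2, limit=8))
```

With 30 records available, the fourth page (`skip: 3`) would contain only records 25-30, which is why the text adds "if those exist".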
Given the flexibility allowed in the implementation of each Beacon instance, and the security restrictions that could apply (e.g. only answering after authentication of the user), a mechanism is required for testing the compliance of a Beacon. A first step in this compliance testing is done by the implementer by checking that received requests are correct and that the generated responses match the provided schemas. However, external compliance testing is desirable when the Beacon instance is to be integrated in a network or to engage in dialogs with a diversity of clients. For this second scenario, the testMode parameter was included.
+A Beacon instance could receive a request with the testMode parameter activated (value= true) in which case the Beacon MUST respond, with actual or fake contents, using the response format and skipping any user authentication. The fact that a response has been generated for testing purposes is included in the meta section of the response.
+The file /configuration/beaconConfiguration.json
defines the schema (in Json schema draft-07) of the Json file that includes core aspects of a Beacon instance configuration.
+The schema includes four sections:
boolean
(true/false) responses, and only if the user is authenticated and explicitly authorized to access the Beacon resources. Although this is the safest set of settings, it is not recommended unless the Beacon shares very sensitive information. Non sensitive Beacons should preferably opt for a record
and PUBLIC
combination.Granularity | +Description | +
---|---|
boolean |
+returns 'true/false' responses. | +
count |
+adds the total number of positive results found. | +
record |
+returns details for every row. | +
For those cases where a Beacon prefers to return records with less, not all, attributes, different strategies have been considered, e.g.: keep non-mandatory attributes empty, or Beacon to provide a minimal record definition, but these strategies still need to be tested in real world cases and hence no design decision has been taken yet.
+security level | +description | +
---|---|
PUBLIC |
+Any anonymous user can read the data | +
REGISTERED |
+Only known users can read the data | +
CONTROLLED |
+Only specifically granted users can read the data | +
"maturityAttributes": {
+ "productionStatus": "DEV"
+ },
+ "securityAttributes": {
+ "defaultGranularity": "boolean",
+ "securityLevels": ["PUBLIC", "REGISTERED", "CONTROLLED"]
+ }
+
The Beacon in the example is in development status, returns boolean answers by default, and has queries available in any of the access levels.
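A configuration fragment like this can be sanity-checked against the two tables above. The following Python sketch does so under the assumption that the tables list all permitted values; the function name `check_security_attributes` is hypothetical, and this check is not a substitute for validating against the actual JSON schema.

```python
GRANULARITIES = ("boolean", "count", "record")
SECURITY_LEVELS = ("PUBLIC", "REGISTERED", "CONTROLLED")


def check_security_attributes(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    security = config.get("securityAttributes", {})
    granularity = security.get("defaultGranularity")
    if granularity not in GRANULARITIES:
        problems.append(f"unknown defaultGranularity: {granularity!r}")
    for level in security.get("securityLevels", []):
        if level not in SECURITY_LEVELS:
            problems.append(f"unknown security level: {level!r}")
    return problems


example = {
    "maturityAttributes": {"productionStatus": "DEV"},
    "securityAttributes": {
        "defaultGranularity": "boolean",
        "securityLevels": ["PUBLIC", "REGISTERED", "CONTROLLED"],
    },
}
print(check_security_attributes(example))
```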
+ + + + + + +While the Beacon v1 response was restricted to aggregate data and Beacon v2 itself provides +schemas for structuring response objects (e.g. genomic variation or biosample data) +the protocol can be expanded by providing custom access methods to data elements +matched by a Beacon query. Since November 2018, Beacon v1.n has included support for a "handover" protocol, +in which rich data content can be provided from linked services, initiated through a Beacon query1.
+Typical examples of Handover
use include:
In the following example a minimal boolean response is shown which contains
+a single handover in the general resultsHandovers
list.
{
+ "meta": {
+ ...
+ },
+ "responseSummary": {
+ "exists": true
+ },
+ "resultsHandovers": [
+ {
+ "handoverType": {
+ "id": "EDAM:3016",
+ "label": "VCF"
+ },
+ "url": "https://my.genomeserver.space/data/vcf/grch38/gizsgf8oaoiteowgfdhhpoiuy/variants.vcf",
+ "note": "VCFv4.4 file with sample mapped variants (authentication required)"
+ }
+ ]
+}
+
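On the client side, the handovers in such a response can be collected with a few lines of Python. This is a minimal sketch: the function name `handover_links` is hypothetical, and the sample response simply repeats the structure shown above with the `meta` section omitted.

```python
def handover_links(response):
    """Return (type id, type label, url) for every handover in 'resultsHandovers'."""
    links = []
    for handover in response.get("resultsHandovers", []):
        handover_type = handover.get("handoverType", {})
        links.append(
            (handover_type.get("id"), handover_type.get("label"), handover.get("url"))
        )
    return links


sample_response = {
    "responseSummary": {"exists": True},
    "resultsHandovers": [
        {
            "handoverType": {"id": "EDAM:3016", "label": "VCF"},
            "url": "https://my.genomeserver.space/data/vcf/grch38/gizsgf8oaoiteowgfdhhpoiuy/variants.vcf",
            "note": "VCFv4.4 file with sample mapped variants (authentication required)",
        }
    ],
}
for type_id, label, url in handover_links(sample_response):
    print(type_id, label, url)
```

A client would typically use the `handoverType` to decide how to process the linked resource, and the `note` to inform the user about access conditions.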
An early discussion of the topic can e.g. be found in the Beacon developer area on Github. As of 2018-11-13, the handover concept had become part of the code development. ↩
+Important
+As previously described, Beacon v2 is a specification for sharing/discovery of data. Thus, a priori, it has nothing to do with any particular software, database or computer language.
+Two elements are needed to implement (or "light") a Beacon v2:
+In this section we present three implementation options, ranging from building everything yourself to relying entirely on CRG software.
+Let's say that you have your data organized and structured in a database (e.g. SQL or NoSQL which may or may not have an internal layer to get access to it). Let's also say that you have the resources (and knowledge) to read the "instructions" (i.e., Beacon v2 specification) to build an API on top of your existing solution. If that's your case, then this is the option for you. You are one of what we call Beacon v2 API implementers. We have a few of them already in the Beacon v2 Service Registry:
+bycon
Python stack driving full featured v2 under the Progenetix resource
Let's say that you have a solution to organize your data but you don't have the resources (or knowledge) to implement a Beacon v2 API yourself. In some pilot studies, CRG has been helping individual institutions to build their Beacon v2 API. However, this option is not practical and does not scale well, so you may want to check Option C.
+Let's say that you have your data somewhat structured (you may have Excel files, PDFs, VCFs... or maybe a SQL database, or an EHR solution with phenoclinic information).
+You want to "beaconize" your data to be part of a larger ecosystem, but you're unsure where to start, and/or don't want to invest a lot of resources because you are still unsure if the whole thing will pay off. Well, you're not alone! Most centers are in this situation. For that reason at CRG we developed the Beacon v2 Reference Implementation.
+Important
+People that download and install B2RI or another pre-packaged solution are named Beacon v2 deployers.
+The Beacon+ implementation - developed in the Python & MongoDB based bycon
project -
+implements an expanding set of Beacon v2 paths for the Progenetix
+resource .
In queries with a complete beaconRequestBody
the type of the delivered data is independent
+of the path and determined in the requestedSchemas
. So far, Beacon+ will compare the first
+of those to its supported responses and provide the results accordingly; it doesn't matter
+if the endpoint was /beacon/biosamples/
or /beacon/variants/
etc.
Below is an example for the standard test "small deletion CNVs in the CDKN2A locus, in gliomas"
+Progenetix test query, here responding with the matched variants. Exchanging the entityType
+entry to
{ "entityType": "biosample", "schema": "https://progenetix.org/services/schemas/Biosample/"}
would change this to a biosample response. The example can be tested by POSTing this as application/json
+to https://progenetix.org/beacon/variants/
or https://progenetix.org/beacon/biosamples/
.
{
+ "$schema":"beaconRequestBody.json",
+ "meta": {
+ "apiVersion": "2.0",
+ "requestedSchemas": [
+ {
+ "entityType": "genomicVariant",
+ "schema": "https://progenetix.org/services/schemas/genomicVariant"
+ }
+ ]
+ },
+ "query": {
+ "requestParameters": {
+ "datasets": {
+ "datasetIds": ["progenetix"]
+ },
+ "assemblyId": "GRCh38",
+ "referenceName": "9",
+ "start": [21500001, 21975098],
+ "end": [21967753, 22500000],
+ "variantType": "DEL"
+ }
+ },
+ "filters": [
+ { "id": "NCIT:C3058", "includeDescendantTerms": true }
+ ]
+}
+
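To try such a request from a script, the payload can be wrapped in an HTTP POST using only the Python standard library. The sketch below builds the request object without sending it; `build_beacon_post` is a hypothetical helper, and the body is a trimmed version of the example above that keeps only the `requestedSchemas` entry.

```python
import json
from urllib.request import Request


def build_beacon_post(url, body):
    """Wrap a Beacon request body in an HTTP POST with a JSON content type."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "meta": {
        "apiVersion": "2.0",
        "requestedSchemas": [
            {
                "entityType": "biosample",
                "schema": "https://progenetix.org/services/schemas/Biosample/",
            }
        ],
    }
}
request = build_beacon_post("https://progenetix.org/beacon/biosamples/", body)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; whether the server still accepts this exact payload depends on the version of the deployed instance.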
/
¶The root path provides the standard BeaconInfoResponse
.
/filtering_terms
¶/filtering_terms/
¶/biosamples
¶/biosamples/
+ query¶/biosamples/{id}/
¶/biosamples/?testMode=true
¶/biosamples/{id}/g_variants
¶/individuals
¶/individuals
+ query¶/individuals
+ query + requestedSchema=phenopacket
¶Progenetix provides phenopacket
as (currently experimental) alternative schema (requestedSchema
) for /individuals
.
+This feature allows the combined delivery of attributes annotated with the biosamples and general attributes of the individual, as well as
+e.g. linking to genomic variation data.
/individuals/{id}
¶/individuals/?testMode=true
¶/individuals/{id}/g_variants
¶/g_variants
¶There is currently (April 2021) still some discussion about the implementation and naming
+of the different types of genomic variant endpoints. Since the Progenetix collections
+follow a "variant observations" principle all variant requests are directed against
+the local variants
collection.
variants
is used as alias.
/g_variants?testMode=true
¶/g_variants
+ query¶/g_variants/{id}
¶/g_variants/{id}/biosamples
¶/analyses
¶The Beacon v2 /analyses
endpoint accesses the Progenetix callsets
collection
+documents, i.e. information about the genomic variants derived from a single
+analysis. In Progenetix the main use of these documents is the storage of e.g.
+CNV statistics or binned genome calls.
/callsets
is an alias in Progenetix
/analyses?testMode=true
¶/analyses
+ query¶variants_in_sample
)/testMode
example/map
endpoint (incomplete/under construction)/configuration
endpoint (incomplete/under construction)/filteringTerms
endpoint to v2b4datasets
parameter as objectresponse_summary
response
root element & direct use of result_sets
entityType
format fixedfilters
now objectsvariants_interpretations
exampleresultSets
response formatbycon
backend/analyses
BeaconInfoResponse
Beacon v2 is a protocol and specification established by the Global Alliance for Genomics and Health (GA4GH) that defines an open standard for the +discovery of genomic (and phenoclinic) data in biomedical research and clinical applications. +Beacon facilitates the discovery of genomic variants and biomedical +data in single or distributed resources with the goal to empower federated data +models - i.e. the discovery (and potential retrieval) of data from different +organisational and geographic locations.
+ + + +The Beacon specification is developed by an international team of scientists and +technology experts, as a product of the GA4GH Discovery work stream +and with major support from the European bioinformatics infrastructure organization +ELIXIR.
+The current version of the protocol, Beacon v2, represents a complete revision of +the original code base and introduces a number of powerful new features which were +considered important by the community, such as:
+Move to Beacon v2!
+On 2022-04-21 Beacon v2 was approved as an official GA4GH standard through the GA4GH steering committee.
+With the release of Beacon v2, implementations of v1 and earlier are no longer supported. +Deployers of Beacon instances or networks are advised to migrate to v2 of the +standard. The functionality of Beacon v1 can be easily implemented in v2.
+This website represents information about the Beacon protocol, its use for data +discovery and data delivery but also about ways towards +its implementation to "beaconize" genomics datasets and resources as well as discussions +of the technical details of the Beacon framework and data model.
+Additional information about the Beacon project - including news, events, publications - is available +through the separate website at beacon-project.io.
+Historical Tip
+Originally, the Beacon protocol (versions 0 and 1) allowed researchers to get information about the presence/absence of a given, specific, genomic mutation in a set of data, from patients of a given disease or from the population in general. Early versions of Beacon did not support +query parameters beyond genomic variations and did provide ways for the optional +retrieval of matched records.
+Beacon v2 consists of two components, the Framework and the Models.
+The Framework contains the format for the requests and responses, whereas the Models define the structure of the biological data response. The overall function of these components is to provide the instructions to design a REST API (REpresentational State Transfer Application Programming Interface) with OpenAPI Specification (OAS). The OAS defines a standard, language-agnostic interface that is used by software developers to implement REST APIs.
+Framework interdependency, releases and alternative models
+In principle, this dual system allows for different Models (in other domains outside of the Beacon v2 realm, e.g. "Imaging Beacon") to be built using the same Framework. However, in the current context of Beacon v2, we consider the two elements interdependent and likely to be updated together for subsequent major versions (e.g. from v2 to v3).
+The Beacon documentation provides information for different types of users, +depending on their interests and use cases. Although those will overlap, we highlight +information relevant for some general scenarios throughout the documentation.
+A Beacon user (or end-user) is interested in querying Beacon instances and networks, either through +web interfaces or by using the Beacon API directly. While users of Beacon web forms in principle +do not need to understand the underlying query syntax and response formats, they too may +benefit from some insights into the general capabilities of the underlying protocol.
+User
+A Beacon Deployer is someone who wants to make their genomics resource accessible +through the Beacon protocol, without necessarily being interested or experienced in the +computational aspects; while a Beacon Implementer provides the technical expertise (and +potentially may get involved with Beacon development itself, e.g. to extend the protocol +for novel use cases).
+Deployer
+Beacon v2 Models
+Reference Implementation Link
+Implementer
+ +Stakeholder
+The GA4GH Beacon specification is composed of two parts:
+ +The Beacon Framework (in Framework repo ) is the part that describes the overall structure of the API requests, responses, parameters, the common components, etc. It may also be referred to in this document as simply the Framework.
+Beacon Models (in the Models repo ) describes the set of concepts included in a Beacon version (e.g. Beacon v2), like individual or biosample, and also the relationships between them. It may also be referred to in this document as simply the Model.
+The Framework could be considered the syntax and the Model as the semantics.
+Refer to the Framework for further information about the Framework and its parts.
+A beacon instance is just an implementation of a Beacon Model that follows the rules stated by the Beacon Framework.
+Beacon default model vs. beacon instances
+While the Beacon default model provides templates for responses and formats for uniform data delivery +- especially for networked beacons - it does not prescribe how data should be organised in individual +instances or what schemas should be used for local storage.
+If you are a Beacon implementer, then, you don't need to clone the Framework repo, you only need to copy (or clone) the Beacon Model and modify it to your specific case. You will find plenty of references to the Framework in the Model copy, and you will use the Json schemas there to validate that both the structure of your requests and responses are compliant with the Beacon Framework. The Framework is not used to check the schema in the responses payload (e.g. the actual details of a biosample of a cohort). The schemas for that are included in the Model that you should have copied.
+ +The above entities are defined as follows:
+Beacon v1 Model: Repo
+Provided as an example for Beacon v1 implementers that want to update to Beacon v2 but not planning to add any additional entry type to their Beacon.
+Although a Beacon can be instantiated as a stand-alone solution, many Beacon instances +will be part of managed networks, e.g. multi-institutional projects where individual +beacons are combined through a single interface. Additionally, open beacon instances may +be accessed from aggregators which can register these resources, federate queries +and aggregate the responses, possibly without any direct support from the instances' +maintainers.
+Beacon Networks
+... are collections of multiple beacon instances - possibly from different institutions +or providers. Beacon networks rely on some sort of central service managing the integration +of nodes and provide a unified access through a customized interface and possibly with active alignment of the +instances' features (such as harmonized filtering terms). One may think of a +beacon network as a "managed aggregator" with some active alignment of the individual +resources.
+Beacon Aggregator
+... provides a single interface and API for accessing multiple Beacon instances where +the individual beacons may not necessarily be harmonized (or even aware of their +integration through the aggregator). An aggregator may include functionality to +remap requests and responses for beacons with e.g. different versions or such using +different standards (genome editions, ontology terms...).
+The Beacon framework includes several features aimed to be consumed by Beacon network +aggregators. For example, a Beacon endpoint declares which entities are implemented in +that particular instance, which filtering terms are being supported or the URL endpoints through which +different entities (such as biosamples or genomic variants) can be queried.
+ +In addition to genomic variation queries with Boolean responses +the Beacon v2 protocol permits the implementers to support different types of +entities (e.g. biosample and analysis data) both to be queried against and to be +returned in Beacon responses - so a request may retrieve information about the +samples in which an indicated genomic variant had been found or information about +technical parameters used to detect such a variant.
+However, individual beacons will have different profiles regarding the supported +parameters, supported entities or the filtering terms recognized. Here, a number +of information endpoints allow the profiling of beacons which is especially important +when designing Beacon networks and aggregating their responses.
+Filters represent a powerful way to query various features
+in beacon entities. When designing a network of multiple beacons the
+filtering_terms
informational endpoints
+can be utilized to e.g. implement translators for harmonizing the possibly differing
+terms used in the individual Beacon instances.
TBD
+ + + + + + +The Beacon registry server, hosted through the European Genome-Phenome Archive, monitors +a number of implementations of the Beacon v2 protocol by various organisations actively involved +in Beacon protocol development.
+The Progenetix database and cancer genomic information resource contains genome profiles +of more than 140000 individual cancer genome screening experiments, with the majority +representing results from genomic copy number assessment studies. With its +Beacon+ forward-looking test +implementation, since 2016 Progenetix has been developing concepts for Beacon protocol extensions +such as CNV query options or handover data delivery.
+bycon
Python-based full stack API / middleware (documentation here)progenetix-web
React based front-end (modular for Beacon instances as well as the whole Progenetix UI)Find below some tips to get you started:
+ + + + + + + +A beacon instance allows the retrieval of data - in contrast to the aggregated
+boolean and count responses - if it supports record
granularity. The type of
+document(s) is selected either through the REST path
+or by specifying the entity through the requestedEntityId
.
While any beacon can in principle choose its own data model - and thereby the +schemas of records it supports - for biomedical genomics beacons we recommend the +support of the Beacon default data model
+The Beacon v2 default data model provides a set of schemas for common data entities with +a focus on biomedical genomics (although specific neither to medical applications nor to human genomics per se).
+In contrast to earlier versions of the protocol, the Beacon v2 default models provide +the technical blueprint for rich, structured data responses to Beacon queries, such as +annotated genomic variations, biosamples from which matched variants were retrieved +or data about individuals and study cohorts, where available and authorized.
+Detailed information is available through the Models Introduction +and the default schemas documented from there.
+This example is a single biosample response, e.g. as the result of a REST path
+call (.../biosamples/{id}/
). The response just demonstrates some of the available
+biosample parameters and removes some technical/meta information for clarity.
+Also, the sample contains fields which are not defined in the default
+schema (such as icdoMorphology
...); but although the use of custom fields is discouraged to
+enhance interoperability, the use of additionalProperties
is allowed so the
+data itself remains schema-conformant.
{
+ "meta": {
+ "apiVersion": "v2.0.0",
+ "beaconId": "org.progenetix",
+ "receivedRequestSummary": {
+ ...
+ },
+ "returnedGranularity": "record",
+ "returnedSchemas": [
+ {
+ "entityType": "biosample",
+ "schema": "https://progenetix.org/services/schemas/biosample/"
+ }
+ ]
+ },
+ "responseSummary": {
+ "exists": true,
+ "numTotalResults": 1
+ },
+ "response": {
+ "resultSets": [
+ {
+ "exists": true,
+ "setType": "dataset",
+ "id": "progenetix",
+ "resultsCount": 1,
+ "results": [
+ {
+ "id": "pgxbs-kftvi9i0",
+ "individualId": "pgxind-kftvi9i0",
+ "notes": "Primary Tumor",
+ "biosampleStatus": {
+ "id": "EFO:0009656",
+ "label": "neoplastic sample"
+ },
+ "collectionMoment": "P44Y1M24D",
+ "sampleOriginType": {
+ "id": "OBI:0001479",
+ "label": "specimen from organism"
+ },
+ "dataUseConditions": {
+ "id": "DUO:0000004",
+ "label": "no restriction"
+ },
+ "externalReferences": [
+ {
+ "id": "pgx:TCGA.933b9daf-a5bf-46cf-92b6-5ddd8279919c",
+ "label": "TCGA case_id"
+ },
+ {
+ "id": "pgx:TCGA.TCGA-76-6663",
+ "label": "TCGA submitter_id"
+ },
+ {
+ "id": "pgx:TCGA.005cb7ce-5050-43aa-85ff-cd56ed830535",
+ "label": "TCGA sample_id"
+ },
+ {
+ "id": "pgx:TCGA.GBM",
+ "label": "TCGA GBM project"
+ }
+ ],
+ "histologicalDiagnosis": {
+ "id": "NCIT:C3058",
+ "label": "Glioblastoma"
+ },
+ "icdoMorphology": {
+ "id": "pgx:icdom-94403",
+ "label": "Glioblastoma, NOS"
+ },
+ "icdoTopography": {
+ "id": "pgx:icdot-C71.9",
+ "label": "Brain, NOS"
+ },
+ "pathologicalStage": {
+ "id": "NCIT:C92207",
+ "label": "Stage Unknown"
+ },
+ "sampleOriginDetail": {
+ "id": "UBERON:0000955",
+ "label": "brain"
+ },
+ "updated": "2020-09-10 17:44:04.888000"
+ }
+ ]
+ }
+ ]
+ }
+}
+
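A client consuming a record level response such as the one above usually walks the `resultSets` and then the `results` inside each set. The following Python sketch shows one way to do this; `summarize_result_sets` is a hypothetical helper and the sample document is a heavily trimmed copy of the example response.

```python
def summarize_result_sets(beacon_response):
    """Return (result set id, record id, diagnosis label) for every returned record."""
    rows = []
    result_sets = beacon_response.get("response", {}).get("resultSets", [])
    for result_set in result_sets:
        for record in result_set.get("results", []):
            # Optional attributes may be absent, so fall back to an empty object
            diagnosis = record.get("histologicalDiagnosis", {})
            rows.append((result_set.get("id"), record.get("id"), diagnosis.get("label")))
    return rows


sample = {
    "responseSummary": {"exists": True, "numTotalResults": 1},
    "response": {
        "resultSets": [
            {
                "exists": True,
                "setType": "dataset",
                "id": "progenetix",
                "resultsCount": 1,
                "results": [
                    {
                        "id": "pgxbs-kftvi9i0",
                        "histologicalDiagnosis": {"id": "NCIT:C3058", "label": "Glioblastoma"},
                    }
                ],
            }
        ]
    },
}
print(summarize_result_sets(sample))
```

Because each beacon controls which attributes it exposes, the sketch reads optional fields with `.get()` instead of assuming that they are present.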
In principle, the separation of framework and models allows for different models in domains + outside of the genomics focussed Beacon v2 realm, e.g. “Imaging Beacon”, to be built using the same Framework.
+ + + + + + +While the full power of the Beacon API can be unlocked through the use of structured +queries using JSON serialization ("POST" requests), the majority of common queries can +be implemented through standard query URLs with parameters (GET queries).
+Beacon REST paths in general follow the format
+__APIroot__/__entryType__/{id}/
or
+__APIroot__/__entryType__/{id}/__requestedSchema__
A typical example would be the request to retrieve all genomic variants associated with a biosample
+https://example.com/beacon/api/biosamples/bios-st4582/g_variants
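The path pattern can be expressed as a small Python helper. This is only a sketch: `rest_path` and its argument names are hypothetical, and the API root is the placeholder address from the example above.

```python
from urllib.parse import quote


def rest_path(api_root, entry_type, record_id=None, requested=None):
    """Assemble APIroot/entryType[/{id}[/requested]] from its components."""
    parts = [api_root.rstrip("/"), entry_type]
    if record_id is not None:
        # Percent-encode the id so unusual characters cannot break the path
        parts.append(quote(record_id, safe=""))
        if requested is not None:
            parts.append(requested)
    return "/".join(parts)


print(rest_path("https://example.com/beacon/api", "biosamples", "bios-st4582", "g_variants"))
```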
The endpoint paths available for a given Beacon instance are defined in
+__APIroot__/beaconMap/
Github
POST
requests¶In POST
requests queries and metadata are defined in JSON objects as specified
+in the model supported by the Beacon instance. For more information see
GET
queries¶By default the Beacon model supports a limited set of query parameters, most notably +such addressing genomic variations. Examples can be found in the Genomic Queries +documentation and in the requests section of the default model.
+GET
queries¶Several of the common query parameters have a multiple value option, i.e. are
+assumed to be lists. A typical use case here would be the construction of Bracket Queries
+which use 2 of each start
and end
values.
,
separator for list values in GET
Due to the problems some web frameworks have with the interpretation of multiple +values for the same parameter, we recommend the consistent use of a single +parameter name and comma-concatenated values.
+&start=1234000&start=5234000
&start=1234000,5234000
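The recommended comma-concatenated form is easy to produce and to parse with the Python standard library. In the sketch below the helper names `encode_params` and `decode_list` are hypothetical; the decoder also accepts the repeated-parameter form, so a server written this way could tolerate both styles.

```python
from urllib.parse import parse_qs, urlencode


def encode_params(params):
    """Encode parameters for GET, joining list values with commas."""
    flat = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            flat[name] = ",".join(str(item) for item in value)
        else:
            flat[name] = value
    # safe="," keeps the list separator unescaped in the query string
    return urlencode(flat, safe=",")


def decode_list(query_string, name, cast=int):
    """Read a list parameter given as 'a=1,2' or as 'a=1&a=2'."""
    values = parse_qs(query_string).get(name, [])
    return [cast(item) for value in values for item in value.split(",")]


print(encode_params({"start": [1234000, 5234000]}))
print(decode_list("start=1234000,5234000", "start"))
```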
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
aligner | +Reference to mapping/alignment software | +string | +NA | +bwa-0.7.8 | +NA | +
analysisDate | +Date at which analysis was performed. | +string | +NA | +2021-10-17 | +NA | +
biosampleId | +Reference to the id of the biosample this analysis is reporting on. |
+string | +NA | +S0001 | +NA | +
id | +Analysis reference ID (external accession or internal ID) | +string | +NA | +NA | +NA | +
individualId | +Reference to the id of the individual this analysis is reporting on. |
+string | +NA | +P0001 | +NA | +
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others in order to avoid collisions in the names of attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
pipelineName | +Analysis pipeline and version if a standardized pipeline was used | +string | +NA | +Pipeline-panel-0001-v1 | +NA | +
pipelineRef | +Link to Analysis pipeline resource | +string | +NA | +doi.org/10.48511/workflowhub.workflow.111.1 | +NA | +
runId | +Run identifier (external accession or internal ID). | +string | +NA | +SRR10903401 | +NA | +
variantCaller | +Reference to variant calling software / pipeline | +string | +NA | +GATK4.0 | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "analysisDate": "2021-10-17",
+ "id": "analyses-example-0001",
+ "pipelineName": "Pipeline-panel-0001-v1"
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "aligner": "bwa-0.7.8",
+ "analysisDate": "2021-10-17",
+ "biosampleId": "S0001",
+ "id": "analyses-example-0001",
+ "individualId": "P0001",
+ "pipelineName": "Pipeline-panel-0001-v1",
+ "pipelineRef": "https://doi.org/10.48511/workflowhub.workflow.111.1",
+ "runId": "SRR10903401",
+ "variantCaller": "GATK4.0"
+}
+
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
biosampleStatus | +Ontology value from Experimental Factor Ontology (EFO) Material Entity term (BFO:0000040). Classification of the sample in abnormal sample (EFO:0009655) or reference sample (EFO:0009654). | +object | +id, label | +[{"id": "EFO:0009654", "label": "reference sample"}, {"id": "EFO:0009655", "label": "abnormal sample"}, {"id": "EFO:0009656", "label": "neoplastic sample"}, {"id": "EFO:0010941", "label": "metastasis sample"}, {"id": "EFO:0010942", "label": "primary tumor sample"}, {"id": "EFO:0010943", "label": "recurrent tumor sample"}] |
+NA | +
collectionDate | +Date of biosample collection in ISO8601 format. | +string | +NA | +2021-04-23 | +NA | +
collectionMoment | +Individual's or cell culture age at the time of sample collection in the ISO8601 duration format P[n]Y[n]M[n]DT[n]H[n]M[n]S . |
+string | +NA | +P32Y6M1D, P7D | +NA | +
diagnosticMarkers | +NA | +array | +id, label | +NA | +NA | +
histologicalDiagnosis | +Disease diagnosis that was inferred from the histological examination. RECOMMENDED. | +object | +id, label | +[{"id": "NCIT:C3778", "label": "Serous Cystadenocarcinoma"}] |
+NA | +
id | +Biosample identifier (external accession or internal ID). | +string | +NA | +S0001 | +NA | +
individualId | +Reference to the individual from which that sample was obtained. | +string | +NA | +P0001 | +NA | +
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others in order to avoid collisions in the names of attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
measurements | +Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement |
+array | +assayCode, date, measurementValue, notes, observationMoment, procedure | +NA | +NA | +
notes | +Any relevant info about the biosample that does not fit into any other field in the schema. | +string | +NA | +Some free text | +NA | +
obtentionProcedure | +Ontology value from NCIT Intervention or Procedure ontology term (NCIT:C25218) describing the procedure for sample obtention, e.g. NCIT:C15189 (biopsy). | +object | +ageAtProcedure, bodySite, dateOfProcedure, procedureCode | +[{"code": {"id": "NCIT:C15189", "label": "biopsy"}}, {"code": {"id": "NCIT:C157179", "label": "FGFR1 Mutation Analysis"}}] |
+NA | +
pathologicalStage | +Pathological stage, if applicable, preferably as subclass of NCIT:C28108 - Disease Stage Qualifier. RECOMMENDED. | +object | +id, label | +[{"id": "NCIT:C27977", "label": "Stage IIIA"}] |
+NA | +
pathologicalTnmFinding | +NA | +array | +id, label | +[{"id": "NCIT:C48725", "label": "T2a Stage Finding"}, {"id": "NCIT:C48709", "label": "N1c Stage Finding"}, {"id": "NCIT:C48699", "label": "M0 Stage Finding"}] |
+NA | +
phenotypicFeatures | +Used to describe a phenotype that characterizes the subject or biosample. | +array | +evidence, excluded, featureType, modifiers, notes, onset, resolution, severity | +NA | +NA | +
sampleOriginDetail | +Tissue from which the sample was taken or sample origin matching the category set in 'sampleOriginType'. Value from Uber-anatomy ontology (UBERON) or BRENDA tissue / enzyme source (BTO), Ontology for Biomedical Investigations (OBI) or Cell Line Ontology (CLO), e.g. 'cerebellar vermis' (UBERON:0004720), 'HEK-293T cell' (BTO:0002181), 'nasopharyngeal swab specimen' (OBI:0002606), 'cerebrospinal fluid specimen' (OBI:0002502). | +object | +id, label | +[{"id": "UBERON:0000474", "label": "female reproductive system"}, {"id": "BTO:0002181", "label": "HEK-293T cell"}, {"id": "OBI:0002606", "label": "nasopharyngeal swab specimen"}] |
+NA | +
sampleOriginType | +Category of sample origin. Value from Ontology for Biomedical Investigations (OBI) material entity (BFO:0000040) ontology, e.g. 'specimen from organism' (OBI:0001479), 'xenograft' (OBI:0100058), 'cell culture' (OBI:0001876). | +object | +id, label | +[{"id": "OBI:0001479", "label": "specimen from organism"}, {"id": "OBI:0001876", "label": "cell culture"}, {"id": "OBI:0100058", "label": "xenograft"}] |
+NA | +
sampleProcessing | +Status of how the specimen was processed, e.g. a child term of EFO:0009091. | +object | +id, label | +[{"id": "EFO:0009129", "label": "mechanical dissociation"}] |
+NA | +
sampleStorage | +Status of how the specimen was stored. | +object | +id, label | ++ | NA | +
tumorGrade | +Term representing the tumor grade. Child term of NCIT:C28076 (Disease Grade Qualifier) or equivalent. | +object | +id, label | +[{"id": "NCIT:C28080", "label": "Grade 3a"}] |
+NA | +
tumorProgression | +Tumor progression category indicating primary, metastatic or recurrent progression. Ontology value from Neoplasm by Special Category ontology (NCIT:C7062), e.g. NCIT:C84509 (Primary Malignant Neoplasm). | +object | +id, label | +[{"id": "NCIT:C84509", "label": "Primary Malignant Neoplasm"}, {"id": "NCIT:C4813", "label": "Recurrent Malignant Neoplasm"}] |
+NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "biosampleStatus": {
+ "id": "EFO:0009655",
+ "label": "abnormal sample"
+ },
+ "id": "sample-example-0001",
+ "sampleOriginType": {
+ "id": "UBERON:0000474",
+ "label": "female reproductive system"
+ }
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "biosampleStatus": {
+ "id": "EFO:0009655",
+ "label": "abnormal sample"
+ },
+ "collectionDate": "2020-09-11",
+ "collectionMoment": "P32Y6M1D",
+ "id": "sample-example-0001",
+ "obtentionProcedure": {
+ "procedureCode": {
+ "id": "OBI:0002654",
+ "label": "needle biopsy"
+ }
+ },
+ "sampleOriginType": {
+ "id": "UBERON:0000992",
+ "label": "ovary"
+ }
+}
+
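The `collectionMoment` value in the example above uses the ISO8601 duration form `P[n]Y[n]M[n]DT[n]H[n]M[n]S`. As a minimal sketch (not part of the Beacon specification), a client could check and split such a string as follows. The function name `parse_duration` is hypothetical, and the pattern assumes whole-number components and ignores the week form `P[n]W`.

```python
import re

# Pattern for the P[n]Y[n]M[n]DT[n]H[n]M[n]S form quoted in the collectionMoment
# description. Assumption: integer components only; weeks (P[n]W) are not handled.
_DURATION = re.compile(
    r"^P(?=\d|T\d)"                      # at least one component must follow "P"
    r"(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Return a dict of the integer components present, or None if text does not match."""
    match = _DURATION.match(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items() if value is not None}


print(parse_duration("P32Y6M1D"))  # {'years': 32, 'months': 6, 'days': 1}
print(parse_duration("P7D"))       # {'days': 7}
```

A plain date such as `2020-09-11` (the `collectionDate` format) is rejected by this pattern, which makes it easy to spot the two fields being swapped.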
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cohortDataTypes | +NA | +array | +id, label | +[{"id": "OGMS:0000015", "label": "clinical history"}, {"id": "OBI:0000070", "label": "genotyping assay"}, {"id": "OMIABIS:0000060", "label": "survey data"}] |
+NA | +
cohortDesign | +Cohort type by its design. A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of an investigation and is realized during a study design execution. Value from Ontologized MIABIS (OMIABIS) Study design ontology term tree (OBI:0500000). | +object | +id, label | +[{"id": "OMIABIS:0001017", "label": "case control study design"}, {"id": "OMIABIS:0001019", "label": "longitudinal study design"}, {"id": "OMIABIS:0001024", "label": "twin study design"}] |
+NA | +
cohortSize | +Count of unique Individuals in cohort (individuals meeting criteria for user-defined cohorts). If not previously known, it could be calculated by counting the individuals in the cohort. |
+integer | +NA | +14765, 20000 | +NA | +
cohortType | +Cohort type by its definition. If a cohort is declared study-defined or beacon-defined, criteria are to be entered in cohort_inclusion_criteria; if a cohort is declared user-defined, cohort_inclusion_criteria could be automatically populated from the parameters used to perform the query. |
+string | +NA | +NA | +study-defined, beacon-defined, user-defined | +
collectionEvents | +TBD | +array | +eventAgeRange, eventCases, eventControls, eventDataTypes, eventDate, eventDiseases, eventEthnicities, eventGenders, eventLocations, eventNum, eventPhenotypes, eventSize, eventTimeline | +NA | +NA | +
exclusionCriteria | +Exclusion criteria used for defining the cohort. It is assumed that NONE of the cohort participants will match such criteria. | +object | +ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions | +NA | +NA | +
id | +Cohort identifier. For study-defined or beacon-defined cohorts this field is set by the implementer. For user-defined this unique identifier could be generated upon the query that defined the cohort, but could be later edited by the user. |
+string | +NA | +cohort-T2D-2010 | +NA | +
inclusionCriteria | +Inclusion criteria used for defining the cohort. It is assumed that all cohort participants will match such criteria. | +object | +ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions | +NA | +NA | +
name | +Name of the cohort. For user-defined this field could be generated upon the query, e.g. a value that is a concatenation or some representation of the user query. |
+string | +NA | +Wellcome Trust Case Control Consortium, GCAT Genomes for Life | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "cohortType": "study-defined",
+ "id": "cohort0001",
+ "name": "GCAT Genomes for Life"
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "cohortDataTypes": [
+ {
+ "id": "OGMS:0000015",
+ "label": "clinical history"
+ },
+ {
+ "id": "OBI:0000070",
+ "label": "genotyping assay"
+ },
+ {
+ "id": "OMIABIS:0000060",
+ "label": "survey data"
+ }
+ ],
+ "cohortDesign": {
+ "id": "OMIABIS:0001019",
+ "label": "longitudinal study design"
+ },
+ "cohortSize": 20000,
+ "cohortType": "study-defined",
+ "id": "cohort0001",
+ "inclusionCriteria": {
+ "ageRange": {
+ "end": {
+ "iso8601duration": "P40Y"
+ },
+ "start": {
+ "iso8601duration": "P18Y"
+ }
+ },
+ "genders": [
+ {
+ "id": "NCIT:C16576",
+ "label": "female"
+ },
+ {
+ "id": "NCIT:C20197",
+ "label": "male"
+ }
+ ],
+ "locations": [
+ {
+ "id": "GAZ:00004501",
+ "label": "Catalonia Autonomous Community"
+ }
+ ]
+ },
+ "name": "GCAT Genomes for Life"
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "cohortDataTypes": [
+ {
+ "id": "OGMS:0000015",
+ "label": "clinical history"
+ },
+ {
+ "id": "OBI:0000070",
+ "label": "genotyping assay"
+ },
+ {
+ "id": "OMIABIS:0000060",
+ "label": "survey data"
+ }
+ ],
+ "cohortDesign": {
+ "id": "OMIABIS:0001019",
+ "label": "longitudinal study design"
+ },
+ "cohortSize": 20000,
+ "cohortType": "study-defined",
+ "collectionEvents": [
+ {
+ "eventDataTypes": {
+ "availability": true,
+ "distribution": {
+ "dataTypes": {
+ "blood collected from fasting subject": 51,
+ "survey data": 98
+ }
+ }
+ },
+ "eventDate": "2019-04-23",
+ "eventEthnicities": {
+ "availability": true,
+ "availabilityCount": 101,
+ "distribution": {
+ "ethnicities": {
+ "African": 3,
+ "European": 90,
+ "Latin American": 8
+ }
+ }
+ },
+ "eventGenders": {
+ "availability": true,
+ "availabilityCount": 101,
+ "distribution": {
+ "genders": {
+ "female": 51,
+ "male": 50
+ }
+ }
+ },
+ "eventNum": 1,
+ "eventSize": 101
+ }
+ ],
+ "id": "cohort0001",
+ "inclusionCriteria": {
+ "ageRange": {
+ "end": {
+ "iso8601duration": "P40Y"
+ },
+ "start": {
+ "iso8601duration": "P18Y"
+ }
+ },
+ "genders": [
+ {
+ "id": "NCIT:C16576",
+ "label": "female"
+ },
+ {
+ "id": "NCIT:C20197",
+ "label": "male"
+ }
+ ],
+ "locations": [
+ {
+ "id": "GAZ:00004501",
+ "label": "Catalonia Autonomous Community"
+ }
+ ]
+ },
+ "name": "GCAT Genomes for Life"
+}
+
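In the `collectionEvents` example above, the per-category counts under `distribution` add up to the `availabilityCount` of the same block (51 + 50 = 101 for `eventGenders`, 3 + 90 + 8 = 101 for `eventEthnicities`). The following sketch turns that observation into a sanity check. It assumes, without confirmation from the specification, that the two numbers are meant to agree whenever both are present; the function name `distribution_mismatches` is hypothetical.

```python
def distribution_mismatches(event):
    """Return {field: (availabilityCount, summed distribution)} where the two differ."""
    mismatches = {}
    for field, block in event.items():
        if not isinstance(block, dict):
            continue  # eventDate, eventNum and eventSize are scalars
        declared = block.get("availabilityCount")
        distribution = block.get("distribution")
        if declared is None or not distribution:
            continue  # nothing to compare, e.g. eventDataTypes in the example
        total = sum(count for buckets in distribution.values() for count in buckets.values())
        if total != declared:
            mismatches[field] = (declared, total)
    return mismatches


# Trimmed copy of the collection event shown in the example above.
event = {
    "eventDate": "2019-04-23",
    "eventGenders": {
        "availability": True,
        "availabilityCount": 101,
        "distribution": {"genders": {"female": 51, "male": 50}},
    },
    "eventEthnicities": {
        "availability": True,
        "availabilityCount": 101,
        "distribution": {"ethnicities": {"African": 3, "European": 90, "Latin American": 8}},
    },
    "eventNum": 1,
    "eventSize": 101,
}

print(distribution_mismatches(event))  # {}
```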
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
createDateTime | +The time the dataset was created (ISO 8601 format) | +string | +NA | +2017-01-17T20:33:40Z | +NA | +
dataUseConditions | +Data use conditions applying to this dataset. | +object | +duoDataUse | +NA | +NA | +
description | +Description of the dataset | +string | +NA | +This dataset provides examples of the actual data in this Beacon instance. | +NA | +
externalUrl | +URL to an external system providing more dataset information (RFC 3986 format). | +string | +NA | +example.org/wiki/Main_Page | +NA | +
id | +Unique identifier of the dataset | +string | +NA | +ds01010101 | +NA | +
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of adding attributes at the same level as the others, in order to avoid name collisions with attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
name | +Name of the dataset | +string | +NA | +Dataset with synthetic data | +NA | +
updateDateTime | +The time the dataset was updated (ISO 8601 format) | +string | +NA | +2017-01-17T20:33:40Z | +NA | +
version | +Version of the dataset | +string | +NA | +v1.1 | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "createDateTime": "2017-01-17T20:33:40Z",
+ "dataUseConditions": {
+ "duoDataUse": [
+ {
+ "id": "DUO:0000007",
+ "label": "disease specific research",
+ "modifiers": [
+ {
+ "id": "EFO:0001645",
+ "label": "coronary artery disease"
+ }
+ ],
+ "version": "17-07-2016"
+ }
+ ]
+ },
+ "description": "This dataset provides examples of the actual data in this Beacon instance.",
+ "externalUrl": "https://example.org/wiki/Main_Page",
+ "id": "ds01010101",
+ "name": "Dataset with synthetic data",
+ "updateDateTime": "2017-01-17T20:33:40Z",
+ "version": "v1.1"
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "id": "ds01010101",
+ "name": "Dataset with synthetic data"
+}
+
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
caseLevelData | ++ | array | +alleleOrigin, analysisId, biosampleId, clinicalInterpretations, id, individualId, phenotypicEffects, runId, zygosity | +NA | +NA | +
frequencyInPopulations | +NA | +array | +frequencies, source, sourceReference, version | +NA | +NA | +
identifiers | +NA | +object | +clinvarVariantId, genomicHGVSId, proteinHGVSIds, transcriptHGVSIds, variantAlternativeIds | +NA | +NA | +
molecularAttributes | +NA | +object | +aminoacidChanges, geneIds, genomicFeatures, molecularEffects | +NA | +NA | +
variantInternalId | +Reference to the internal variant ID. This represents the primary key/identifier of that variant inside a given Beacon instance. Different Beacon instances may use identical id values, referring to unrelated variants. Public identifiers such as the GA4GH Variant Representation Id (VRSid) MUST be returned in the identifiers section. A Beacon instance can, of course, use the VRSid as its own internal id but MUST still return it in the identifiers section. |
+string | +NA | +var00001, v110112 | +NA | +
variantLevelData | +NA | +object | +clinicalInterpretations, phenotypicEffects | +NA | +NA | +
variation | +NA | +oneOf | +LegacyVariation, MolecularVariation, SystemicVariation | +NA | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "variantInternalId": "GRCh37-1-55505652-G-A",
+ "variation": {
+ "alternateBases": "A",
+ "location": {
+ "interval": {
+ "end": {
+ "type": "Number",
+ "value": 5505653
+ },
+ "start": {
+ "type": "Number",
+ "value": 5505652
+ },
+ "type": "SequenceInterval"
+ },
+ "sequence_id": "refseq:NC_000001.10",
+ "type": "SequenceLocation"
+ },
+ "variantType": "SNP"
+ }
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "variantInternalId": "GRCh37-1-55505652-G-A",
+ "variation": {
+ "location": {
+ "interval": {
+ "end": {
+ "type": "Number",
+ "value": 5505653
+ },
+ "start": {
+ "type": "Number",
+ "value": 5505652
+ },
+ "type": "SequenceInterval"
+ },
+ "sequence_id": "refseq:NC_000001.10",
+ "type": "SequenceLocation"
+ },
+ "state": {
+ "sequence": "A",
+ "type": "SequenceState"
+ },
+ "type": "Allele"
+ }
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "frequencyInPopulations": [
+ {
+ "frequencies": [
+ {
+ "alleleFrequency": 2.939e-05,
+ "population": "European (non-Finish)"
+ },
+ {
+ "alleleFrequency": 0,
+ "population": "Other"
+ }
+ ],
+ "source": "gnomaD Genomes",
+ "sourceReference": "https://gnomad.broadinstitute.org/",
+ "version": "v3.1.1"
+ },
+ {
+ "frequencies": [
+ {
+ "alleleFrequency": 9e-05,
+ "population": "Total"
+ },
+ {
+ "alleleFrequency": 6e-05,
+ "population": "European"
+ },
+ {
+ "alleleFrequency": 0,
+ "population": "African"
+ }
+ ],
+ "source": "ALFA",
+ "sourceReference": "https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/",
+ "version": "20201027095038"
+ }
+ ],
+ "identifiers": {
+ "clinVarIds": [
+ "434136",
+ "VCV000440707.6"
+ ],
+ "genomicHGVSId": "NC_000001.11:g.55039979G>A",
+ "proteinHGVSIds": [
+ "NP_777596.2:p.Glu48Lys"
+ ],
+ "transcriptHGVSIds": [
+ "NM_174936.4:c.142G>A"
+ ],
+ "variantAlternativeIds": [
+ "dbSNP:rs3975092470",
+ "ClinGen: CA340482854"
+ ]
+ },
+ "molecularAttributes": {
+ "aminoacidChanges": [
+ "E48K"
+ ],
+ "geneIds": [
+ "PCSK9",
+ "LRG_275"
+ ],
+ "molecularEffects": [
+ {
+ "id": "ENSGLOSSARY:0000150",
+ "label": "Missense variant"
+ }
+ ]
+ },
+ "variantInternalId": "var123",
+ "variantLevelData": {
+ "clinicalInterpretations": [
+ {
+ "category": {
+ "id": "MONDO:0000001",
+ "label": "disease or disorder"
+ },
+ "clinicalRelevance": "pathogenic",
+ "conditionId": "famchol1",
+ "effect": {
+ "id": "MONDO:0007750",
+ "label": "Familial hypercholesterolemia 1"
+ }
+ },
+ {
+ "category": {
+ "id": "MONDO:0000001",
+ "label": "disease or disorder"
+ },
+ "clinicalRelevance": "uncertain significance",
+ "conditionId": "famchol3",
+ "effect": {
+ "id": "MONDO:0011369",
+ "label": "hypercholesterolemia, autosomal dominant, 3"
+ }
+ }
+ ]
+ },
+ "variation": {
+ "alternateBases": "A",
+ "location": {
+ "interval": {
+ "end": {
+ "type": "Number",
+ "value": 55039980
+ },
+ "start": {
+ "type": "Number",
+ "value": 55039979
+ },
+ "type": "SequenceInterval"
+ },
+ "sequence_id": "refseq:NC_000001.11",
+ "type": "SequenceLocation"
+ },
+ "referenceBases": "G",
+ "variantType": "SNP"
+ }
+}
+
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
diseases | +Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease |
+array | +ageOfOnset, diseaseCode, familyHistory, notes, severity, stage | +NA | +NA | +
ethnicity | +Ethnic background of the individual. Value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] | +object | +id, label | +[{"id": "NCIT:C42331", "label": "African"}, {"id": "NCIT:C41260", "label": "Asian"}, {"id": "NCIT:C126535", "label": "Australian"}, {"id": "NCIT:C43851", "label": "European"}, {"id": "NCIT:C77812", "label": "North American"}, {"id": "NCIT:C126531", "label": "Latin American"}, {"id": "NCIT:C104495", "label": "Other race"}] |
+NA | +
exposures | +Exposures (lifestyle, behavioural exposures) that occurred to the individual, defined by exposure ID, date and age of onset, dose, and duration. | +array | +ageAtExposure, date, duration, exposureCode, unit, value | +NA | +NA | +
geographicOrigin | +Individual's country or region of origin (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). | +object | +id, label | +[{"id": "GAZ:00002955", "label": "Slovenia"}, {"id": "GAZ:00002459", "label": "United States of America"}, {"id": "GAZ:00316959", "label": "Municipality of El Masnou"}, {"id": "GAZ:00000460", "label": "Eurasia"}] |
+NA | +
id | +Individual identifier (internal ID). | +string | +NA | +P0001 | +NA | +
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of adding attributes at the same level as the others, in order to avoid name collisions with attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
interventionsOrProcedures | +Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure |
+array | +ageAtProcedure, bodySite, dateOfProcedure, procedureCode | +NA | +NA | +
karyotypicSex | +The chromosomal sex of an individual represented from a selection of options. | +string | +NA | +NA | +UNKNOWN_KARYOTYPE, XX, XY, XO, XXY, XXX, XXYY, XXXY, XXXX, XYY, OTHER_KARYOTYPE | +
measures | +Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement |
+array | +assayCode, date, measurementValue, notes, observationMoment, procedure | +NA | +NA | +
pedigrees | +Pedigree studies of which the individual is part. | +array | +disease, id, members, numSubjects | +NA | +NA | +
phenotypicFeatures | +Used to describe a phenotype that characterizes the subject or biosample. | +array | +evidence, excluded, featureType, modifiers, notes, onset, resolution, severity | +NA | +NA | +
sex | +Sex of the individual. Value from NCIT General Qualifier (NCIT:C27993): 'unknown' (not assessed or not available) (NCIT:C17998), 'female' (NCIT:C16576), or 'male' (NCIT:C20197). | +object | +id, label | +[{"id": "NCIT:C16576", "label": "female"}, {"id": "NCIT:C20197", "label": "male"}, {"id": "NCIT:C17998", "label": "unknown"}] |
+NA | +
treatments | +Treatment(s) prescribed/administered, defined by treatment ID, date and age of onset, dose, schedule and duration. | +array | +ageAtOnset, cumulativeDose, doseIntervals, routeOfAdministration, treatmentCode | +NA | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "id": "Ind001",
+ "sex": {
+ "id": "NCIT:C16576",
+ "label": "female"
+ }
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "diseases": [
+ {
+ "ageOfOnset": {
+ "ageGroup": {
+ "id": "NCIT:C49685",
+ "label": "Adult 18-65 Years Old"
+ }
+ },
+ "diseaseCode": {
+ "id": "OMIM:164400",
+ "label": "Spinocerebellar ataxia 1"
+ },
+ "familyHistory": false,
+ "severity": {
+ "id": "HP:0012829",
+ "label": "Profound"
+ },
+ "stage": {
+ "id": "OGMS:0000119",
+ "label": "acute onset"
+ }
+ }
+ ],
+ "ethnicity": {
+ "id": "NCIT:C43851",
+ "label": "European"
+ },
+ "geographicOrigin": {
+ "id": "GAZ:00002955",
+ "label": "Slovenia"
+ },
+ "id": "Ind001",
+ "measures": [
+ {
+ "assayCode": {
+ "id": "LOINC:26515-7",
+ "label": "Platelets [#/volume] in Blood"
+ },
+ "date": "2017-05-03",
+ "measurementValue": {
+ "units": {
+ "id": "NCIT:C103452",
+ "label": "Per Milliliter"
+ },
+ "value": 55345
+ },
+ "observationMoment": {
+ "age": {
+ "iso8601duration": "P55Y8M12D"
+ }
+ }
+ }
+ ],
+ "sex": {
+ "id": "NCIT:C16576",
+ "label": "female"
+ }
+}
+
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
Age | +Age value definition. Provenance: GA4GH Phenopackets v2 Age |
+object | +iso8601duration | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
Complex Value | +Definition of a complex value class. Provenance: GA4GH Phenopackets v2 TypedQuantity |
+object | +typedQuantities | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
CopyNumber | +NA | +allOf | +VRS definition for CopyNumber | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
GestationalAge | +Gestational age (or menstrual age) is the time elapsed between the first day of the last normal menstrual period and the day of delivery. The first day of the last menstrual period occurs approximately 2 weeks before ovulation and approximately 3 weeks before implantation of the blastocyst. Because most women know when their last period began but not when ovulation occurred, this definition traditionally has been used when estimating the expected date of delivery. In contrast, chronological age (or postnatal age) is the time elapsed after birth. Provenance: Phenopackets v2 | +object | +days, weeks | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
LegacyVariation | +NA | +object | +alternateBases, location, referenceBases, variantType | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
SystemicVariation | +NA | +oneOf | +CopyNumber | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
Value | +NA | +oneOf | +Quantity, ontologyTerm | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
affected | +Is the individual affected by the disease in the pedigree? | +boolean | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ageAtExposure | +Age value definition. Provenance: GA4GH Phenopackets v2 Age |
+object | +iso8601duration | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ageAtOnset | +Age value definition. Provenance: GA4GH Phenopackets v2 Age |
+object | +iso8601duration | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ageAtProcedure | +NA | +oneOf | +Age, AgeRange, GestationalAge, TimeInterval | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ageOfOnset | +NA | +oneOf | +Age, AgeRange, GestationalAge, TimeInterval | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
aligner | +Reference to mapping/alignment software | +string | +NA | +bwa-0.7.8 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
alleleFrequency | +Allele frequency between 0 and 1. | +number | +NA | +3.186e-05 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
alleleOrigin | +Ontology value for allele origin of variant in sample from the Variant Origin (SO:0001762). Categories are somatic variant, germline variant, maternal variant, paternal variant, de novo variant, pedigree specific variant, population specific variant. Corresponds to Variant Inheritance in FHIR. |
+object | +id, label | +[{"id": "SO:0001777", "label": "somatic variant"}, {"id": "SO:0001778", "label": "germline variant"}, {"id": "SO:0001775", "label": "maternal variant"}, {"id": "SO:0001776", "label": "paternal variant"}, {"id": "SO:0001781", "label": "de novo variant"}, {"id": "SO:0001779", "label": "pedigree specific variant"}, {"id": "SO:0001780", "label": "population specific variant"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
alternateBases | +Alternate bases for this variant (starting from start). * Accepted values: IUPAC codes for nucleotides (e.g. https://www.bioinformatics.org/sms/iupac.html). * N is a wildcard that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence. For example, in a query of ANNT the Ns can take any form of [ACGT], so the query will match ANNT, ACNT, ACCT, ACGT and so forth. * An empty value is used in the case of deletions, with the maximally trimmed, deleted sequence being indicated in referenceBases. * Categorical variant queries, i.e. those not represented through sequence and position, make use of the variantType parameter. * Either alternateBases or variantType is required. |
+string | +NA | +T, G, N, AG, | +NA | +
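The `N` wildcard behaviour described for `alternateBases` can be illustrated with a small matcher. This is a sketch under stated assumptions rather than the matching logic of any Beacon implementation: the function names are hypothetical, only `N` is treated as a wildcard (other IUPAC ambiguity codes are matched literally), and a stored `N` is accepted because the description lists `ANNT` and `ACNT` among the matches.

```python
import re


def wildcard_pattern(query_bases):
    """Compile a query such as 'ANNT' into a regex in which each N matches one base."""
    parts = ["[ACGTN]" if base == "N" else re.escape(base) for base in query_bases.upper()]
    return re.compile("".join(parts))


def bases_match(query_bases, stored_bases):
    """True if stored_bases has the same length as the query and agrees at every non-N position."""
    return wildcard_pattern(query_bases).fullmatch(stored_bases.upper()) is not None


for candidate in ("ANNT", "ACNT", "ACCT", "ACGT", "TCGT", "ACG"):
    print(candidate, bases_match("ANNT", candidate))
```

The first four candidates match and the last two do not: `TCGT` differs at a fixed position and `ACG` has the wrong length.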
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
aminoacidChanges | +NA | +array | +NA | +["V304*"] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
analysisDate | +Date at which analysis was performed. | +string | +NA | +2021-10-17 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
analysisId | +Reference to the bioinformatics analysis ID (analysis.id ) |
+string | +NA | +pgxcs-kftvldsu | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
annotatedWith | +NA | +object | +toolName, toolReferences, version | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
availability | +data availability | +boolean | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
availabilityCount | +Count of individuals with data available | +integer | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
biosampleId | +Reference to the biosample ID. | +string | +NA | +008dafdd-a3d1-4801-8c0a-8714e2b58e48 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
biosampleStatus | +Ontology value from Experimental Factor Ontology (EFO) Material Entity term (BFO:0000040). Classification of the sample in abnormal sample (EFO:0009655) or reference sample (EFO:0009654). | +object | +id, label | +[{"id": "EFO:0009654", "label": "reference sample"}, {"id": "EFO:0009655", "label": "abnormal sample"}, {"id": "EFO:0009656", "label": "neoplastic sample"}, {"id": "EFO:0010941", "label": "metastasis sample"}, {"id": "EFO:0010942", "label": "primary tumor sample"}, {"id": "EFO:0010943", "label": "recurrent tumor sample"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
caseLevelData | ++ | array | +alleleOrigin, analysisId, biosampleId, clinicalInterpretations, id, individualId, phenotypicEffects, runId, zygosity | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
category | +Ontology term for the type of disease, condition, phenotypic measurement, etc. | +object | +id, label | +[{"id": "MONDO:0000001", "label": "disease or disorder"}, {"id": "HP:0000118", "label": "phenotypic abnormality"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
clinicalInterpretations | +List of annotated effects on disease or phenotypes. | +array | +annotatedWith, category, clinicalRelevance, conditionId, effect, evidenceType | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
clinicalRelevance | +Indication of the clinical relevance of the variant. Recommended: a value from the five-tiered classification from the American College of Medical Genetics (ACMG), designed to describe the likelihood that a genomic sequence variant is causative of an inherited disease (NCIT:C168798). | +string | +NA | +pathogenic | +benign, likely benign, uncertain significance, likely pathogenic, pathogenic | +
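Because `clinicalRelevance` is restricted to the five enum values listed in the row above, a producer can check its records before serving them. The sketch below is illustrative only: `unexpected_clinical_relevance` is a hypothetical helper, and entries that omit the field are skipped on the assumption that the field is optional.

```python
# The five values from the Enum column of the clinicalRelevance row.
CLINICAL_RELEVANCE = (
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
)


def unexpected_clinical_relevance(clinical_interpretations):
    """Return the clinicalRelevance values that are not among the five enum values."""
    return [
        item["clinicalRelevance"]
        for item in clinical_interpretations
        if "clinicalRelevance" in item and item["clinicalRelevance"] not in CLINICAL_RELEVANCE
    ]


interpretations = [
    {"conditionId": "famchol1", "clinicalRelevance": "pathogenic"},
    {"conditionId": "famchol3", "clinicalRelevance": "uncertain significance"},
    {"conditionId": "disease1", "clinicalRelevance": "Pathogenic"},  # wrong case, flagged
]
print(unexpected_clinical_relevance(interpretations))  # ['Pathogenic']
```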
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
clinvarVariantId | +ClinVar variant id. Other id values used by ClinVar can be added to variantAlternativeIds |
+string | +NA | +clinvar:12345, 9325 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cohortDataTypes | +NA | +array | +id, label | +[{"id": "OGMS:0000015", "label": "clinical history"}, {"id": "OBI:0000070", "label": "genotyping assay"}, {"id": "OMIABIS:0000060", "label": "survey data"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cohortDesign | +Cohort type by its design. A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of an investigation and is realized during a study design execution. Value from Ontologized MIABIS (OMIABIS) Study design ontology term tree (OBI:0500000). | +object | +id, label | +[{"id": "OMIABIS:0001017", "label": "case control study design"}, {"id": "OMIABIS:0001019", "label": "longitudinal study design"}, {"id": "OMIABIS:0001024", "label": "twin study design"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cohortSize | +Count of unique Individuals in cohort (individuals meeting criteria for user-defined cohorts). If not previously known, it could be calculated by counting the individuals in the cohort. |
+integer | +NA | +14765, 20000 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cohortType | +Cohort type by its definition. If a cohort is declared study-defined or beacon-defined, criteria are to be entered in cohort_inclusion_criteria; if a cohort is declared user-defined, cohort_inclusion_criteria could be automatically populated from the parameters used to perform the query. |
+string | +NA | +NA | +study-defined, beacon-defined, user-defined | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
collectionDate | +Date of biosample collection in ISO8601 format. | +string | +NA | +2021-04-23 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
collectionEvents | +TBD | +array | +eventAgeRange, eventCases, eventControls, eventDataTypes, eventDate, eventDiseases, eventEthnicities, eventGenders, eventLocations, eventNum, eventPhenotypes, eventSize, eventTimeline | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
collectionMoment | +Individual's or cell culture age at the time of sample collection in the ISO8601 duration format P[n]Y[n]M[n]DT[n]H[n]M[n]S. |
+string | +NA | +P32Y6M1D, P7D | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
conditionId | +Internal identifier of the phenotype or clinical effect. | +string | +NA | +disease1, phen2234 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
createDateTime | +The time the dataset was created (ISO 8601 format) | +string | +NA | +2017-01-17T20:33:40Z | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
cumulativeDose | +Definition of a quantity class. Provenance: GA4GH Phenopackets v2 Quantity |
+object | +referenceRange, unit, value | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
dataUseConditions | +Data use conditions applying to this dataset. | +object | +duoDataUse | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
date | +Date of the exposure in ISO8601 format. | +string | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
dateOfProcedure | +Date of procedure, in ISO8601 format | +string | +NA | +2010-07-10 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
description | +Description of the dataset | +string | +NA | +This dataset provides examples of the actual data in this Beacon instance. | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
disease | +Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease |
+object | +ageOfOnset, diseaseCode, familyHistory, notes, severity, stage | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
diseaseCode | +Definition of an ontology term. | +object | +id, label | +[{"id": "HP:0004789", "label": "lactose intolerance"}, {"id": "ICD10CM:E73", "label": "lactose intolerance"}, {"id": "OMIM:164400", "label": "Spinocerebellar ataxia 1"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
diseaseConditions | +Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease |
+array | +ageOfOnset, diseaseCode, familyHistory, notes, severity, stage | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
diseases | +Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease |
+array | +ageOfOnset, diseaseCode, familyHistory, notes, severity, stage | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
distribution | +List of categories and results or counts for each category. | +object | ++ | [{"genders": {"female": "51", "male": "50"}}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
doseIntervals | +This element represents a block of time in which the dosage of a medication was constant. For example, to represent a period of 30 mg twice a day for an interval of 10 days, we would use a Quantity element to represent the individual 30 mg dose, and OntologyClass element to represent twice a day, and an Interval element to represent the 10-day interval. Provenance: Phenopackets v2 | +array | +interval, quantity, scheduleFrequency | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
duration | +Exposure duration in ISO8601 format | +string | +NA | +P2Y6M1D | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
effect | +Ontology term for the phenotypic or clinical effect | +object | +id, label | +[{"id": "MONDO:0003582", "label": "hereditary breast ovarian cancer syndrome"}, {"id": "HP:0000256", "label": "macrocephaly"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
end | +Represents age as an ISO8601 duration (e.g., P59Y). | +object | +iso8601duration | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ethnicities | +Ethnic background of the individual. Recommended is the use of a value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] | +array | +id, label | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
ethnicity | +Ethnic background of the individual. Value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] | +object | +id, label | +[{"id": "NCIT:C42331", "label": "African"}, {"id": "NCIT:C41260", "label": "Asian"}, {"id": "NCIT:C126535", "label": "Australian"}, {"id": "NCIT:C43851", "label": "European"}, {"id": "NCIT:C77812", "label": "North American"}, {"id": "NCIT:C126531", "label": "Latin American"}, {"id": "NCIT:C104495", "label": "Other race"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventAgeRange | +Individual age range, obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventCases | +number of cases | +integer | +NA | +543, 20 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventControls | +number of controls | +integer | +NA | +1000, 22 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventDataTypes | +Aggregated data type information available for each cohort data type as declared in cohortDataTypes , and obtained from individual level info of the cohort members |
+object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventDate | +date of collection event/data point | +string | +NA | +2018-10-01T13:23:45Z, 2019-04-23T09:11:13Z, 2017-01-17T20:33:40Z | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventDiseases | +Aggregated information of disease/condition(s) obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventEthnicities | +Aggregated information of ethnicity obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventGenders | +Aggregated information of gender(s) obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventLocations | +Aggregated information of geographic location obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventNum | +cardinality of the collection event / data point in a series | +integer | +NA | +1, 2, 3, 4 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventPhenotypes | +Aggregated information of phenotype(s) obtained from individual level info of the cohort members | +object | +availability, availabilityCount, distribution | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
eventSize | +Count of individuals in cohort at data point (for 'user-defined' cohorts, this is individuals meeting criteria) obtained from individual level info in database. | +integer | +NA | +1543, 42 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
evidence | +The evidence for an assertion of the observation of a type. RECOMMENDED. | +object | +evidenceCode, reference | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
evidenceType | +Ontology term for the type of evidence supporting variant-disease association Recommended: values from the Evidence & Conclusion Ontology (ECO) | +object | +id, label | +[{"id": "ECO:0000361", "label": "inferential evidence"}, {"id": "ECO:0000006", "label": "experimental evidence"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
excluded | +Flag to indicate whether the phenotypic feature was observed or not. Default is ‘false’, in other words the phenotype was observed. Therefore it is only used in cases where the phenotype was looked for but found to be absent. More formally, this modifier indicates the logical negation of the OntologyClass used in the featureType field. CAUTION: It is imperative to check this field for correct interpretation of the phenotype! Source: Phenopackets v2 |
+boolean | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
exclusionCriteria | +Exclusion criteria used for defining the cohort. It is assumed that NONE of the cohort participants will match such criteria. | +object | +ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
exposures | +Exposures (lifestyle, behavioural exposures) that occurred to the individual, defined by exposure ID, date and age of onset, dose, and duration. | +array | +ageAtExposure, date, duration, exposureCode, unit, value | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
externalUrl | +URL to an external system providing more dataset information (RFC 3986 format). | +string | +NA | +example.org/wiki/Main_Page | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
familyHistory | +Boolean indicating determined or self-reported presence of family history of the disease. | +boolean | +NA | +1 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
featureClass | +Ontology term that describes the class of genomic feature affected by the variant. Values from SO (Sequence ontology) are recommended, e.g. SO:0001623: 5 prime UTR variant |
+object | +id, label | +[{"id": "SO:0001623", "label": "5 prime UTR variant"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
featureID | +Where applicable, ID/accession/name of genomic feature related to the featureClass , preferably in CURIE format. If the value is a gene id or name, it points to the gene related to the featureClass , e.g. the 5 prime UTR upstream of TP53 |
+object | +id, label | +[{"id": "HGNC:11998", "label": "TP53"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
featureType | +Definition of an ontology term. | +object | +id, label | +[{"id": "HP:0000002", "label": "Abnormality of body height"}, {"id": "HP:0002006", "label": "Facial cleft"}, {"id": "HP:0012469", "label": "Infantile spasms"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
frequencies | +NA | +array | +alleleFrequency, population | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
frequencyInPopulations | +NA | +array | +frequencies, source, sourceReference, version | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
genders | +Sex of the individual. Recommended values from NCIT General Qualifier (NCIT:C27993): "unknown" (not assessed or not available) - NCIT:C17998; "female" - NCIT:C16576; "male" - NCIT:C20197 | +array | +id, label | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
geneIds | +NA | +array | +NA | +["ACE2"] ,["BRCA1"] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
genomicFeatures | +Genomic feature(s) related to the variant. NOTE: Although genes could also be referenced using these attributes, they have an independent section to allow direct queries. | +array | +featureClass, featureID | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
genomicHGVSId | +HGVSId descriptor. | +string | +NA | +NC_000017.11:g.43057063G>A | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
geographicOrigin | +Individual's country or region of origin (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). | +object | +id, label | +[{"id": "GAZ:00002955", "label": "Slovenia"}, {"id": "GAZ:00002459", "label": "United States of America"}, {"id": "GAZ:00316959", "label": "Municipality of El Masnou"}, {"id": "GAZ:00000460", "label": "Eurasia"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
histologicalDiagnosis | +Disease diagnosis that was inferred from the histological examination. RECOMMENDED. | +object | +id, label | +[{"id": "NCIT:C3778", "label": "Serous Cystadenocarcinoma"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
id | +Run ID. | +string | +NA | +SRR10903401 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
identifiers | +NA | +object | +clinvarVariantId, genomicHGVSId, proteinHGVSIds, transcriptHGVSIds, variantAlternativeIds | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
inclusionCriteria | +Inclusion criteria used for defining the cohort. It is assumed that all cohort participants will match such criteria. | +object | +ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
individualId | +Reference to the individual ID. | +string | +NA | +TCGA-AO-A0JJ | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others, in order to avoid name collisions with attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
interventionsOrProcedures | +Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure |
+array | +ageAtProcedure, bodySite, dateOfProcedure, procedureCode | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
iso8601duration | +Represents age as an ISO8601 duration (e.g., P40Y10M05D). | +string | +NA | +P32Y6M1D | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
karyotypicSex | +The chromosomal sex of an individual represented from a selection of options. | +string | +NA | +NA | +UNKNOWN_KARYOTYPE, XX, XY, XO, XXY, XXX, XXYY, XXXY, XXXX, XYY, OTHER_KARYOTYPE | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
label | +The text that describes the term. By default it could be the preferred text of the term, but it is acceptable to customize it for a clearer description and understanding of the term in a specific context. | +string | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
libraryLayout | +Ontology value for the library layout, e.g. "PAIRED", "SINGLE" #todo add Ontology name? | +string | +NA | +NA | +PAIRED, SINGLE | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
librarySelection | +Selection method for library preparation, e.g. "RANDOM", "RT-PCR" | +string | +NA | +RANDOM, RT-PCR | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
librarySource | +Ontology value for the source of the sequencing or hybridization library, e.g. "genomic source", "transcriptomic source" | +object | +id, label | +[{"id": "GENEPIO:0001966", "label": "genomic source"}, {"id": "GENEPIO:0001965", "label": "metagenomic source"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
libraryStrategy | +Library strategy, e.g. "WGS" | +string | +NA | +WGS | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
locations | +Country or region of origin of the individual (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). | +array | +id, label | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
measurementValue | +NA | +oneOf | +Complex Value, Value | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
measurements | +Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement |
+array | +assayCode, date, measurementValue, notes, observationMoment, procedure | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
measures | +Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement |
+array | +assayCode, date, measurementValue, notes, observationMoment, procedure | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
memberId | +Identifier of the individual. The individual could be part of the same Beacon dataset or not, in which case the information here is meant to complete the pedigree. If the individual is also in the dataset, use that Individual ID. If it is not in the dataset, use a non-colliding ID, e.g. by concatenating the Pedigree ID with a local ID, similarly to the example 'Pedigree1001-m1'. | +string | +NA | +Pedigree1001-m1, Ind0012122 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
molecularAttributes | +NA | +object | +aminoacidChanges, geneIds, genomicFeatures, molecularEffects | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
name | +Name of the dataset | +string | +NA | +Dataset with synthetic data | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
notes | +Unstructured text to describe additional properties of this disease instance. | +string | +NA | +Some free text | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
numSubjects | +Total number of subjects in pedigree. | +integer | +NA | +10 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
observationMoment | +NA | +oneOf | +Age, AgeRange, GestationalAge, TimeInterval | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
obtentionProcedure | +Ontology value from NCIT Intervention or Procedure ontology term (NCIT:C25218) describing the procedure for sample obtention, e.g. NCIT:C15189 (biopsy). | +object | +ageAtProcedure, bodySite, dateOfProcedure, procedureCode | +[{"code": {"id": "NCIT:C15189", "label": "biopsy"}}, {"code": {"id": "NCIT:C157179", "label": "FGFR1 Mutation Analysis"}}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
onset | +NA | +oneOf | +Age, AgeRange, GestationalAge, TimeInterval | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
pathologicalStage | +Pathological stage, if applicable, preferably as subclass of NCIT:C28108 - Disease Stage Qualifier. RECOMMENDED. | +object | +id, label | +[{"id": "NCIT:C27977", "label": "Stage IIIA"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
pathologicalTnmFinding | +NA | +array | +id, label | +[{"id": "NCIT:C48725", "label": "T2a Stage Finding"}, {"id": "NCIT:C48709", "label": "N1c Stage Finding"}, {"id": "NCIT:C48699", "label": "M0 Stage Finding"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
pedigrees | +Pedigree studies of which the individual is a part. | +array | +disease, id, members, numSubjects | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
phenotypicConditions | +Used to describe a phenotype that characterizes the subject or biosample. | +array | +evidence, excluded, featureType, modifiers, notes, onset, resolution, severity | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
phenotypicEffects | +List of annotated effects on disease or phenotypes. | +array | +annotatedWith, category, clinicalRelevance, conditionId, effect, evidenceType | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
phenotypicFeatures | +Used to describe a phenotype that characterizes the subject or biosample. | +array | +evidence, excluded, featureType, modifiers, notes, onset, resolution, severity | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
pipelineName | +Analysis pipeline and version if a standardized pipeline was used | +string | +NA | +Pipeline-panel-0001-v1 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
pipelineRef | +Link to Analysis pipeline resource | +string | +NA | +doi.org/10.48511/workflowhub.workflow.111.1 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
platform | +General platform technology label. It SHOULD be a subset of the platformModel and used only for query convenience, e.g. "return everything sequenced with Illumina", where the specific model is not relevant | +string | +NA | +Illumina, Oxford Nanopore, Affymetrix | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
platformModel | +Ontology value for experimental platform or methodology used. For sequencing platforms the use of "OBI:0400103 - DNA sequencer" is suggested. | +object | +id, label | +[{"id": "OBI:0002048", "label": "Illumina HiSeq 3000"}, {"id": "OBI:0002750", "label": "Oxford Nanopore MinION"}, {"id": "EFO:0010938", "label": "large-insert clone DNA microarray"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
population | +A name for the population. A population could be an ethnic or geographical one, or just the members of a study. |
+string | +NA | +East Asian, ICGC Chronic Lymphocytic Leukemia-ES, Men, Children | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
procedure | +Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure |
+object | +ageAtProcedure, bodySite, dateOfProcedure, procedureCode | +code | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
procedureCode | +Definition of an ontology term. | +object | +id, label | +[{"id": "MAXO:0001175", "label": "liver transplantation"}, {"id": "MAXO:0000136", "label": "high-resolution microendoscopy"}, {"id": "OBI:0002654", "label": "needle biopsy"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
proteinHGVSIds | +NA | +array | +NA | +["NP_009225.1:p.Glu1817Ter"] ,["LRG_199p1:p.Val25Gly (preferred)"] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
quantity | +Definition of a quantity class. Provenance: GA4GH Phenopackets v2 Quantity |
+object | +referenceRange, unit, value | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
referenceBases | +Reference bases for this variant (starting from start ). Accepted values: IUPAC codes for nucleotides (e.g. https://www.bioinformatics.org/sms/iupac.html ). N is a wildcard that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence. For example, in a query of ANNT the Ns can take any form of [ACGT], so the query will match ANNT , ACNT , ACCT , ACGT and so forth. An empty value is used in the case of insertions, with the maximally trimmed inserted sequence being indicated in AlternateBases . NOTE: Beacon instances may not support IUPAC codes and it is not mandatory for them to do so. In such cases the use of [ACGTN] is mandated. |
+string | +NA | +A, T, N, , ACG | +NA | +
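The `N` wildcard behaviour described for `referenceBases` can be illustrated with a small matcher. This is a sketch under stated assumptions, not how any particular Beacon implements matching: it accepts only the mandated `[ACGTN]` alphabet, treats each `N` in the query as "any single base, or a stored `N`", and the function names are hypothetical.

```python
import re


def bases_query_to_regex(query):
    """Translate a referenceBases query into a compiled regular expression.

    Assumption: only the [ACGTN] alphabet is supported; wider IUPAC codes
    are rejected rather than expanded.
    """
    query = query.upper()
    if re.fullmatch(r"[ACGTN]*", query) is None:
        raise ValueError(f"unsupported base code in query: {query!r}")
    # Each N stands for exactly one position holding any base (or a stored N).
    return re.compile("".join("[ACGTN]" if base == "N" else base for base in query))


def bases_match(query, stored):
    """Return True if the stored bases satisfy the (possibly wildcarded) query."""
    return bases_query_to_regex(query).fullmatch(stored.upper()) is not None


if __name__ == "__main__":
    for candidate in ["ANNT", "ACNT", "ACCT", "ACGT", "TCGT", "ACG"]:
        print(candidate, bases_match("ANNT", candidate))
```

Whether an `N` in the stored sequence should also match a concrete base in the query is not stated in the description above; the sketch treats only query-side `N` as a wildcard.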
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
resolution | +NA | +oneOf | +Age, AgeRange, GestationalAge, TimeInterval | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
role | +Definition of an ontology term. | +object | +id, label | +[{"id": "NCIT:C64435", "label": "Proband"}, {"id": "NCIT:C96580", "label": "Biological Mother"}, {"id": "NCIT:C96572", "label": "Biological Father"}, {"id": "NCIT:C165848", "label": "Identical Twin Brother"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
runDate | +Date at which the experiment was performed. | +string | +NA | +2021-10-18 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
runId | +Reference to the experimental run ID (run.id ) |
+string | +NA | +SRR10903401 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
sampleOriginDetail | +Tissue from which the sample was taken or sample origin matching the category set in 'sampleOriginType'. Value from Uber-anatomy ontology (UBERON) or BRENDA tissue / enzyme source (BTO), Ontology for Biomedical Investigations (OBI) or Cell Line Ontology (CLO), e.g. 'cerebellar vermis' (UBERON:0004720), 'HEK-293T cell' (BTO:0002181), 'nasopharyngeal swab specimen' (OBI:0002606), 'cerebrospinal fluid specimen' (OBI:0002502). | +object | +id, label | +[{"id": "UBERON:0000474", "label": "female reproductive system"}, {"id": "BTO:0002181", "label": "HEK-293T cell"}, {"id": "OBI:0002606", "label": "nasopharyngeal swab specimen"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
sampleOriginType | +Category of sample origin. Value from Ontology for Biomedical Investigations (OBI) material entity (BFO:0000040) ontology, e.g. 'specimen from organism' (OBI:0001479), 'xenograft' (OBI:0100058), 'cell culture' (OBI:0001876) | +object | +id, label | +[{"id": "OBI:0001479", "label": "specimen from organism"}, {"id": "OBI:0001876", "label": "cell culture"}, {"id": "OBI:0100058", "label": "xenograft"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
sampleProcessing | +Status of how the specimen was processed, e.g. a child term of EFO:0009091. | +object | +id, label | +[{"id": "EFO:0009129", "label": "mechanical dissociation"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
severity | +Severity as applicable to phenotype or disease observed. Recommended are values from Human Phenotype Ontology (HP:0012824), e.g. mild. The intensity or degree of a manifestation. Source: Phenopackets v2 |
+object | +id, label | +[{"id": "HP:0012828", "label": "Severe"}, {"id": "HP:0012826", "label": "Moderate"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
sex | +Sex of the individual. Value from NCIT General Qualifier (NCIT:C27993): 'unknown' (not assessed or not available) (NCIT:C17998), 'female' (NCIT:C16576), or 'male' (NCIT:C20197). | +object | +id, label | +[{"id": "NCIT:C16576", "label": "female"}, {"id": "NCIT:C20197", "label": "male"}, {"id": "NCIT:C17998", "label": "unknown"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
source | +The study | +string | +NA | +The Genome Aggregation Database (gnomAD), The European Genome-phenome Archive (EGA) | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
sourceReference | +A reference to further documentation or details. | +string | +NA | +gnomad.broadinstitute.org/, ega-archive.org/ | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
stage | +Definition of an ontology term. | +object | +id, label | +[{"id": "OGMS:0000119", "label": "acute onset"}, {"id": "OGMS:0000117", "label": "asymptomatic"}, {"id": "OGMS:0000106", "label": "remission"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
start | +Represents age as an ISO8601 duration (e.g., P18Y). | +object | +iso8601duration | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
toolName | +Name of the tool. | +string | +NA | +Ensembl Variant Effect Predictor (VEP) | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
toolReferences | +References to the tool | +object | +NA | +[{"bio.toolsId": "https://bio.tools/vep"}, {"url": "https://www.ensembl.org/vep"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
transcriptHGVSIds | +NA | +array | +NA | +["NC_000023.10(NM_004006.2):c.357+1G"] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
treatments | +Treatment(s) prescribed/administered, defined by treatment ID, date and age of onset, dose, schedule and duration. | +array | +ageAtOnset, cumulativeDose, doseIntervals, routeOfAdministration, treatmentCode | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
tumorGrade | +Term representing the tumor grade. Child term of NCIT:C28076 (Disease Grade Qualifier) or equivalent. | +object | +id, label | +[{"id": "NCIT:C28080", "label": "Grade 3a"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
tumorProgression | +Tumor progression category indicating primary, metastatic or recurrent progression. Ontology value from Neoplasm by Special Category ontology (NCIT:C7062), e.g. NCIT:C84509 (Primary Malignant Neoplasm). | +object | +id, label | +[{"id": "NCIT:C84509", "label": "Primary Malignant Neoplasm"}, {"id": "NCIT:C4813", "label": "Recurrent Malignant Neoplasm"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
unit | +The kind of unit. Recommended from NCIT Unit of Category ontology term (NCIT:C42568) descendants | +object | +id, label | +[{"id": "NCIT:C70575", "label": "Roentgen"}, {"id": "NCIT:C28252", "label": "Kilogram"}, {"id": "NCIT:C28253", "label": "Milligram"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
updateDateTime | +The time the dataset was updated (in ISO 8601 format) | +string | +NA | +2017-01-17T20:33:40Z | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
value | +The value of the quantity in the units | +number | +NA | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variantAlternativeIds | +Definition of an external reference class. Provenance: GA4GH Phenopackets v2 ExternalReference |
+array | +id, notes, reference | +[{"id": "dbSNP:rs587780345", "notes": "dbSNP id", "reference": "https://www.ncbi.nlm.nih.gov/snp/rs587780345"}, {"id": "ClinGen:CA152954", "notes": "ClinGen Allele Registry id", "reference": "https://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/by_caid?caid=CA152954"}, {"id": "UniProtKB:P35557#VAR_003699", "reference": "https://www.uniprot.org/uniprot/P35557#VAR_003699"}] ,[{"id": "OMIM:164757.0001", "reference": "https://www.omim.org/entry/164757#0001"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variantCaller | +Reference to variant calling software / pipeline | +string | +NA | +GATK4.0 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variantInternalId | +Reference to the internal variant ID. This represents the primary key/identifier of that variant inside a given Beacon instance. Different Beacon instances may use identical id values, referring to unrelated variants. Public identifiers such as the GA4GH Variant Representation Id (VRSid) MUST be returned in the identifiers section. A Beacon instance can, of course, use the VRSid as their own internal id but still MUST represent this then in the identifiers section. |
+string | +NA | +var00001, v110112 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variantLevelData | +NA | +object | +clinicalInterpretations, phenotypicEffects | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variantType | +The variantType declares the nature of the variation in relation to a reference. In a response, it is used to describe the variation. In a request, it is used to declare the type of event the Beacon client is looking for. If in queries variants can not be defined through a sequence of one or more bases (precise variants) it can be used standalone (i.e. without alternateBases ) together with positional parameters. Examples here are e.g. queries for structural variants such as DUP (increased allelic count of material from the genomic region between start and end positions without assumption about the placement of the additional sequence) or DEL (deletion of sequence following start ). Either alternateBases or variantType is required, with the exception of range queries (single start and end parameters). |
+string | +NA | +SNP, DEL, DUP, BND | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
variation | +NA | +oneOf | +LegacyVariation, MolecularVariation, SystemicVariation | +NA | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
version | +version of the source data. | +string | +NA | +gnomAD v3.1.1 | +NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
zygosity | +Ontology term for the zygosity in which the variant is present in the sample, from the Zygosity Ontology (GENO:0000391), e.g. heterozygous (GENO:0000135) |
+object | +id, label | +[{"id": "GENO:0000135", "label": "heterozygous"}, {"id": "GENO:0000136", "label": "homozygous"}, {"id": "GENO:0000604", "label": "hemizygous X-linked"}] |
+NA | +
Term | +Description | +Type | +Properties | +Example | +Enum | +
---|---|---|---|---|---|
biosampleId | +Reference to the biosample ID. | +string | +NA | +008dafdd-a3d1-4801-8c0a-8714e2b58e48 | +NA | +
id | +Run ID. | +string | +NA | +SRR10903401 | +NA | +
individualId | +Reference to the individual ID. | +string | +NA | +TCGA-AO-A0JJ | +NA | +
info | +Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of adding attributes directly at the same level as the others, in order to avoid name collisions with attributes in future versions of the specification. | +object | +NA | +NA | +NA | +
libraryLayout | +Ontology value for the library layout e.g "PAIRED", "SINGLE" #todo add Ontology name? | +string | +NA | +NA | +PAIRED, SINGLE | +
librarySelection | +Selection method for library preparation, e.g "RANDOM", "RT-PCR" | +string | +NA | +RANDOM, RT-PCR | +NA | +
librarySource | +Ontology value for the source of the sequencing or hybridization library, e.g "genomic source", "transcriptomic source" | +object | +id, label | +[{"id": "GENEPIO:0001966", "label": "genomic source"}, {"id": "GENEPIO:0001965", "label": "metagenomic source"}] |
+NA | +
libraryStrategy | +Library strategy, e.g. "WGS" | +string | +NA | +WGS | +NA | +
platform | +General platform technology label. It SHOULD be a subset of the platformModel and used only for query convenience, e.g. "return everything sequenced with Illumina", where the specific model is not relevant | +string | +NA | +Illumina, Oxford Nanopore, Affymetrix | +NA | +
platformModel | +Ontology value for experimental platform or methodology used. For sequencing platforms the use of "OBI:0400103 - DNA sequencer" is suggested. | +object | +id, label | +[{"id": "OBI:0002048", "label": "Illumina HiSeq 3000"}, {"id": "OBI:0002750", "label": "Oxford Nanopore MinION"}, {"id": "EFO:0010938", "label": "large-insert clone DNA microarray"}] |
+NA | +
runDate | +Date at which the experiment was performed. | +string | +NA | +2021-10-18 | +NA | +
These are examples extracted directly from the GitHub repository.
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "biosampleId": "008dafdd-a3d1-4801-8c0a-8714e2b58e48",
+ "id": "SRR10903401",
+ "runDate": "2021-10-18"
+}
+
{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "biosampleId": "008dafdd-a3d1-4801-8c0a-8714e2b58e48",
+ "id": "SRR10903401",
+ "individualId": "TCGA-AO-A0JJ",
+ "libraryLayout": "PAIRED",
+ "librarySelection": "RANDOM",
+ "librarySource": {
+ "id": "GENEPIO:0001966",
+ "label": "genomic source"
+ },
+ "libraryStrategy": "WGS",
+ "platform": "Illumina",
+ "platformModel": {
+ "id": "OBI:0002048",
+ "label": "Illumina HiSeq 3000"
+ },
+ "runDate": "2021-10-18"
+}
+
Beacon v2 is a protocol and specification established by the Global Alliance for Genomics and Health (GA4GH) that defines an open standard for the discovery of genomic (and phenoclinic) data in biomedical research and clinical applications. Beacon facilitates the discovery of genomic variants and biomedical data in single or distributed resources with the goal to empower federated data models - i.e. the discovery (and potential retrieval) of data from different organisational and geographic locations.
Concept behind the Beacon v2 specification
The protocol defines a framework for queries potentially containing genomic, phenotypic, clinical and technical parameters. While all beacons support the minimal response of \"yes / no\" upon a query, Beacon v2 enables rich responses including detailed information about samples and experiments if supported by the individual resource and in the given context of security and authorisation.
The Beacon specification is developed by an international team of scientists and technology experts, as a product of the GA4GH Discovery work stream and with major support from the European bioinformatics infrastructure organization ELIXIR.
The current version of the protocol, Beacon v2, represents a complete revision of the original code base and introduces a number of powerful new features which the community considered important, such as:
Move to Beacon v2!
On 2022-04-21 Beacon v2 was approved as an official GA4GH standard by the GA4GH steering committee.
With the release of Beacon v2, implementations of v1 and earlier are no longer supported. Deployers of Beacon instances or networks are advised to migrate to v2 of the standard. The functionality of Beacon v1 can be easily implemented in v2.
This website presents information about the Beacon protocol and its use for data discovery and data delivery, about ways to implement it in order to \"beaconize\" genomics datasets and resources, and about the technical details of the Beacon framework and data model.
Additional information about the Beacon project - including news, events, publications - is available through the separate website at beacon-project.io.
Historical Tip
Originally, the Beacon protocol (versions 0 and 1) allowed researchers to get information about the presence or absence of a given, specific genomic mutation in a set of data, from patients with a given disease or from the population in general. Early versions of Beacon did not support query parameters beyond genomic variations and did not provide ways for the optional retrieval of matched records.
"},{"location":"#components","title":"Components","text":"Beacon v2 consists of two components, the Framework and the Models.
The Framework contains the format for the requests and responses, whereas the Models define the structure of the biological data response. The overall function of these components is to provide the instructions to design a REST API (REpresentational State Transfer Application Programming Interface) with OpenAPI Specification (OAS). The OAS defines a standard, language-agnostic interface that is used by software developers to implement REST APIs.
Framework interdependency, releases and alternative models
In principle, this dual system allows for different Models (in other domains outside of the Beacon v2 realm, e.g. \"Imaging Beacon\") to be built using the same Framework. However, in the current context of Beacon v2, we consider the two elements interdependent and likely to be updated together for subsequent major versions (e.g. from v2 to v3).
"},{"location":"#informations-for-different-types-of-beacon-users","title":"Informations for Different Types of Beacon Users","text":"The Beacon documentation provides information for different types of users, depending on their interests and use cases. Although those will overlap, we highlight information relevant for some general scenarios throughout the documentation.
"},{"location":"#users","title":"Users","text":"A Beacon user (or end-user) is interested in querying Beacon instances and networks, either through web interfaces or by using the Beacon API directly. While users of Beacon web forms in principle do not need to understand the underlying query syntax and response formats, they too may benefit from some insight into the general capabilities of the underlying protocol.
User
A Beacon Deployer is someone who wants to make their genomics resource accessible through the Beacon protocol, without necessarily being interested or experienced in the computational aspects; while a Beacon Implementer provides the technical expertise (and potentially may get involved with Beacon development itself, e.g. to extend the protocol for novel use cases).
Deployer
Beacon v2 Models
Reference Implementation Link
Implementer
Stakeholder
Citation
Beacon v2 and Beacon Networks: a \"lingua franca\" for federated data discovery in biomedical genomics, and beyond. Jordi Rambla, Michael Baudis, Tim Beck, Lauren A. Fromont, Arcadi Navarro, Manuel Rueda, Gary Saunders, Babita Singh, J.Dylan Spalding, Juha Tornroos, Claudia Vasallo, Colin D.Veal, Anthony J.Brookes. Human Mutation (2022) DOI.
How do I emulate Beacon v1 while supporting the v2 protocol?
The Beacon Framework describes the overall structure of the API requests, responses, parameters etc. One can implement e.g. a Boolean beacon (cf. the original protocol) without any use of the model, just by providing a well-formed JSON response upon a request very similar to the (pre-)v1 allele request.
Is it Beacon
or beacon
? The uppercase Beacon
is used to label API, framework or protocol and their components - while lower case beacons
are instances of these, i.e. individual resources using the protocol.
Beacon v2.0 does not provide a mechanism to detect what types of genomic variant queries are supported by a given instance.
Beacon was originally designed to handle the \"simplest\" type of genomic variant queries, which provide a position
, alternateBases
(i.e. the sequence of one or more bases of the variant at the position) and - sometimes optionally - the reference sequence at this position (necessary e.g. for small deletions).
Beacon v1.1 in principle supported \"bracketed\" queries and a variantType
parameter (pointing to the VCF use) - see the current documentation for details. However, the support & interpretation was - and still is (2022-12-13) - left to implementers. The same applies to Beacon Range Queries.
However, the Beacon documentation provides information about use and expected interpretation of variantType
values, specifically for copy number variations.
Ages are queried as ISO8601 durations such as P65Y
(i.e. 65 years) with a comparator (=
, <=
, >
...). However, the value needs an indication of what the duration refers to and resources may provide different ways to indicate this (as then shown in their /filtering_terms
endpoint).
We recommend that all Beacon instances that support age queries support at minimum the syntax of age:<=P65Y
and map such values to the internal datapoint most relevant for the resource's context (in most cases probably corresponding to \"age at diagnosis\").
However, different scenarios may be supported (e.g. EFO_0005056:<=P1Y6M
for an \"age at death\" scenario).
This example is for a minimal SNV-type variant query.
/beacon/g_variants/?referenceName=refseq:NC_000017.11&start=7577120&referenceBases=G&alternateBases=A\n
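The query above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the host name `beacon.example.org` and the helper `build_variant_query` are assumptions made for illustration and are not part of the Beacon specification.

```python
from urllib.parse import urlencode


def build_variant_query(base_url, **params):
    # Append the URL-encoded parameters to the endpoint.
    # safe=':' keeps the prefix separator in referenceName unescaped.
    return base_url + '?' + urlencode(params, safe=':')


url = build_variant_query(
    'https://beacon.example.org/beacon/g_variants/',  # hypothetical host
    referenceName='refseq:NC_000017.11',
    start=7577120,
    referenceBases='G',
    alternateBases='A',
)
print(url)
```

The parameter names and values are the ones from the example request; only the host is invented.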
"},{"location":"FAQ/#example-boolean-response","title":"Example Boolean Response","text":"In this minimal response to the query above the beacon indicates that its default response is Boolean and that it could interpret the query against the genomicVariant
entity and in the context of the same Beacon version.
In principle one could launch a Beacon instance using the example response document as a template in whatever server environment one has at hand. However, a proper Beacon v2 installation also has to provide informational endpoints (/info
, /map
...) to allow its integration through aggregators.
{\n\"meta\": {\n\"apiVersion\": \"v2.0.0\",\n\"beaconId\": \"org.progenetix.beacon\",\n\"receivedRequestSummary\": {\n\"apiVersion\": \"v2.0.0\",\n\"pagination\": {\n\"limit\": 2000,\n\"skip\": 0\n},\n\"requestedGranularity\": \"boolean\",\n\"requestedSchemas\": [\n{\n\"entityType\": \"genomicVariant\",\n\"schema\": \"https://progenetix.org/services/schemas/genomicVariant/\"\n}\n],\n\"requestParameters\": {\n\"alternateBases\": \"A\",\n\"referenceBases\": \"G\",\n\"referenceName\": \"refseq:NC_000017.11\",\n\"start\": [\n7577120\n]\n}\n},\n\"returnedGranularity\": \"boolean\",\n\"returnedSchemas\": [\n{\n\"entityType\": \"genomicVariant\",\n\"schema\": \"https://progenetix.org/services/schemas/genomicVariant/\"\n}\n]\n},\n\"responseSummary\": {\n\"exists\": true\n}\n}\n
"},{"location":"FAQ/#last-change-2023-02-17-mbaudis","title":"last change 2023-02-17 @mbaudis","text":""},{"location":"FAQ/#last-change-2022-10-01-by-mbaudis","title":"last change 2022-10-01 by @mbaudis","text":""},{"location":"FAQ/#last-change-2022-12-14-mbaudis","title":"last change 2022-12-14 @mbaudis","text":""},{"location":"FAQ/#last-change-2023-05-31-by-mbaudis","title":"last change 2023-05-31 by @mbaudis","text":""},{"location":"FAQ/#queries","title":"Queries","text":"The Beacon framework currently (v2.0 and earlier) considers genomic variants to be allelic and does not support the query for multiple alleles or \"haplotype shorthand expressions\" (e.g. C,T
).
Workarounds
In case of a specific need for haplotype queries, implementers of a given beacon who control its data content can in principle extend their query model to support shorthand haplotype expressions, as long as they support the standard format, too. However, such an approach may be superseded by, or be in conflict with, future direct protocol support.
An approach in line with the current protocol would be to query for one allelic variant with a record-level genomicVariation
response, and then query the retrieved variants individually by their id
in combination with the second allele.
As with queries, the Beacon \"legacy\" format does not support haplotype representation but would represent each allelic variation separately. The same is true for the VRSified variant representation which for v2.0 corresponds to VRS v1.2. However, draft versions of the VRS standard (will) address haplotype and genotype representations and will be adopted by Beacon v2.n after reaching a release state.
"},{"location":"beacon-flavours/","title":"Beacon \"Flavours\"","text":"About UI
Most of the information that you will find here is related to the Beacon v2 specification. For that reason, the examples are shown as REST API requests/responses in the form of JSON. If you are only interested in using beacon with a graphical interface please visit the implementations page.
While the original Beacon v1 only provided Boolean (i.e. YES/NO) responses on queries for the existence of specific genomic variants, Beacon v2 is a flexible protocol that supports different usage scenarios - also called \"flavours\", since they represent usage types without prescribing their specific details.
Importantly, the Beacon framework separates query options from the response side. In that way a privacy-protecting1 Boolean Beacon still may offer more query features - and therefore better usability - compared to the first Beacon concept implementations.
Technical Notes
For detailed information about the technical implementation of the different logical scopes please see the Framework documentation.
"},{"location":"beacon-flavours/#aggregate-response-beacons-boolean-and-count","title":"Aggregate Response Beacons - Boolean and Count","text":"A Boolean Response Beacon is, in its response, similar to Beacon v1 - i.e. responding with a true or false value when queried for the existence of some data in a resource. Similarly a Count Response Beacon only returns aggregate information, i.e. the number of matched entries (e.g. genomic variants), a feature also part of the Beacon v1 protocol.
However, in contrast to earlier versions, in Beacon v2 in principle a beaconized resource may implement all types of query options (e.g. combinations of various filters and genomic query parameters) but still only offer a Boolean and optionally Count response.
Also, all Beacons should implement the Boolean Response format as fallback option and handle extended options depending on the user's authentication status.
Boolean Response in v2
{\n\"meta\": {\n\"apiVersion\": \"v2.0.0\",\n\"__other_meta_parameters__\": \"...\",\n\"receivedRequestSummary\": {\n\"requestedGranularity\": \"boolean\",\n\"__other_request_parameters__\": \"...\"\n},\n\"returnedGranularity\": \"boolean\"\n},\n\"responseSummary\": {\n\"exists\": true\n}\n}\n
Count Response in v2
{\n\"meta\": {\n\"apiVersion\": \"v2.0.0\",\n\"__other_meta_parameters__\": \"...\",\n\"receivedRequestSummary\": {\n\"requestedGranularity\": \"count\",\n\"__other_request_parameters__\": \"...\"\n},\n\"returnedGranularity\": \"count\"\n},\n\"responseSummary\": {\n\"exists\": true,\n\"numTotalResults\": 42\n}\n}\n
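A client can treat both aggregate responses uniformly by first checking the granularity the beacon reports. The sketch below assumes the response has already been parsed into a Python dictionary; the function name `summarise` is hypothetical, and the two sample dictionaries are cut-down versions of the responses shown above.

```python
def summarise(response):
    # Use the granularity the beacon actually returned, which may be
    # coarser than the granularity that was requested.
    granularity = response['meta']['returnedGranularity']
    summary = response['responseSummary']
    if granularity == 'count':
        return summary['exists'], summary.get('numTotalResults')
    # Boolean responses carry no count.
    return summary['exists'], None


boolean_response = {
    'meta': {'returnedGranularity': 'boolean'},
    'responseSummary': {'exists': True},
}
count_response = {
    'meta': {'returnedGranularity': 'count'},
    'responseSummary': {'exists': True, 'numTotalResults': 42},
}

print(summarise(boolean_response))
print(summarise(count_response))
```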
"},{"location":"beacon-flavours/#beacons-supporting-data-and-information-delivery","title":"Beacons Supporting Data and Information Delivery","text":"Technical Notes
For detailed information about the technical implementation of the different logical scopes please see the Models documentation.
Information about the different data delivery options can be found here:
Privacy protecting as in \"reasonably protecting by design but not immune to complex re-identification attacks\".
This page only lists changes with regard to the documentation and general organization of the Beacon project site(s) as well as to the overarching repository organization.
"},{"location":"changes-todo/#changes","title":"Changes","text":""},{"location":"changes-todo/#2023-06-12-restructured-and-extended-documentation","title":"2023-06-12: Restructured and extended documentation","text":"@mbaudis
"},{"location":"changes-todo/#2023-06-04-improved-filter-documentation-https","title":"2023-06-04: Improved filter documentation & HTTPS","text":"GET
contextHTTPS
issue (by brute-forcing all links on site to https://
)@mbaudis
"},{"location":"changes-todo/#2023-03-14-new-website-docs-branch","title":"2023-03-14: Newwebsite-docs
branch","text":"To protect the code branches we are using now a separate website-docs
branch in beacon-v2
for documentation website updates. Please make sure all documentation edits happen there!
@mbaudis
"},{"location":"changes-todo/#2022-06-20-retiring-of-beacon-framework-v2-and-beacon-v2-models-repos","title":"2022-06-20: Retiring ofbeacon-framework-v2
and beacon-v2-Models
repos","text":"archived
w/ pointers to this one here and archived (i.e. set to read only)implementations-v2
repository (part of documentation)filters.md
from section Beacon Components to Implement...._rest-api.md
and _tips-for-implementers.md
).bin
files that parse JSON schemasdocs/*.md
","text":"mermaid
to mermaid2
plugin.networks.md and
roles.md`security.md
under Beacon Types.implementations-v2
repository to the Beacon v2 Documentation - web access here.beacon-v2
","text":"beacon-v2-unity-testing
to beacon-v2
.implementations-and-networks
to other-implementations
and left only the \"Networks\" Part.mkdocs-mermaid2-plugin
both to mkdocs.yaml
and to github workflows.Beacon Compoments/Models
implement-and-deploy.md
The mkdocs-macros-plugin
has been activated, allowing the use of site-wide variables:
repo_model_url: https://github.com/ga4gh-beacon/beacon-v2/tree/main/models/src
Implementations and Networks
and Standards Integration
As of today the new/emerging Beacon v2 documentation is maintained in this repository. We're testing rendered versions (same text/code base) through GitHub Actions (here) and ReadTheDocs.
material
themed buildyaml
export version","text":"Since moving to source in YAML the existence of a separate yaml
export seems unnecessary & maybe confusing. Removed.
The structure of the models
directory has now been changed to have the default model as one of possibly multiple options as per the discussions in #1. The current structure (below) might not be final (e.g. placing of the beaconConfiguration.yaml
, beaconMap.yaml
, endpoints.yaml
files?).
beacon\n |\n |-- framework ...\n |-- models\n | |-- src\n | | |-- beacon-v2-default-model\n | | |-- analyses ...\n | | |-- biosamples ...\n | | |-- genomicVariations ...\n | | |-- ...\n | | |-- endpoints.yaml\n | | \n | |-- json\n | |-- beacon-v2-default-model\n | |-- analyses ...\n | |-- biosamples ...\n | |-- genomicVariations ...\n | |-- ...\n | |-- endpoints.yaml\n |\n |-- bin ...\n |-- docs ... \n...\n
"},{"location":"changes-todo/#2022-03-08-automated-pulling-from-current-origin-repos","title":"2022-03-08: Automated pulling from current origin repos","text":"git -C $BEACONMODELPATH pull\ngit -C $BEACONFRAMEWORKPATH pull\n
yamler.py
with a dedicated beaconYamler.py
The development of Beacon code and documentation happens in the beacon-v2
repository.
main
","text":"The main
branch is the branch used for production. It reflects the latest version of Beacon v2 that has met the milestones set by GA4GH for a new version. It can only be changed through a PR from the develop branch and, exceptionally, through hotfixes that correct errors spotted after its official deployment.
develop
","text":"The develop
branch is the branch used for development; it reflects the current state of development. It can be modified by PRs from finished feature branches (which must include all merges from their sub-feature branches), and each PR must reach consensus before it is accepted.
website-docs
","text":"This branch is used to maintain the website at docs.genomebeacons.org. The relevant files consist of anything under /docs
as well as the configuration file (/mkdocs.yaml
) and the workflow file for processing the pages under /.github/workflows/mk-beacon-docs.yaml
.
Changes to the Markdown files in the /docs
directory (and its children) will trigger the workflow; updating the website may then take some minutes.
gh-pages
","text":"The gh-pages
branch is generated from the /docs
directory through its mkdocs
workflow and contains the website itself. Do not edit
TBD
"},{"location":"contribute/","title":"How to Contribute to Beacon Development","text":"The Beacon API & standard is a driver project of the Global Alliance for Genomics and Health GA4GH. Since 2016 Beacon development has been organized through projects supported by ELIXIR with additional contributions from outside organizations and individual developers and implementers.
TBD
"},{"location":"filters/","title":"Filters","text":"Filters represent a powerful addition to the Beacon query API. They are rules for selecting records based upon the field values those records contain. The rules can refer to bio-ontology or custom terms, numerical or alphanumerical values, and employ wildcards, standard operators or other principles of selection. This empowers such options as queries for phenotypes, disease codes or technical parameters associated with observed genomic variants.
Using Filters
Please see Using Filters in Queries for more information on how to use filters in Beacon requests.
"},{"location":"filters/#filter-types","title":"Filter types","text":"A Beacon can support three general types of Filters.
OntologyFilters
are identified using the full term/class identifier as CURIE, e.g. \u201cHP:0100526\u201d.HP:0032443
Past medical history), a comparator and a numerical, pseudo-numerical (e.g. ISO8601 period) or string value
The /filtering_terms endpoint returns a list of all data fields whose values may be subjected to filtering, plus the data type(s) for those fields, and/or the list of extant values for each of those data fields in the current dataset. In addition, for each bio-ontology used by a Beacon, the endpoint response includes a description of the bio-ontology in Phenopackets Resource format.
The endpoint's filteringTerms
response identifies the Filter types.
Bio-ontology and custom term Filter types contain:
type
= resource name (required) id
= term id (required) label
= term label (optional)\"response\":{\n\"resources\":[\n{\n\"id\":\"hp\",\n\"name\":\"Human Phenotype Ontology\",\n\"url\":\"https://purl.obolibrary.org/obo/hp.owl\",\n\"version\":\"27-03-2020\",\n\"namespacePrefix\":\"HP\",\n\"iriPrefix\":\"https://purl.obolibrary.org/obo/HP_\"\n},\n...\n],\n\"filteringTerms\": [\n{\n\"type\": \"ontologyTerm\",\n\"id\": \"HP:0008773\",\n\"label\": \"neoplasm of the lung\"\n},\n...\n]\n}\n
Alphanumerical value Filter types contain:
type
= data type as 'alphanumeric' (required) id
= field id (required) label
= field label (optional) \"filteringTerms\": [\n{\n\"type\": \"alphanumeric\",\n\"id\": \"PATO:0000011\",\n\"label\": \"age\"\n},\n...\n]\n
"},{"location":"filters/#using-filters-in-queries","title":"Using Filters in Queries","text":"For all query types, the logical AND
is implied between Filters. The Filter id
is required for all query types.
Filters in GET
Requests
GET
requests use a filters
parameter for one or more (comma-separated) filter id
values. In this case general filter defaults apply (e.g. { \"includeDescendantTerms\": true }
). Generally, use of filters other than CURIE values for filter ids is discouraged.
List Parameters in GET Requests
Since the direct interpretation of list parameters in queries is not supported by some server environments (e.g. PHP, GO\u2026), list parameters such as start
and end
should be provided as comma-concatenated strings when using them in GET requests.
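As an illustration of the comma-concatenated form, the sketch below flattens list-valued parameters before URL encoding. The helper name `encode_params` and the coordinate values are assumptions chosen only to show the mechanics; they are not taken from the specification.

```python
from urllib.parse import urlencode


def encode_params(params):
    flat = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            # Join list values into one comma-separated string.
            flat[key] = ','.join(str(item) for item in value)
        else:
            flat[key] = value
    # Keep commas and colons readable instead of percent-encoding them.
    return urlencode(flat, safe=',:')


query = encode_params({
    'referenceName': 'refseq:NC_000017.11',
    'start': [7572826, 7579005],  # illustrative values only
    'end': [7576000, 7590000],    # illustrative values only
})
print(query)
```

On the server side the inverse step is a plain `split(',')` on the received string.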
Hierarchical term expansion
It is recommended that the use of terms from hierarchical ontologies/classifications relies on an internal term expansion mechanism - i.e. records with parameters containing a child term are matched when the parent term is being queried. This default behaviour can be modified (see below).
The following query retrieves (or filters retrieved...) data matching the diagnosis of Papillary Renal Cell Carcinoma (NCIT:C6975) from a publication identified through its PubMed id (22824167):
GET
/biosamples?filters=PMID:22824167,NCIT:C6975\n
POST
\"filters\": [\n{\n\"id\": \"PMID:22824167\"\n},\n{\n\"id\": \"NCIT:C6975\"\n}\n]\n
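The GET and POST forms of this query carry the same information. A minimal sketch of the mapping, assuming plain term ids without operators or values (the function name `filters_from_get` is hypothetical):

```python
def filters_from_get(value):
    # Split the comma-separated GET parameter into one POST-style
    # filter object per id, skipping empty fragments.
    return [{'id': term} for term in value.split(',') if term]


post_filters = filters_from_get('PMID:22824167,NCIT:C6975')
print(post_filters)
```

Filters that carry an operator and a value need additional parsing and are not covered by this sketch.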
"},{"location":"filters/#modified-hierarchical-ontology-query","title":"Modified hierarchical ontology query","text":"A Beacon will query for entities associated with the submitted bio-ontology term(s), and by default, all descendent terms. The optional includeDescendantTerms
parameter can be set to either true
or false
. The default and assumed value of includeDescendantTerms
is true
, thus if the parameter is not set, then the use of bio-ontology terms in a Beacon request implies that a hierarchical ontology search is requested.
Request example of two filters, where one filter excludes matches with descendant terms:
POST
\"filters\": [\n{\n\"id\": \"HP:0100526\",\n\"includeDescendantTerms\": false\n},\n{\n\"id\": \"HP:0005978\"\n}\n]\n
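How a resource expands a term is an implementation detail. The following sketch shows one possible approach with an in-memory child map; the `EX:` identifiers and the `CHILDREN` table are invented placeholders, not real ontology terms, and a production beacon would more likely use precomputed closures in its database.

```python
# Hypothetical parent -> children map standing in for an ontology.
CHILDREN = {
    'EX:0001': ['EX:0002', 'EX:0003'],
    'EX:0002': ['EX:0004'],
}


def expand(term_id):
    # Collect the term itself and, recursively, all of its descendants.
    found = [term_id]
    for child in CHILDREN.get(term_id, []):
        found.extend(expand(child))
    return found


def terms_to_match(term_filter):
    # includeDescendantTerms defaults to true when it is not set.
    if term_filter.get('includeDescendantTerms', True):
        return set(expand(term_filter['id']))
    return {term_filter['id']}


print(sorted(terms_to_match({'id': 'EX:0001'})))
print(sorted(terms_to_match({'id': 'EX:0001', 'includeDescendantTerms': False})))
```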
"},{"location":"filters/#semantic-similarity-query","title":"Semantic similarity query","text":"A Beacon will query for entities that are associated with bio-ontology terms that are similar to the submitted terms. The Beacon API is agnostic to the semantic similarity model implemented by a Beacon and how a Beacon applies the relative thresholds of similarity. A semantic similarity query request contains the required similarity
parameter with a value set to define the relative threshold level of high
, medium
or low
.
POST request example of two Filters using differing relative similarity thresholds:
\"filters\": [\n{\n\"id\": \"HP:0100526\",\n\"similarity\": \"high\"\n},\n{\n\"id\": \"HP:0005978\",\n\"similarity\": \"medium\"\n}\n]\n
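Since the similarity parameter accepts only the three relative levels, a client may want to validate the value before sending the request. A small sketch, with the helper name `similarity_filter` being an assumption:

```python
SIMILARITY_LEVELS = ('high', 'medium', 'low')


def similarity_filter(term_id, level):
    # Reject values outside the three relative thresholds.
    if level not in SIMILARITY_LEVELS:
        raise ValueError('similarity must be one of ' + ', '.join(SIMILARITY_LEVELS))
    return {'id': term_id, 'similarity': level}


filters = [
    similarity_filter('HP:0100526', 'high'),
    similarity_filter('HP:0005978', 'medium'),
]
print(filters)
```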
"},{"location":"filters/#alphanumerical-value-queries","title":"Alphanumerical value queries","text":"A Beacon will query for quantitative properties when the required operator
and numerical value
parameters are set in the filters request. The id
parameter identifies the logical scope (with the exact field depending on the internal data model at the resource), the operator
parameter defines the operator to use, and the value
parameter provides the field query value. Equality and relational operators (= < >) can be used between field name and field value pairs, and field values can be associated with units if applicable.
filters=age:>P70Y
filters=PATO_0000011:>P70Y (\"age\")
filters=EFO_0004847:>P70Y (\"age at onset\")
\"filters\": [\n{\n\"id\": \"PATO:0000011\",\n\"operator\": \">\",\n\"value\": \"P70Y\"\n}\n]\n
We recommend that implementers provide term expansions for equivalent terms, depending on the context. Also, it is up to the implementers to provide the correct tooling, e.g. for transforming input values (such as a numerical age in years and a comparator) to the standardized wire format (ages/durations are always transmitted as ISO8601 periods) as well as for parsing and using the received values correctly (the ISO values will probably be converted to some numerical format for database matches).
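The two conversions mentioned above can be sketched as follows. This is a simplified illustration, not a complete ISO8601 parser: it assumes durations made up only of years, months and days, and it approximates months as 1/12 year and days as 1/365.25 year. The function names are hypothetical.

```python
import re

# Date part of an ISO8601 duration: optional years, months and days.
DURATION = re.compile('P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?')


def years_to_iso(years):
    # Client side: whole years to the wire format, e.g. 65 becomes P65Y.
    return 'P' + str(int(years)) + 'Y'


def iso_to_years(duration):
    # Server side: wire format to an approximate number of years.
    match = DURATION.fullmatch(duration)
    if match is None or duration == 'P':
        raise ValueError('unsupported duration: ' + duration)
    years, months, days = (int(group) if group else 0 for group in match.groups())
    return years + months / 12 + days / 365.25


print(years_to_iso(65))
print(iso_to_years('P70Y'))
print(iso_to_years('P1Y6M'))
```

A resource that stores ages in days or as dates of birth would convert to its own internal unit instead.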
"},{"location":"filters/#text-matches","title":"Text matches","text":"A Beacon will query free-text values within fields when the required operator
and alphanumerical value
parameters are set in the filters request. Queries can be for exact alphanumerical values, used to exclude alphanumerical values, or employ wildcards to match patterns within alphanumerical values. In all query classes, the id
parameter identifies the field name, the operator
parameter defines the operator to use, and the value
parameter provides the field query value.
The operator
parameter is set to the equality (=) operator.
POST request example of using free-text to filter medical history (past medical history = HP:0032443):
\"filters\": [\n{\n\"id\": \"HP:0032443\",\n\"operator\": \"=\",\n\"value\": \"unknown medical history\"\n}\n]\n
'LIKE' value query
The inclusion of a percent sign (%) wildcard character within the value
parameter represents zero or more characters within a LIKE style string match. The wildcard character can lead the query string, end the string, or surround the string.
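On the server side, one possible way to evaluate such a pattern is to translate it into a regular expression. The sketch below is an assumption about how a Beacon could do this, not a prescribed method: the function names are hypothetical, and case-insensitive matching is a choice made here that the text above does not specify.

```python
import re


def like_to_regex(pattern):
    # Escape the literal parts and let each percent sign match zero or more characters.
    parts = [re.escape(part) for part in pattern.split('%')]
    return re.compile('.*'.join(parts), re.IGNORECASE | re.DOTALL)


def like_match(pattern, text):
    # The whole field value has to match, as in a LIKE style comparison.
    return like_to_regex(pattern).fullmatch(text) is not None


print(like_match('%cancer%', 'family history of breast cancer'))
print(like_match('cancer%', 'family history of breast cancer'))
```

A value without any wildcard falls back to an exact match of the whole field value.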
POST request example to filter medical history free-text for any reference to cancer:
\"filters\": [\n{\n\"id\": \"HP:0032443\",\n\"operator\": \"=\",\n\"value\": \"%cancer%\"\n}\n]\n
"},{"location":"filters/#not-value-query","title":"'NOT' value query","text":"The operator
parameter is set to the logical not (!) operator. The value
parameter gives the string that must not be present in the field value. The wildcard character can be used if required. The following example shows how to filter medical history free-text for records that do not include the query string:
filters=HP_0032443:!unknown+medical+history
\"filters\": [\n{\n\"id\": \"HP:0032443\",\n\"operator\": \"!\",\n\"value\": \"unknown medical history\"\n}\n]\n
"},{"location":"formats-standards/","title":"Formats, Standards and Integrations","text":""},{"location":"formats-standards/#data-formats-and-standards","title":"Data Formats and Standards","text":""},{"location":"formats-standards/#coding-and-naming-conventions","title":"Coding and naming conventions","text":"For historical reasons, in the names of entities, parameters and URLs we are following these conventions:
PascalCase
camelCase
snake_case
The only exception is: service-info
which is a required GA4GH standard and has a different word separation convention.
The Beacon v2 API follows the OpenAPI 3.0.2 specification for the endpoints, in conjunction with JSON Schema (2020-12) to define the Framework and the Models components. The specification uses JSON references ($ref
) to reference internal (e.g., definitions) or external concepts/terms (e.g., VRS).
The Beacon v2 specification is written in YAML. The original files are located under the src
directory (see below). For technical purposes, we also provide a copy of the original YAML in JSON format (see the json
directory below). Changes in the specification must be performed in the YAML version and are then rewritten to the JSON version.
framework\n|-- json\n| |-- common\n| | `-- examples\n| |-- configuration\n| | `-- examples\n| |-- requests\n| | |-- examples-fullDocuments\n| | `-- examples-sections\n| `-- responses\n| |-- sections\n| |-- examples-fullDocuments\n| `-- examples-sections\n`-- src\n |-- common\n | `-- examples\n |-- configuration\n | `-- examples\n |-- requests\n | |-- examples-fullDocuments\n | `-- examples-sections\n `-- responses\n |-- sections\n |-- examples-fullDocuments\n `-- examples-sections\n
models\n|-- json\n| `-- beacon-v2-default-model\n| |-- analyses\n| | `-- examples\n| |-- biosamples\n| | `-- examples\n| |-- cohorts\n| | `-- examples\n| |-- common\n| |-- datasets\n| | `-- examples\n| |-- genomicVariations\n| | `-- examples\n| |-- individuals\n| | `-- examples\n| `-- runs\n| `-- examples\n`-- src\n `-- beacon-v2-default-model\n |-- analyses\n | `-- examples\n |-- biosamples\n | `-- examples\n |-- cohorts\n | `-- examples\n |-- common\n |-- datasets\n | `-- examples\n |-- genomicVariations\n | `-- examples\n |-- individuals\n | `-- examples\n `-- runs\n `-- examples\n
"},{"location":"formats-standards/#genome-coordinates","title":"Genome Coordinates","text":"GA4GH Genome Coordinate Use Recommendation1
Date and time formats are specified as ISO8601 compatible strings, both for time points as well as for durations. Some of the ISO8601 compatible formats have not (yet) been used in the Beacon v2 default model.
"},{"location":"formats-standards/#examples","title":"Examples","text":"\"type\": \"string\", \"format\": \"date-time\"
The development of the Beacon v2 framework and default model closely follows and widely adopts concepts and schemas from approved GA4GH products such as Phenopackets and the Variant Representation Standard (VRS).
"},{"location":"formats-standards/#variant-representation-standard-vrs","title":"Variant Representation Standard (VRS)","text":"The GA4GH Variant Representation Standard (VRS) constitutes the reference one should use when implementing representations of genomic variations. The current version 1.2 has been approved and covers a set of use cases and requirements, especially with respect to genomic (including cytogenetic or feature based) locations. However, it is not yet suitable for a number of practical use cases, especially the representation of some structural variations.
The Beacon v2 default model for GenomicVariation
makes use of the VRS standard to represent the variation
part, i.e. the location and sequence or copy number changes of the genomic variation. While a \"legacy\" alternative is still allowed, it too has been adjusted to make use of the VRS Location
format.
The examples are for different forms of the location
property inside a genomicVariation
.
\"variation\": {\n\"type\": \"Allele\",\n\"state\": {\n\"sequence\": \"G\",\n\"type\": \"LiteralSequenceExpression\"\n},\n\"location\": {\n\"type\": \"SequenceLocation\",\n\"sequence_id\": \"refseq:NC_000017.11\",\n\"interval\": {\n\"type\": \"SequenceInterval\",\n\"start\": {\n\"type\": \"Number\",\n\"value\": 7577120\n},\n\"end\": {\n\"type\": \"Number\",\n\"value\": 7577121\n}\n}\n}\n}\n
\"variation\": {\n\"type\": \"RelativeCopyNumber\",\n\"relative_copy_class\": \"partial loss\",\n\"location\": {\n\"type\": \"SequenceLocation\",\n\"sequence_id\": \"refseq:NC_000018.10\",\n\"interval\": {\n\"start\": {\n\"type\": \"Number\",\n\"value\": 23029501\n},\n\"end\": {\n\"type\": \"Number\",\n\"value\": 62947165\n}\n}\n}\n}\n
\"variation\": {\n\"variantType\": \"SNP\",\n\"referenceBases\": \"C\",\n\"alternateBases\": \"G\",\n\"location\": {\n\"type\": \"SequenceLocation\",\n\"sequence_id\": \"refseq:NC_000017.11\",\n\"interval\": {\n\"type\": \"SequenceInterval\",\n\"start\": {\n\"type\": \"Number\",\n\"value\": 7577120\n},\n\"end\": {\n\"type\": \"Number\",\n\"value\": 7577121\n}\n}\n}\n}\n
\"variation\": {\n\"variantType\": \"DEL\",\n\"location\": {\n\"type\": \"SequenceLocation\",\n\"sequence_id\": \"refseq:NC_000018.10\",\n\"interval\": {\n\"start\": {\n\"type\": \"Number\",\n\"value\": 23029501\n},\n\"end\": {\n\"type\": \"Number\",\n\"value\": 62947165\n}\n}\n}\n}\n
"},{"location":"formats-standards/#link-vrs-documentation","title":"LINK: VRS Documentation","text":""},{"location":"formats-standards/#phenopackets","title":"Phenopackets","text":"In the Beacon v2 default data model, many schemas are either directly compatible to Phenopackets v2 building blocks or at least reflect them but with some adjustments. While the Beacon v2 default model's schemas do not per se have to reflect PXF schemas, we target an as-close-as-possible alignment to promote/leverage GA4GH-wide standardization.
"},{"location":"formats-standards/#top-level-differences","title":"Top-level differences","text":"The Phenopackets model is centered around the Phenopacket
, which is the collector and integrator of all sub-schemas (with the addition of the external Family
and Cohort
schemas). While Phenopacket
usually describes information related to a subject
- which is defined in an Individual
- and the top level elements in Phenopacket
relate to a specific proband
(measurements
as \"Measurements performed in the proband\"), the phenopacket itself does not explicitly represent an individual.
In contrast, the Beacon v2 default model uses a hierarchy in which biosamples reference individuals directly (if existing). For most purposes one can equate Beacon's Individual
with a merge of Phenopacket's core Phenopacket
and Individual
parameters.
==
PXF v2","text":""},{"location":"formats-standards/#age","title":"Age
","text":"AgeRange
","text":"Evidence
","text":"KaryotypicSex
","text":"ReferenceRange
","text":"While unit
in Beacon points to a Unit
definition, this is itself an OntologyTerm
i.e. structurally the same.
Value
","text":"=~
PXF v2 (e.g. renamed or additional parameters)","text":""},{"location":"formats-standards/#complexvalue","title":"ComplexValue
","text":"Renamed ComplexValue.TypedQuantity.quantityType
compared to GA4GH Phenopackets v2 ComplexValue.TypedQuantity.type
due to problematic use of type
as parameter
ExternalReference
","text":"Renamed ExternalReference.notes
compared to GA4GH Phenopackets v2 ExternalReference.description
due to problematic use of description
as parameter
Measurement
","text":"Added notes
and date
.
PhenotypicFeature
","text":"featureType
type
severity
(re-used definition reflecting an ontology term) severity
(ontology class) notes
"},{"location":"formats-standards/#procedure","title":"Procedure
","text":"procedureCode
code
ageAtProcedure
(TimeElement) performed
(TimeElement
) dateOfProcedure
(ISO date)"},{"location":"formats-standards/#timeelement","title":"TimeElement
","text":"The specific parameters have been aligned, with minimal differences in naming or use of general parameters.
Beacon | Phenopackets
ontologyTerm | ontology_class
age | age (Age)
ageRange | age_range (AgeRange)
gestationalAge | gestational_age (GestationalAge) ...
Timestamp | timestamp (TimeStamp)
timeInterval | interval (TimeInterval)"},{"location":"formats-standards/#treatment","title":"Treatment
","text":"Beacon still has an ageOfOnset
parameter (?). Also, PXF agent
has been renamed to a more general treatmentCode
.
~
PXF v2 (e.g. multiple/complex differences)","text":""},{"location":"formats-standards/#disease","title":"Disease
","text":""},{"location":"formats-standards/#pedigree","title":"Pedigree
","text":"While the Beacon & Phenopackets schemas for \"pedigree\" representation are not aligned, they may become superseded by the GA4GH pedigree standard currently under development.
"},{"location":"formats-standards/#sex","title":"Sex
","text":"Beacon directly uses the (IMO preferable) representation through an ontology term, while PXF uses an ordinal mapping
"},{"location":"formats-standards/#link-phenopackets-documentation","title":"LINK: Phenopackets Documentation","text":"Source: @andrewyatz at SchemaBlocks {S}[B] \u21a9
The GA4GH Beacon specification is composed of two parts:
The Beacon Framework is the part that describes the overall structure of the API requests, responses, parameters, the common components, etc. It may also be referred to in this document simply as the Framework.
A Beacon Model describes the set of concepts included in a Beacon version (e.g. Beacon v2), like individual or biosample. It may also be referred to in this document simply as the Model.
The Framework could be considered the syntax and the Model as the semantics.
Refer to the Models for further information about the default model and how to use it.
The Framework doesn't include anything related to specific entities but only the mechanisms for querying them and parsing the responses. The BF is, therefore, independent from/agnostic to any specific Model. It can be leveraged to describe models from other domains like proteomics, imaging, biobanking, etc.
A Beacon instance is just an implementation of a Beacon Model that follows the rules stated by the Beacon Framework.
If you are a Beacon implementer, then, you don't need to clone this (Framework) repo, you only need to copy (or clone) the Beacon Model and modify it to your specific instance. You will find plenty of references to the Framework in the Model copy, and you will use the Json schemas in the Framework to validate that both the structure of your requests and responses are compliant with the Beacon Framework. The Beacon verifier tool would help in such validation.
The Framework repo includes the elements that are common to all Beacons:
Please visit the Standards Page
"},{"location":"framework/#folder-structure-in-the-framework-repo","title":"Folder structure in the framework repo","text":"The above listed elements are organized in several folders (in alphabetical order):
The root folder only contains the endpoints.json document, an OpenAPI 3.0.2 description of the endpoints that every Beacon instance MUST implement. The endpoints are: * the root (/
) and /info
that MUST return information (metadata) about the Beacon service and the organization supporting it. * the /service-info
endpoint that returns the Beacon metadata in the GA4GH Service Info schema. * the /configuration
endpoint that returns some configuration aspects and the definition of the entry types (e.g. genomic variants, biosamples, cohorts) implemented in that specific Beacon server or instance. * the /entry_types
endpoints that only return the section of the configuration that describes the entry types in that Beacon. * the /map
endpoint that returns a map (like a web sitemap) of the different endpoints implemented in that Beacon instance. * the /filtering_terms
endpoint that returns a list of the filtering terms accepted by that Beacon instance.
Most of these endpoints simply return the configuration files that are in the Beacon configuration folder. Of course, every Beacon instance would have their particular instance of such documents, including the configuration of such instance.
Note: It could be argued that the Beacon configuration files are different for every Beacon instance and, hence, they should be part of the Model. However, the configuration files MUST be used, with exactly the same schema, by any model, regardless of whether that Beacon follows the Beacon v2 Model or any other. Additionally, these endpoints and configuration files are critical for a Beacon client to be able to understand and use a Beacon instance. Therefore, we consider them an essential part of the Framework.
"},{"location":"framework/#the-configuration","title":"The Configuration","text":"Contains the Json schema files that describe the Beacon configuration. Its contents are described in the section above, as they have an almost 1-to-1 relationship with the endpoints. Further details about the specific content of each file can be found in the corresponding sections below.
"},{"location":"framework/#the-requests","title":"The Requests","text":"Contains the following Json schemas:
RequestBody
to keep the same nomenclature used by OpenAPI v3, but it actually contains the definition of the whole HTTP POST request payload.MIN
in the name shows the minimal required attributes for the request to be compliant. The example labelled with MAX
in the name includes a richer case with all the sections filled in.
Both the filters (filteringTerms) and the parameters (requestParameters) are used to refine the query. The availability of two mechanisms to refine queries may initially seem confusing, but the separation is tailored to facilitate the interpretation of the request by the Beacon server.
A basic difference is that, in HTTP GET requests, each parameter is named (e.g. 'id', 'skip', 'limit') while all filters go under the same named parameter 'filters'. For HTTP POST requests, the difference lies in each parameter having a separate definition (e.g. id
is a string
, while skip
is an integer
), while all filters follow the schema described in /requests/filteringTerms.json
.
An unrestricted query like /datasets
should return the list of all datasets in a Beacon instance. That query could be refined by adding a generic condition like: \"return only datasets which could be used for 'general research'\" or \"return only the first 10 datasets\". The former belongs to the filter category, the latter to the parameters. If you are a beacon implementer, a rule of thumb could be:
The Beacon concept includes several types of responses: informational responses, responses with actual data payloads, and error responses.
"},{"location":"framework/#informational-responses","title":"Informational responses","text":"A Beacon is able to return information about itself. Many of the response schemas included in the responses
folder have a 1-to-1 relationship with the corresponding configuration documents and their equivalent root endpoints, e.g. the beaconEntryTypeResponse.json
is the schema of a response that wraps the beaconConfiguration.json
document, and is then used as the payload of the /entry_types
root endpoint. Schematically: * configuration/an_schema.json: describes the schema of the configuration file itself. * responses/an_schema_response.json: describes the format of the response that returns this configuration information. * root/endpoints.json: describes the API endpoints to be called and parameters to be used to retrieve such responses.
The following schemas refer to informational responses: beaconConfigurationResponse, beaconEntryTypeResponse, beaconFilteringTermsResponse, and beaconMapResponse.
"},{"location":"framework/#data-responses","title":"Data Responses","text":"A Beacon could return responses at different granularity levels:
* boolean: exists: true ('Yes') or exists: false ('No') to a given query.
* count: Yes/No and the number of matching results.
* record: Yes/No, the number of matching results and all documents corresponding to the requested entities. Documents are wrapped in \"result set\" objects for every collection (e.g. every dataset or cohort). Even for record level responses each beacon can control the details of data exposed in each record besides the minimal requirements of the entry type's schema.
Each of these granularity levels has an equivalent response schema:
* boolean: beaconBooleanResponse
* count: beaconCountResponse
* record: beaconResultSetsResponse
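The correspondence between granularity levels and response schemas can be written down as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and capping the requested granularity at a maximum allowed by the Beacon is an assumed policy that is not stated in the text above.

```python
# Granularity levels ordered from least to most detailed.
GRANULARITY_ORDER = ['boolean', 'count', 'record']

RESPONSE_SCHEMAS = {
    'boolean': 'beaconBooleanResponse',
    'count': 'beaconCountResponse',
    'record': 'beaconResultSetsResponse',
}


def effective_granularity(requested, maximum):
    # Assumed policy: never return more detail than the Beacon allows.
    index = min(GRANULARITY_ORDER.index(requested), GRANULARITY_ORDER.index(maximum))
    return GRANULARITY_ORDER[index]


def response_schema(requested, maximum):
    # Pick the response schema name for the granularity that will be served.
    return RESPONSE_SCHEMAS[effective_granularity(requested, maximum)]


print(response_schema('record', 'count'))
```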
An additional schema, beaconCollectionsResponse, describes responses that return details about the collections in a Beacon, but not the contents of the collections themselves. In other words, the response describes a dataset, but does not return the contents of any dataset.
"},{"location":"framework/#common-components","title":"Common Components","text":"Some elements are transversal to the Framework and to any model, e.g. the schema for describing an ontology term or the reference to an external schema (like the reference to GA4GH Phenopackets or GA4GH Service Info schemas).
"},{"location":"framework/#pagination-skip-and-limit","title":"Pagination -skip
and limit
","text":"Record level responses may return many (i.e. thousands and beyond) documents, which usually would be \"paginated\", i.e. split into many chunks (\"pages\"). Beacon handles pagination through the skip
and limit
parameters as part of the request:
* limit in the request tells the server the maximum number of records that should be returned in a single response (i.e. the \"page size\")
* skip indicates how many of those pages should be skipped over when delivering the results
Therefore, skip: 2 and limit: 8 will return records 17-24 (if those exist).
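The arithmetic behind this example can be checked with a short Python sketch. The function names are hypothetical and the list slicing only stands in for whatever paging mechanism the underlying database offers.

```python
def page_bounds(skip, limit):
    # Return the 1-based numbers of the first and last record on the requested page.
    if skip < 0 or limit < 1:
        raise ValueError('skip must be >= 0 and limit must be >= 1')
    first = skip * limit + 1
    last = (skip + 1) * limit
    return first, last


def paginate(records, skip, limit):
    # skip counts whole pages, so the offset into the records is skip * limit.
    start = skip * limit
    return records[start:start + limit]


print(page_bounds(2, 8))
print(paginate(list(range(1, 31)), 2, 8))
```

If fewer records exist than the page would hold, the slice simply returns the remaining ones.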
Given the flexibility allowed in the implementation of each Beacon instance, and the security restrictions that could apply (e.g. only answering after authentication of the user), a mechanism is required to test the compliance of a Beacon. A first step in this compliance testing is done by the implementer by checking that received requests are correct and that the generated responses match the provided schemas. However, external compliance testing is desirable when the Beacon instance is to be integrated in a network or to engage in dialogs with a diversity of clients. For this second scenario, the testMode parameter was included.
A Beacon instance could receive a request with the testMode parameter activated (value= true) in which case the Beacon MUST respond, with actual or fake contents, using the response format and skipping any user authentication. The fact that a response has been generated for testing purposes is included in the meta section of the response.
"},{"location":"framework/#the-beacon-configuration-file","title":"The Beacon Configuration file","text":"The file /configuration/beaconConfiguration.json
defines the schema (in Json schema draft-07) of the Json file that includes core aspects of a Beacon instance configuration. The schema includes four sections:
boolean
(true/false) responses, and only if the user is authenticated and explicitly authorized to access the Beacon resources. Although this is the safest set of settings, it is not recommended unless the Beacon shares very sensitive information. Non sensitive Beacons should preferably opt for a record
and PUBLIC
combination.
boolean
returns 'true/false' responses. count
adds the total number of positive results found. record
returns details for every row. For those cases where a Beacon prefers to return records with only a subset of attributes, different strategies have been considered, e.g. keeping non-mandatory attributes empty, or having the Beacon provide a minimal record definition. These strategies still need to be tested in real world cases, and hence no design decision has been taken yet.
security level | description
PUBLIC | Any anonymous user can read the data
REGISTERED | Only known users can read the data
CONTROLLED | Only specifically granted users can read the data"},{"location":"framework/#example","title":"Example","text":" \"maturityAttributes\": {\n\"productionStatus\": \"DEV\"\n},\n\"securityAttributes\": {\n\"defaultGranularity\": \"boolean\",\n\"securityLevels\": [\"PUBLIC\", \"REGISTERED\", \"CONTROLLED\"]\n}\n
The Beacon in the example is in development status, returns boolean answers by default, and has queries available in any of the access levels.
"},{"location":"handovers/","title":"[H\u2014>O] Beacon Handovers for Data Delivery","text":"While the Beacon v1 response was restricted to aggregate data and Beacon v2 itself provides schemas for structuring response objects (e.g. genomic variation or biosample data), the protocol can be expanded by providing custom access methods to data elements matched by a Beacon query. Since November 2018, Beacon v1.n has included support for a \"handover\" protocol, in which rich data content can be provided from linked services, initiated through a Beacon query1.
Typical examples of Handover
use include:
In the following example a minimal boolean response is shown which contains a single handover in the general resultsHandovers
list.
{\n\"meta\": {\n...\n},\n\"responseSummary\": {\n\"exists\": true\n},\n\"resultsHandovers\": [\n{\n\"handoverType\": {\n\"id\": \"EDAM:3016\",\n\"label\": \"VCF\"\n},\n\"url\": \"https://my.genomeserver.space/data/vcf/grch38/gizsgf8oaoiteowgfdhhpoiuy/variants.vcf\",\n\"note\": \"VCFv4.4 file with sample mapped variants (authentication required)\"\n}\n]\n}\n
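A client could collect the handover links from such a response as sketched below. This is a minimal illustration that assumes the response has already been parsed into a Python dictionary; the function name handover_links is hypothetical and the URL in the usage lines is a placeholder.

```python
def handover_links(response):
    # Collect (type id, type label, url) for every entry in resultsHandovers.
    links = []
    for handover in response.get('resultsHandovers', []):
        handover_type = handover.get('handoverType', {})
        links.append((handover_type.get('id'), handover_type.get('label'), handover.get('url')))
    return links


example = {
    'responseSummary': {'exists': True},
    'resultsHandovers': [
        {
            'handoverType': {'id': 'EDAM:3016', 'label': 'VCF'},
            'url': 'https://example.org/variants.vcf',
        }
    ],
}
print(handover_links(example))
```

Following a link may still require authentication, as the note in the example response indicates.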
An early discussion of the topic can e.g. be found in the Beacon developer area on Github. As of 2018-11-13, the handover concept had become part of the code development.\u00a0\u21a9
Important
As previously described, Beacon v2 is a specification for sharing/discovery of data. Thus, a priori, it has nothing to do with any particular software, database or computer language.
"},{"location":"implementations-options/#which-are-the-implementation-options","title":"Which are the implementation options?","text":"Two elements are needed to implement (or \"light\") a Beacon v2:
In this section we present three implementation options, ranging from no reliance on CRG software to full reliance on CRG software.
"},{"location":"implementations-options/#option-a","title":"Option A","text":"Let's say that you have your data organized and structured in a database (e.g. SQL or NoSQL which may or may not have an internal layer to get access to it). Let's also say that you have the resources (and knowledge) to read the \"instructions\" (i.e., Beacon v2 specification) to build an API on top of your existing solution. If that's your case, then this is the option for you. You are one of what we call Beacon v2 API implementers. We have a few of them already in the Beacon v2 Service Registry:
bycon
Python stack driving full featured v2 under the Progenetix resource
Let's say that you have a solution to organize your data but you don't have the resources (or knowledge) to implement a Beacon v2 API yourself. In some pilot studies, CRG has been helping individual institutions to build their Beacon v2 API. However, this option is not practical and does not scale well, so you may want to check Option C.
"},{"location":"implementations-options/#option-c","title":"Option C","text":"Let's say that you have your data somewhat structured (you may have Excel files, PDFs, VCFs... or maybe a SQL database, or an EHR solution with phenoclinic information).
You want to \"beaconize\" your data to be part of a larger ecosystem, but you're unsure where to start, and/or don't want to invest a lot of resources because you are still unsure if the whole thing will pay off. Well, you're not alone! Most centers are in this situation. For that reason at CRG we developed the Beacon v2 Reference Implementation.
Important
People that download and install B2RI or another pre-packaged solution are named Beacon v2 deployers.
"},{"location":"models/","title":"beacon-v2-Models","text":""},{"location":"models/#introduction","title":"Introduction","text":"The GA4GH Beacon specification is composed of two parts:
The Beacon Framework (in the Framework repo) is the part that describes the overall structure of the API requests, responses, parameters, the common components, etc. It may also be referred to in this document simply as the Framework.
Beacon Models (in the Models repo) describe the set of concepts included in a Beacon version (e.g. Beacon v2), like individual or biosample, and also the relationships between them. They may also be referred to in this document simply as the Model.
The Framework could be considered the syntax and the Model as the semantics.
Refer to the Framework for further information about the Framework and its parts.
A beacon instance is just an implementation of a Beacon Model that follows the rules stated by the Beacon Framework.
Beacon default model vs. beacon instances
While the Beacon default model provides templates for responses and formats for uniform data delivery - especially for networked beacons - it does not prescribe how data should be organised in individual instances or what schemas should be used for local storage.
If you are a Beacon implementer, then, you don't need to clone the Framework repo, you only need to copy (or clone) the Beacon Model and modify it to your specific case. You will find plenty of references to the Framework in the Model copy, and you will use the Json schemas there to validate that both the structure of your requests and responses are compliant with the Beacon Framework. The Framework is not used to check the schema in the responses payload (e.g. the actual details of a biosample or a cohort). The schemas for that are included in the Model that you should have copied.
classDiagram analyses <-- genomicVariations : 1..n runs <-- analyses : 1..n biosamples <-- runs : 1..n individuals <-- biosamples : 1..n runs <.. genomicVariations : 1..n biosamples <.. genomicVariations : 1..n individuals <.. genomicVariations : 1..n biosamples <.. analyses : 1..n individuals <.. analyses : 1..n individuals <.. runs : 1..n cohorts o-- individuals : m..n datasets o-- genomicVariations : 1..n class genomicVariations{ analysisId runId biosampleId individualId variation clinicalInterpretations caseLevelData ... } class analyses{ id runId biosampleId individualId analysisDate pipelineName aligner ... } class biosamples{ id individualId biosampleStatus sampleOriginType histologicalDiagnosis collectionDate ... } class individuals{ id sex diseases phenotypicFeatures ethnicity pedigrees ... } class runs{ id biosampleId individualId runDate librarySource libraryStrategy platform ... } class datasets{ id name description dataUseCondition info updateDateTime ... } class cohorts{ id name cohortType cohortSize cohortDataTypes cohortDesign ... }
Beacon v2 Models entities and their relationships
The above entities are defined as follows:
Beacon v1 Model: Repo
Provided as an example for Beacon v1 implementers that want to update to Beacon v2 but do not plan to add any additional entry type to their Beacon.
"},{"location":"networks/","title":"Beacon Networks and Aggregators","text":"Although a Beacon can be instantiated as a stand-alone solution, many Beacon instances will be part of managed networks, e.g. multi-institutional projects where individual beacons are combined through a single interface. Additionally, open beacon instances may be accessed from aggregators which can register these resources, federate queries and aggregate the responses, possibly without any direct support from the instances' maintainers.
Beacon Networks
... are collections of multiple beacon instances - possibly from different institutions or providers. Beacon networks rely on some sort of central service managing the integration of nodes and provide a unified access through a customized interface and possibly with active alignment of the instances' features (such as harmonized filtering terms). One may think of a beacon network as a \"managed aggregator\" with some active alignment of the individual resources.
Beacon Aggregator
... provides a single interface and API for accessing multiple Beacon instances where the individual beacons may not necessarily be harmonized (or even aware of their integration through the aggregator). An aggregator may include functionality to remap requests and responses for beacons with e.g. different versions or such using different standards (genome editions, ontology terms...).
The Beacon framework includes several features aimed to be consumed by Beacon network aggregators. For example, a Beacon endpoint declares which entities are implemented in that particular instance, which filtering terms are being supported or the URL endpoints through which different entities (such as biosamples or genomic variants) can be queried.
Beacon v2 Networks"},{"location":"networks/#networking-heterogeneous-beacons","title":"Networking heterogeneous beacons","text":"
In addition to genomic variation queries with Boolean responses the Beacon v2 protocol permits the implementers to support different types of entities (e.g. biosample and analysis data) both to be queried against and to be returned in Beacon responses - so a request may retrieve information about the samples in which an indicated genomic variant had been found or information about technical parameters used to detect such a variant.
However, individual beacons will have different profiles regarding the supported parameters, supported entities or the filtering terms recognized. Here, a number of information endpoints allow the profiling of beacons which is especially important when designing Beacon networks and aggregating their responses.
"},{"location":"networks/#supported-filters","title":"Supported filters","text":"Filters represent a powerful way to query various features in beacon entities. When designing a network of multiple beacons the filtering_terms
informational endpoints can be utilized to e.g. implement translators for harmonizing the possibly differing terms used in the individual Beacon instances.
TBD
"},{"location":"other-implementations/","title":"Other implementations","text":""},{"location":"other-implementations/#registry-server","title":"Registry Server","text":"The Beacon registry server, hosted through the European Genome-Phenome Archive, monitors a number of implementations of the Beacon v2 protocol by various organisations actively involved in Beacon protocol development.
"},{"location":"other-implementations/#link-beacon-v2-ga4gh-approval-registry","title":"Link: Beacon v2 GA4GH Approval Registry","text":""},{"location":"other-implementations/#example-implementations","title":"Example Implementations","text":""},{"location":"other-implementations/#progenetix-api","title":"Progenetix API","text":"The Progenetix database and cancer genomic information resource contains genome profiles of more than 140000 individual cancer genome screening experiments, with the majority representing results from genomic copy number assessment studies. With its Beacon+ forward-looking test implementation, since 2016 Progenetix has been developing concepts for Beacon protocol extensions such as CNV query options or handover data delivery.
"},{"location":"other-implementations/#technologies","title":"Technologies","text":"bycon
Python-based full stack API / middleware (documentation here)progenetix-web
React based front-end (modular for Beacon instances as well as the whole Progenetix UI)Find below some tips to get you started:
A beacon instance allows the retrieval of data records - in contrast to the aggregated boolean and count responses - if it supports record
granularity. The type of document(s) is selected either through the REST path or by specifying the entity through the requestedEntityId
.
While any beacon can in principle choose its own data model - and thereby the schemas of records it supports - for biomedical genomics beacons we recommend the support of the Beacon default data model
"},{"location":"records/#beacon-default-data-model","title":"Beacon Default Data Model","text":"The Beacon v2 default data model provides a set of schemas for common data entities with a focus on biomedical genomics (although neither specific to medical application or human genomics per se).
In contrast to earlier versions of the protocol, the Beacon v2 default models provide the technical blueprint for rich, structured data responses to Beacon queries, such as annotated genomic variations, biosamples from which matched variants were retrieved or data about individuals and study cohorts, where available and authorized.
Detailed information is available through the Models Introduction and the default schemas documented from there.
"},{"location":"records/#examples","title":"Examples","text":"Biosample in Beacon v2.0This example is a single biosample response, e.g. as the result of a REST path call (.../biosamples/{id}/
). The response just demonstrates some of the available biosample parameters and removes some technical/meta information for clarity. Also, the sample contains fields which are not defined in the default schema (such as icdoMorphology
...); but although the use of custom fields is discouraged to enhance interoperability, the use of additionalProperties
is allowed, so the data itself remains schema-conformant.
{\n \"meta\": {\n \"apiVersion\": \"v2.0.0\",\n \"beaconId\": \"org.progenetix\",\n \"receivedRequestSummary\": {\n ...\n },\n \"returnedGranularity\": \"record\",\n \"returnedSchemas\": [\n {\n \"entityType\": \"biosample\",\n \"schema\": \"https://progenetix.org/services/schemas/biosample/\"\n }\n ],\n },\n \"responseSummary\": {\n \"exists\": true,\n \"numTotalResults\": 1\n },\n \"response\": {\n \"resultSets\": [\n {\n \"exists\": true,\n \"setType\": \"dataset\",\n \"id\": \"progenetix\",\n \"resultsCount\": 1,\n \"results\": [\n {\n \"id\": \"pgxbs-kftvi9i0\",\n \"individualId\": \"pgxind-kftvi9i0\",\n \"notes\": \"Primary Tumor\",\n \"biosampleStatus\": {\n \"id\": \"EFO:0009656\",\n \"label\": \"neoplastic sample\"\n },\n \"collectionMoment\": \"P44Y1M24D\",\n \"sampleOriginType\": {\n \"id\": \"OBI:0001479\",\n \"label\": \"specimen from organism\"\n },\n \"dataUseConditions\": {\n \"id\": \"DUO:0000004\",\n \"label\": \"no restriction\"\n },\n \"externalReferences\": [\n {\n \"id\": \"pgx:TCGA.933b9daf-a5bf-46cf-92b6-5ddd8279919c\",\n \"label\": \"TCGA case_id\"\n },\n {\n \"id\": \"pgx:TCGA.TCGA-76-6663\",\n \"label\": \"TCGA submitter_id\"\n },\n {\n \"id\": \"pgx:TCGA.005cb7ce-5050-43aa-85ff-cd56ed830535\",\n \"label\": \"TCGA sample_id\"\n },\n {\n \"id\": \"pgx:TCGA.GBM\",\n \"label\": \"TCGA GBM project\"\n }\n ],\n \"histologicalDiagnosis\": {\n \"id\": \"NCIT:C3058\",\n \"label\": \"Glioblastoma\"\n },\n \"icdoMorphology\": {\n \"id\": \"pgx:icdom-94403\",\n \"label\": \"Glioblastoma, NOS\"\n },\n \"icdoTopography\": {\n \"id\": \"pgx:icdot-C71.9\",\n \"label\": \"Brain, NOS\"\n },\n \"pathologicalStage\": {\n \"id\": \"NCIT:C92207\",\n \"label\": \"Stage Unknown\"\n },\n \"sampleOriginDetail\": {\n \"id\": \"UBERON:0000955\",\n \"label\": \"brain\"\n },\n \"updated\": \"2020-09-10 17:44:04.888000\"\n }\n ]\n }\n ]\n }\n}\n
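The collectionMoment value in the response above (P44Y1M24D) is an ISO8601 duration. As a minimal sketch, assuming only the year, month and day designators are present (the time part after T is deliberately not handled), such a value could be split into its components as follows. The helper name parse_age is hypothetical and not part of any Beacon library.

```python
import re

# Date part of an ISO8601 duration only: P[n]Y[n]M[n]D (assumption: no time part)
DURATION = re.compile('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?$')


def parse_age(duration):
    '''Return (years, months, days) for a date-only ISO8601 duration.'''
    match = DURATION.match(duration)
    if match is None or duration == 'P':
        raise ValueError('unsupported duration: ' + duration)
    # Missing designators are treated as zero
    years, months, days = (int(g) if g else 0 for g in match.groups())
    return years, months, days


print(parse_age('P44Y1M24D'))
```

Values with a time component would need a fuller parser; this sketch rejects them rather than guessing.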
"},{"location":"records/#alternative-data-models","title":"Alternative Data Models","text":"In principle, the separation of framework and models allows for different models in domains outside of the genomics focussed Beacon v2 realm, e.g. \u201cImaging Beacon\u201d, to be built using the same Framework.
"},{"location":"rest-api/","title":"Beacon REST API","text":"While the full power of the Beacon API can be unlocked through the use of structured queries using JSON serialization (\"POST\" requests), the majority of common queries can be implemented through standard query URLs with parameters (GET queries).
"},{"location":"rest-api/#beacon-api-url-structure","title":"Beacon API URL structure","text":"Beacon REST paths in general follow the format
__APIroot__/__entryType__/{id}/
or
__APIroot__/__entryType__/{id}/__requestedSchema__
A typical example would be the request to retrieve all genomic variants associated with a biosample
https://example.com/beacon/api/biosamples/bios-st4582/g_variants
The endpoint paths available for a given Beacon instance are defined in __APIroot__/beaconMap/
Github
POST
requests","text":"In POST
requests queries and metadata are defined in JSON objects as specified in the model supported by the Beacon instance. For more information see
GET
queries","text":"By default the Beacon model supports a limited set of query parameters, most notably those addressing genomic variations. Examples can be found in the Genomic Queries documentation and in the requests section of the default model.
"},{"location":"rest-api/#list-parameters-in-get-queries","title":"List parameters inGET
queries","text":"Several of the common query parameters have a multiple value option, i.e. are assumed to be lists. A typical use case here would be the construction of Bracket Queries which use 2 of each start
and end
values.
,
separator for list values in GET
Because some web frameworks have problems interpreting multiple values for the same parameter, we recommend the consistent use of a single parameter name with comma-concatenated values.
&start=1234000&start=5234000
&start=1234000,5234000
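The recommended comma-concatenated form can be produced and read back with the Python standard library. This is a minimal sketch; the helper names build_query and parse_int_list are hypothetical and only illustrate the convention.

```python
from urllib.parse import urlencode, parse_qs


def build_query(params):
    '''Serialise parameters, joining list values with commas.'''
    flat = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            flat[key] = ','.join(str(v) for v in value)
        else:
            flat[key] = str(value)
    # safe=',' keeps the comma readable instead of percent-encoding it
    return urlencode(flat, safe=',')


def parse_int_list(query, name):
    '''Read a comma-concatenated parameter back into a list of integers.'''
    raw = parse_qs(query).get(name, [''])[0]
    return [int(v) for v in raw.split(',') if v]


query = build_query({'referenceName': 'NC_000017.11', 'start': [1234000, 5234000]})
print(query)
print(parse_int_list(query, 'start'))
```

A server written this way sees exactly one start parameter, regardless of how its framework treats repeated parameter names.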
Disclaimer
A stand-alone regulatory and ethics review has been performed on the specification itself. However, it is the responsibility of the implementer to ensure that appropriate measures are taken to remove risks related to privacy, confidentiality, and/or security of data.
The Beacon uses a 3-tiered access model: anonymous
, registered
, and controlled access
.
Technical Notes
For detailed information about the technical implementation of the different levels of security please see the Framework documentation.
"},{"location":"security/#registered","title":"Registered","text":"For a Beacon to respond to a query at the registered tier, the user must identify themselves to the Beacon, for example by using an ELIXIR identity.
"},{"location":"security/#controlled","title":"Controlled","text":"For a Beacon to respond to a controlled access query, the user must have applied for, and been granted access to, the Beacon (or data derived from one or more individuals within the Beacon) individuals) whose data is only accessible at specified tiers within the Beacon. This tiered access model allows the owner or controller of a Beacon to determine which responses are returned to whom depending on the query and the user who is making the request, for example to ensure the response respects the consent under which the data were collected.
"},{"location":"security/#anonymous","title":"Anonymous","text":"Anonymous Beacon can be accessed by any request.
Synthetic data
The use of synthetic data for testing is important in that it ensures that the full functionality of a Beacon can be tested and / or demonstrated without risk of exposing data from individuals. In addition to testing or demonstrating a deployment, synthetic data should be used for development, for example when adding new features.
"},{"location":"variant-queries/","title":"Genomic Variant Queries","text":"For querying of genomic variations Beacon v2 builds on and extends the options provided by earlier versions.
"},{"location":"variant-queries/#beacon-sequence-queries","title":"Beacon Sequence Queries","text":"Sequence Queries query for the existence of a specified sequence at a given genomic position. Such queries correspond to the original Beacon queries and are used to match short, precisely defined genomic variants such as SNVs and INDELs.
"},{"location":"variant-queries/#parameters","title":"Parameters","text":"referenceName
start
(single value)alternateBases
referenceBases
This is an example for a single base mutation (G>A
) at a specific position (GRCh38 chromosome 17 7577120
) in the EIF4A1 eukaryotic translation initiation factor 4A1.
?referenceName=NC_000017.11&start=7577120&referenceBases=G&alternateBases=A\n
{\n \"$schema\":\"beaconRequestBody.json\",\n \"meta\": {\n \"apiVersion\": \"2.0\",\n \"requestedSchemas\": [\n {\n \"entityType\": \"genomicVariation\",\n \"schema:\": \"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json\"\n }\n ]\n },\n \"query\": {\n \"requestParameters\": {\n \"g_variant\": {\n \"referenceName\": \"NC_000017.11\",\n \"start\": [7577120],\n \"referenceBases\": \"G\",\n \"alternateBases\": \"A\"\n }\n }\n },\n \"requestedGranularity\": \"record\",\n \"pagination\": {\n \"skip\": 0,\n \"limit\": 5\n }\n}\n
There are optional parameters [datasetIds
, filters
...] and also the option to specify the response type (through requestedGranularity
) and returned data format (requestedSchemas
). Please follow this up in the framework documentation.
?assemblyId=GRCh38&referenceName=17&start=7577120&referenceBases=G&alternateBases=A\n
?ref=GRCh38&chrom=17&pos=7577121&referenceAllele=C&allele=A\n
"},{"location":"variant-queries/#optional","title":"Optional","text":"datasetIds=__some-dataset-ids__
filters
...datasetIds=__some-dataset-ids__
beacon=__some-beacon-id__
Before Beacon v0.4 a 1-based coordinate system was being used.
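The coordinate change is visible in the examples above: the Beacon v0.3 query uses pos=7577121 where the later queries use start=7577120. A minimal sketch of that conversion, assuming the parameter correspondence implied by the example URLs (the function names are hypothetical):

```python
def legacy_pos_to_start(pos):
    '''Convert a 1-based legacy position to a 0-based start value.'''
    if pos < 1:
        raise ValueError('1-based positions start at 1')
    return pos - 1


def upgrade_legacy_query(legacy):
    '''Map Beacon v0.3 style parameter names to the later names.

    The mapping is inferred from the example URLs and is an assumption.
    '''
    return {
        'assemblyId': legacy['ref'],
        'referenceName': legacy['chrom'],
        'start': legacy_pos_to_start(int(legacy['pos'])),
        'referenceBases': legacy['referenceAllele'],
        'alternateBases': legacy['allele'],
    }


old = {'ref': 'GRCh38', 'chrom': '17', 'pos': '7577121',
       'referenceAllele': 'C', 'allele': 'A'}
print(upgrade_legacy_query(old))
```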
"},{"location":"variant-queries/#beacon-range-queries","title":"Beacon Range Queries","text":"Beacon Range Queries are supposed to return matches of any variant with at least partial overlap of the sequence range specified by reference_name
, start
and end
parameters.
referenceName
start
(single value)end
(single value)variantType
OR alternateBases
OR aminoacidChange
variantMinLength
variantMaxLength
Use of start
and end
Range queries require the use of single start
and end
parameters, in contrast to Bracket Queries.
?assemblyId=GRCh38&referenceName=17&start=7572837&end=7578641\n
{\n \"$schema\":\"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/requests/beaconRequestBody.json\",\n \"meta\": {\n \"apiVersion\": \"2.0\",\n \"requestedSchemas\": [\n {\n \"entityType\": \"genomicVariation\",\n \"schema:\": \"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json\"\n }\n ]\n },\n \"query\": {\n \"requestParameters\": {\n \"g_variant\": {\n \"referenceName\": \"NC_000017.11\",\n \"start\": [ 7572837 ],\n \"end\": [ 7578641 ]\n }\n }\n },\n \"requestedGranularity\": \"record\",\n \"pagination\": {\n \"skip\": 0,\n \"limit\": 5\n }\n}\n
Range Queries are new to Beacon v2
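The partial-overlap rule for Range Queries can be sketched as an interval test. The sketch assumes 0-based, half-open intervals for both the variant and the query range; how an actual Beacon instance treats the boundaries is left to the implementation.

```python
def overlaps_range(variant_start, variant_end, query_start, query_end):
    '''True if the variant shares at least one base with the query range.

    Assumes 0-based, half-open [start, end) intervals (an assumption).
    '''
    return variant_start < query_end and variant_end > query_start


# Query range taken from the GET example above
QUERY = (7572837, 7578641)

print(overlaps_range(7577120, 7577121, *QUERY))  # SNV inside the range
print(overlaps_range(7570000, 7573000, *QUERY))  # overlaps the left edge
print(overlaps_range(7578641, 7580000, *QUERY))  # starts where the range ends
```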
"},{"location":"variant-queries/#beacon-geneid-queries","title":"Beacon GeneId Queries","text":"GeneId Queries are in essence a variation of Range Queries in which the coordinates are replaced by the HGNC gene symbol. It is left to the implementation if the matching is done on variants annotated for the gene symbol or if a positional translation is being applied.
"},{"location":"variant-queries/#parameters_2","title":"Parameters","text":"geneId
variantType
OR alternateBases
OR aminoacidChange
variantMinLength
variantMaxLength
geneId
(deletion CNV) ?geneId=EIF4A1&variantMaxLength=1000000&variantType=DEL\n
"},{"location":"variant-queries/#beacon-bracket-queries","title":"Beacon Bracket Queries","text":"Bracket Queries allow the specification of sequence ranges for both start and end positions of a genomic variation. The typical example here is the query for similar structural variants - particularly CNVs - affecting a genomic region but potentially differing in their exact base extents.
"},{"location":"variant-queries/#parameters_3","title":"Parameters","text":"referenceName
start
(min) and start
(max) - i.e. 2 start parametersend
(min) and end
(max) - i.e. 2 end parametersvariantType
(optional)Use of start
and end
Bracket queries require the use of two start
and end
parameters, in contrast to Range Queries.
List Parameters in GET Requests
Since the direct interpretation of list parameters in queries is not supported by some server environments (e.g. PHP, GO\u2026), list parameters such as start
and end
should be provided as comma-concatenated strings when using them in GET requests.
The following example shows a \"bracket query\" for focal deletions of the TP53 gene locus:
This leads to matching of deletion CNVs which have at least some base overlap with the gene locus but are not larger than approx. 5Mb (operational definitions of focality vary between 1 and 5Mb).
Beacon v2 GETBeacon v2 POSTBeacon v1Beacon v0.3?datasetIds=TEST&referenceName=NC_000017.11&variantType=DEL&start=5000000,7676592&end=7669607,10000000\n
{\n\"$schema\":\"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/requests/beaconRequestBody.json\",\n\"meta\": {\n\"apiVersion\": \"2.0\",\n\"requestedSchemas\": [\n{\n\"entityType\": \"genomicVariation\",\n\"schema:\": \"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json\"\n}\n]\n},\n\"query\": {\n\"requestParameters\": {\n\"g_variant\": {\n\"referenceName\": \"NC_000017.11\",\n\"start\": [ 5000000, 7676592 ],\n\"end\": [ 7669607, 10000000 ],\n\"variantType\": \"DEL\"\n}\n}\n},\n\"requestedGranularity\": \"record\",\n\"pagination\": {\n\"skip\": 0,\n\"limit\": 5\n}\n}\n
There are optional parameters [datasetIds
, filters
...] and also the option to specify the response type (through requestedGranularity
) and returned data format (requestedSchemas
). Please follow this up in the framework documentation.
?assemblyId=GRCh38&referenceName=17&variantType=DEL&start=5000000,7676592&end=7669607,10000000\n
CNV query options were only implemented with Beacon v0.4, based on Beacon+ prototyping.
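The bracket matching described above can be sketched as two independent range checks, one on the variant start and one on the variant end. The values are those of the TP53 example. Treating both bracket limits as inclusive is an assumption made for this illustration, and the function name is hypothetical.

```python
def matches_bracket(variant_start, variant_end, start_bracket, end_bracket):
    '''True if the variant start and end each fall inside their bracket.

    Bracket limits are treated as inclusive (an assumption).
    '''
    start_min, start_max = start_bracket
    end_min, end_max = end_bracket
    return (start_min <= variant_start <= start_max
            and end_min <= variant_end <= end_max)


START_BRACKET = (5000000, 7676592)
END_BRACKET = (7669607, 10000000)

# A deletion spanning the whole locus matches
print(matches_bracket(7600000, 7700000, START_BRACKET, END_BRACKET))
# A much larger deletion starting before the start bracket does not
print(matches_bracket(1000000, 7700000, START_BRACKET, END_BRACKET))
```

Because the start bracket ends at the end of the locus and the end bracket begins at its start, any matching deletion overlaps the locus while staying within the roughly 5Mb window.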
"},{"location":"variant-queries/#optional_3","title":"Optional","text":"datasetIds=__some-dataset-ids__
filters
...datasetIds=__some-dataset-ids__
TBD
Beacon v2 GET?allele=NM_004006.2:c.4375C>T\n
to be completed
"},{"location":"variant-queries/#aminoacid-change-query","title":"Aminoacid Change Query","text":"TBD
Beacon v2 GET?aminoacidChange=V600E\n
to be completed
"},{"location":"variant-queries/#varianttype-parameter-interpretation","title":"variantType
Parameter Interpretation","text":"The variantType
parameter is essential for scoping queries beyond precise sequence queries. While versions of Beacon before v2 had demonstrated the use of a few, VCF derived values (particularly for CNV queries using DUP
or DEL
), the relation of these values to underlying genomic variations had not been precisely defined.
Implementation of variantType
in Beacon Instances
The current Beacon query model does not limit the use of values for variantType
since at this time no single specification provides unanimous definitions of genomic variation categories.
variantType
parameter use While for legacy reasons and widespread use of VCFs as input source Beacon v2 documents the use of VCF-like terms, in principle other variant terms can be used (though with possibly negative implications in federated settings). The field of structural genomic variant annotations is rapidly developing, with more specific terms now becoming available e.g. through the Experimental Factor Ontology or the GA4GH Variant Representation Standard VRS (which aligns with the main EFO terms).
"},{"location":"variant-queries/#cnv-term-use-comparison-in-computational-fileschema-formats","title":"CNV Term Use Comparison in Computational (File/Schema) Formats","text":"This table is maintained in parallel with the hCNV community documentation.
EFO Beacon VCF SO GA4GH VRS \u21d2VRS proposal1 NotesEFO:0030070
copy number gain DUP
2 orEFO:0030070
DUP
SVCLAIM=D
3 SO:0001742
copy_number_gain low-level gain
(implicit) \u21d2 EFO:0030070
copy\u00a0number\u00a0gain a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence EFO:0030071
low-level copy number gain DUP
2 orEFO:0030071
DUP
SVCLAIM=D
3 SO:0001742
copy_number_gain low-level gain
\u21d2 EFO:0030071
low-level copy number gain EFO:0030072
high-level copy number gain DUP
2 orEFO:0030072
DUP
SVCLAIM=D
3 SO:0001742
copy_number_gain high-level gain
\u21d2 EFO:0030072
high-level copy number gain commonly but not consistently used for >=5 copies on a bi-allelic genome region EFO:0030073
focal genome amplification DUP
2 orEFO:0030073
DUP
SVCLAIM=D
3 SO:0001742
copy_number_gain high-level gain
\u21d2 EFO:0030073
focal genome amplification commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) EFO:0030067
copy number loss DEL
2 orEFO:0030067
DEL
SVCLAIM=D
3 SO:0001743
copy_number_loss partial loss
(implicit) \u21d2 EFO:0030067
copy number loss a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence EFO:0030068
low-level copy number loss DEL
2 orEFO:0030068
DEL
SVCLAIM=D
3 SO:0001743
copy_number_loss partial loss
\u21d2 EFO:0030068
low-level copy number loss EFO:0020073
high-level copy number loss DEL
2 orEFO:0020073
DEL
SVCLAIM=D
3 SO:0001743
copy_number_loss partial loss
\u21d2 EFO:0020073
high-level copy number loss a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted EFO:0030069
complete genomic deletion DEL
2 orEFO:0030069
DEL
SVCLAIM=D
3 SO:0001743
copy_number_loss complete loss
\u21d2 EFO:0030069
complete genomic deletion complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region)"},{"location":"variant-queries/#last-updated-2023-03-22-by-mbaudis-efo0020073","title":"Last updated 2023-03-22 by @mbaudis (EFO:0020073)","text":""},{"location":"variant-queries/#updated-2023-03-20-by-mbaudis-vrs-proposal","title":"updated 2023-03-20 by @mbaudis (VRS proposal)","text":""},{"location":"variant-queries/#query-parameter-change-log","title":"Query Parameter Change Log","text":""},{"location":"variant-queries/#beacon-v2","title":"Beacon v2","text":"assemblyId
parameterreferenceBases
, alternateBases
, variantType
...) may be used to scope the range queryaminoacidChange
geneId
variantMinLength
, variantMaxLength
start
and end
positions when querying multi-base variants allows for \"fuzzy\" CNV queriesvariantType
parameter to specify e.g. CNV queries (DUP
, DEL
)variantType
is not required for precise queries with specified referenceBases
and alternateBases
The VRS annotations refer to the status at v1.2 (2022). The GA4GH VRS team is currently (Spring 2023) preparing an updated specification which will introduce the new class CopyNumberChange
(discussion...) with the use of the EFO terms (including a new term for high level deletion (EFO:0020073)
in the April 2023 EFO release).
While the use of VCF derived (DUP
, DEL
) values had been introduced with Beacon v1, usage of these terms has always been a recommendation rather than an integral part of the API. We now encourage the support of more specific terms (particularly EFO) by Beacon developers. As an example, the Progenetix Beacon API uses EFO terms but provides an internal term expansion for legacy DUP
, DEL
support.
VCFv4.4 introduces an SVCLAIM
field to disambiguate between in situ events (such as tandem duplications; known adjacency/ break junction: SVCLAIM=J
) and events where e.g. only the change in abundance / read depth (SVCLAIM=D
) has been determined. Both J and D flags can be combined.
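The term expansion for legacy values mentioned in the notes above could look roughly like the following sketch. The grouping of EFO terms under DUP and DEL follows the comparison table. The lookup structure and function name are hypothetical and do not reproduce the actual Progenetix implementation.

```python
# Legacy VCF-like values and the EFO terms listed for them in the table
LEGACY_TO_EFO = {
    'DUP': ['EFO:0030070', 'EFO:0030071', 'EFO:0030072', 'EFO:0030073'],
    'DEL': ['EFO:0030067', 'EFO:0030068', 'EFO:0020073', 'EFO:0030069'],
}


def expand_variant_type(value):
    '''Return the terms to match for a requested variantType value.

    Unknown values are passed through unchanged, since the query model
    does not restrict the values of variantType.
    '''
    return LEGACY_TO_EFO.get(value, [value])


print(expand_variant_type('DEL'))
print(expand_variant_type('EFO:0030069'))
```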
The Beacon+ implementation - developed in the Python & MongoDB based bycon
project - implements an expanding set of Beacon v2 paths for the Progenetix resource.
In queries with a complete beaconRequestBody
the type of the delivered data is independent of the path and determined in the requestedSchemas
. So far, Beacon+ will compare the first of those to its supported responses and provide the results accordingly; it doesn't matter if the endpoint was /beacon/biosamples/
or /beacon/variants/
etc.
Below is an example for the standard test \"small deletion CNVs in the CDKN2A locus, in gliomas\" Progenetix test query, here responding with the matched variants. Exchanging the entityType
entry to
{ \"entityType\": \"biosample\", \"schema:\": \"https://progenetix.org/services/schemas/Biosample/\"}
would change this to a biosample response. The example can be tested by POSTing this as application/json
to https://progenetix.org/beacon/variants/
or https://progenetix.org/beacon/biosamples/
.
{\n\"$schema\":\"beaconRequestBody.json\",\n\"meta\": {\n\"apiVersion\": \"2.0\",\n\"requestedSchemas\": [\n{\n\"entityType\": \"genomicVariant\",\n\"schema:\": \"https://progenetix.org/services/schemas/genomicVariant\"\n}\n]\n},\n\"query\": {\n\"requestParameters\": {\n\"datasets\": {\n\"datasetIds\": [\"progenetix\"]\n},\n\"assemblyid\": \"GRCh38\",\n\"referenceName\": \"9\",\n\"start\": [21500001, 21975098],\n\"end\": [21967753, 22500000], \"variantType\": \"DEL\"\n}\n},\n\"filters\": [\n{ \"id\": \"NCIT:C3058\", \"includeDescendantTerms\": true }\n]\n}\n
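Switching the response type by exchanging the requestedSchemas entry can be sketched without any network access. The helper name is hypothetical, the request body is abbreviated, and the key name schema (without a trailing colon) is an assumption about the intended spelling.

```python
import copy
import json


def with_requested_schema(body, entity_type, schema_url):
    '''Return a copy of the request body asking for another entity type.'''
    updated = copy.deepcopy(body)
    updated.setdefault('meta', {})['requestedSchemas'] = [
        {'entityType': entity_type, 'schema': schema_url}
    ]
    return updated


# Abbreviated version of the variant request shown above
variant_request = {
    'meta': {
        'apiVersion': '2.0',
        'requestedSchemas': [
            {'entityType': 'genomicVariant',
             'schema': 'https://progenetix.org/services/schemas/genomicVariant'}
        ],
    },
    'query': {'requestParameters': {'referenceName': '9', 'variantType': 'DEL'}},
}

biosample_request = with_requested_schema(
    variant_request, 'biosample',
    'https://progenetix.org/services/schemas/Biosample/')
print(json.dumps(biosample_request, indent=2))
```

The original body is left untouched, so the same query can be reused for several response types.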
"},{"location":"implementations/org.progenetix/#paths","title":"Paths","text":""},{"location":"implementations/org.progenetix/#base","title":"Base /
","text":"The root path provides the standard BeaconInfoResponse
.
/filtering_terms
","text":""},{"location":"implementations/org.progenetix/#filtering_terms","title":"/filtering_terms/
","text":"/biosamples
","text":""},{"location":"implementations/org.progenetix/#biosamples-query","title":"/biosamples/
+ query","text":"/biosamples/{id}/
","text":"/biosamples/?testMode=true
","text":"/biosamples/{id}/g_variants
","text":"/individuals
","text":""},{"location":"implementations/org.progenetix/#individuals-query","title":"/individuals
+ query","text":"/individuals
+ query + requestedSchema=phenopacket
","text":"Progenetix provides phenopacket
as (currently experimental) alternative schema (requestedSchema
) for /individuals
. This feature allows the combined delivery of attributes annotated for the biosamples and general attributes of the individual, as well as e.g. links to genomic variation data.
/individuals/{id}
","text":"/individuals/?testMode=true
","text":"/individuals/{id}/g_variants
","text":"/g_variants
","text":"There is currently (April 2021) still some discussion about the implementation and naming of the different types of genomic variant endpoints. Since the Progenetix collections follow a \"variant observations\" principle all variant requests are directed against the local variants
collection.
variants
is used as alias.
/g_variants?testMode=true
","text":"/g_variants
+ query","text":"/g_variants/{id}
","text":"/g_variants/{id}/biosamples
","text":"/analyses
","text":"The Beacon v2 /analyses
endpoint accesses the Progenetix callsets
collection documents, i.e. information about the genomic variants derived from a single analysis. In Progenetix the main use of these documents is the storage of e.g. CNV statistics or binned genome calls.
/callsets
is an alias in Progenetix
/analyses?testMode=true
","text":"/analyses
+ query","text":"variants_in_sample
)/testMode
example/map
endpoint (incomplete/under construction)/configuration
endpoint (incomplete/under construction)/filteringTerms
endpoint to v2b4datasets
parameter as objectresponse_summary
response
root element & direct use of result_sets
entityType
format fixedfilters
now objectsvariants_interpretations
exampleresultSets
response formatbycon
backend/analyses
BeaconInfoResponse
id
of the biosample this analysis is reporting on. string NA S0001 NA id Analysis reference ID (external accession or internal ID) string NA NA NA individualId Reference to the id
of the individual this analysis is reporting on. string NA P0001 NA info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others, in order to avoid collisions in the names of attributes in future versions of the specification. object NA NA NA pipelineName Analysis pipeline and version if a standardized pipeline was used string NA Pipeline-panel-0001-v1 NA pipelineRef Link to Analysis pipeline resource string NA doi.org/10.48511/workflowhub.workflow.111.1 NA runId Run identifier (external accession or internal ID). string NA SRR10903401 NA variantCaller Reference to variant calling software / pipeline string NA GATK4.0 NA"},{"location":"schemas-md/analyses_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMAX{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"analysisDate\": \"2021-10-17\",\n\"id\": \"analyses-example-0001\",\n\"pipelineName\": \"Pipeline-panel-0001-v1\"\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"aligner\": \"bwa-0.7.8\",\n\"analysisDate\": \"2021-10-17\",\n\"biosampleId\": \"S0001\",\n\"id\": \"analyses-example-0001\",\n\"individualId\": \"P0001\",\n\"pipelineName\": \"Pipeline-panel-0001-v1\",\n\"pipelineRef\": \"https://doi.org/10.48511/workflowhub.workflow.111.1\",\n\"runId\": \"SRR10903401\",\n\"variantCaller\": \"GATK4.0\"\n}\n
"},{"location":"schemas-md/beacon_terms/","title":"Beacon terms","text":"[{\"id\": \"EFO:0009654\", \"label\": \"reference sample\"}, {\"id\": \"EFO:0009655\", \"label\": \"abnormal sample\"}, {\"id\": \"EFO:0009656\", \"label\": \"neoplastic sample\"}, {\"id\": \"EFO:0010941\", \"label\": \"metastasis sample\"}, {\"id\": \"EFO:0010942\", \"label\": \"primary tumor sample\"}, {\"id\": \"EFO:0010943\", \"label\": \"recurrent tumor sample\"}]
NA collectionDate Date of biosample collection in ISO8601 format. string NA 2021-04-23 NA collectionMoment Individual's or cell culture age at the time of sample collection in the ISO8601 duration format P[n]Y[n]M[n]DT[n]H[n]M[n]S
. string NA P32Y6M1D, P7D NA diagnosticMarkers NA array id, label NA NA histologicalDiagnosis Disease diagnosis that was inferred from the histological examination. RECOMMENDED. object id, label [{\"id\": \"NCIT:C3778\", \"label\": \"Serous Cystadenocarcinoma\"}]
NA id Biosample identifier (external accession or internal ID). string NA S0001 NA individualId Reference to the individual from which that sample was obtained. string NA P0001 NA info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others, in order to avoid collisions in the names of attributes in future versions of the specification. object NA NA NA measurements Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement
array assayCode, date, measurementValue, notes, observationMoment, procedure NA NA notes Any relevant info about the biosample that does not fit into any other field in the schema. string NA Some free text NA obtentionProcedure Ontology value from NCIT Intervention or Procedure ontology term (NCIT:C25218) describing the procedure for sample obtention, e.g. NCIT:C15189 (biopsy). object ageAtProcedure, bodySite, dateOfProcedure, procedureCode [{\"code\": {\"id\": \"NCIT:C15189\", \"label\": \"biopsy\"}}, {\"code\": {\"id\": \"NCIT:C157179\", \"label\": \"FGFR1 Mutation Analysis\"}}]
NA pathologicalStage Pathological stage, if applicable, preferably as subclass of NCIT:C28108 - Disease Stage Qualifier. RECOMMENDED. object id, label [{\"id\": \"NCIT:C27977\", \"label\": \"Stage IIIA\"}]
NA pathologicalTnmFinding NA array id, label [{\"id\": \"NCIT:C48725\", \"label\": \"T2a Stage Finding\"}, {\"id\": \"NCIT:C48709\", \"label\": \"N1c Stage Finding\"}, {\"id\": \"NCIT:C48699\", \"label\": \"M0 Stage Finding\"}]
NA phenotypicFeatures Used to describe a phenotype that characterizes the subject or biosample. array evidence, excluded, featureType, modifiers, notes, onset, resolution, severity NA NA sampleOriginDetail Tissue from which the sample was taken or sample origin matching the category set in 'sampleOriginType'. Value from Uber-anatomy ontology (UBERON) or BRENDA tissue / enzyme source (BTO), Ontology for Biomedical Investigations (OBI) or Cell Line Ontology (CLO), e.g. 'cerebellar vermis' (UBERON:0004720), 'HEK-293T cell' (BTO:0002181), 'nasopharyngeal swab specimen' (OBI:0002606), 'cerebrospinal fluid specimen' (OBI:0002502). object id, label [{\"id\": \"UBERON:0000474\", \"label\": \"female reproductive system\"}, {\"id\": \"BTO:0002181\", \"label\": \"HEK-293T cell\"}, {\"id\": \"OBI:0002606\", \"label\": \"nasopharyngeal swab specimen\"}]
NA sampleOriginType Category of sample origin. Value from Ontology for Biomedical Investigations (OBI) material entity (BFO:0000040) ontology, e.g. 'specimen from organism' (OBI:0001479),'xenograft' (OBI:0100058), 'cell culture' (OBI:0001876) object id, label [{\"id\": \"OBI:0001479\", \"label\": \"specimen from organism\"}, {\"id\": \"OBI:0001876\", \"label\": \"cell culture\"}, {\"id\": \"OBI:0100058\", \"label\": \"xenograft\"}]
NA sampleProcessing Status of how the specimen was processed,e.g. a child term of EFO:0009091. object id, label [{\"id\": \"EFO:0009129\", \"label\": \"mechanical dissociation\"}]
NA sampleStorage Status of how the specimen was stored. object id, label NA tumorGrade Term representing the tumor grade. Child term of NCIT:C28076 (Disease Grade Qualifier) or equivalent. object id, label [{\"id\": \"NCIT:C28080\", \"label\": \"Grade 3a\"}]
NA tumorProgression Tumor progression category indicating primary, metastatic or recurrent progression. Ontology value from Neoplasm by Special Category ontology (NCIT:C7062), e.g. NCIT:C84509 (Primary Malignant Neoplasm). object id, label [{\"id\": \"NCIT:C84509\", \"label\": \"Primary Malignant Neoplasm\"}, {\"id\": \"NCIT:C4813\", \"label\": \"Recurrent Malignant Neoplasm\"}]
NA"},{"location":"schemas-md/biosamples_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMID{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"biosampleStatus\": {\n\"id\": \"EFO:0009655\",\n\"label\": \"abnormal sample\"\n},\n\"id\": \"sample-example-0001\",\n\"sampleOriginType\": {\n\"id\": \"UBERON:0000474\",\n\"label\": \"female reproductive system\"\n}\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"biosampleStatus\": {\n\"id\": \"EFO:0009655\",\n\"label\": \"abnormal sample\"\n},\n\"collectionDate\": \"2020-09-11\",\n\"collectionMoment\": \"P32Y6M1D\",\n\"id\": \"sample-example-0001\",\n\"obtentionProcedure\": {\n\"procedureCode\": {\n\"id\": \"OBI:0002654\",\n\"label\": \"needle biopsy\"\n}\n},\n\"sampleOriginType\": {\n\"id\": \"UBERON:0000992\",\n\"label\": \"ovary\"\n}\n}\n
"},{"location":"schemas-md/cohorts_defaultSchema/","title":"cohorts defaultSchema","text":"Term Description Type Properties Example Enum cohortDataTypes NA array id, label [{\"id\": \"OGMS:0000015\", \"label\": \"clinical history\"}, {\"id\": \"OBI:0000070\", \"label\": \"genotyping assay\"}, {\"id\": \"OMIABIS:0000060\", \"label\": \"survey data\"}]
NA cohortDesign Cohort type by its design. A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of an investigation and is realized during a study design execution. Value from Ontologized MIABIS (OMIABIS) Study design ontology term tree (OBI:0500000). object id, label [{\"id\": \"OMIABIS:0001017\", \"label\": \"case control study design\"}, {\"id\": \"OMIABIS:0001019\", \"label\": \"longitudinal study design\"}, {\"id\": \"OMIABIS:0001024\", \"label\": \"twin study design\"}]
NA cohortSize Count of unique Individuals in cohort (individuals meeting criteria for user-defined
cohorts). If not previously known, it could be calculated by counting the individuals in the cohort. integer NA 14765, 20000 NA cohortType Cohort type by its definition. If a cohort is declared study-defined
or beacon-defined
criteria are to be entered in cohort_inclusion_criteria
; if a cohort is declared user-defined
cohort_inclusion_criteria
could be automatically populated from the parameters used to perform the query. string NA NA study-defined, beacon-defined, user-defined collectionEvents TBD array eventAgeRange, eventCases, eventControls, eventDataTypes, eventDate, eventDiseases, eventEthnicities, eventGenders, eventLocations, eventNum, eventPhenotypes, eventSize, eventTimeline NA NA exclusionCriteria Exclusion criteria used for defining the cohort. It is assumed that NONE of the cohort participants will match such criteria. object ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions NA NA id Cohort identifier. For study-defined
or beacon-defined
cohorts this field is set by the implementer. For user-defined
this unique identifier could be generated upon the query that defined the cohort, but could be later edited by the user. string NA cohort-T2D-2010 NA inclusionCriteria Inclusion criteria used for defining the cohort. It is assumed that all cohort participants will match such criteria. object ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions NA NA name Name of the cohort. For user-defined
this field could be generated upon the query, e.g. a value that is a concatenation or some representation of the user query. string NA Wellcome Trust Case Control Consortium, GCAT Genomes for Life NA"},{"location":"schemas-md/cohorts_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMIDMAX{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"cohortType\": \"study-defined\",\n\"id\": \"cohort0001\",\n\"name\": \"GCAT Genomes for Life\"\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"cohortDataTypes\": [\n{\n\"id\": \"OGMS:0000015\",\n\"label\": \"clinical history\"\n},\n{\n\"id\": \"OBI:0000070\",\n\"label\": \"genotyping assay\"\n},\n{\n\"id\": \"OMIABIS:0000060\",\n\"label\": \"survey data\"\n}\n],\n\"cohortDesign\": {\n\"id\": \"OMIABIS:0001019\",\n\"label\": \"longitudinal study design\"\n},\n\"cohortSize\": 20000,\n\"cohortType\": \"study-defined\",\n\"id\": \"cohort0001\",\n\"inclusionCriteria\": {\n\"ageRange\": {\n\"end\": {\n\"iso8601duration\": \"P40Y\"\n},\n\"start\": {\n\"iso8601duration\": \"P18Y\"\n}\n},\n\"genders\": [\n{\n\"id\": \"NCIT:C16576\",\n\"label\": \"female\"\n},\n{\n\"id\": \"NCIT:C20197\",\n\"label\": \"male\"\n}\n],\n\"locations\": [\n{\n\"id\": \"GAZ:00004501\",\n\"label\": \"Catalonia Autonomous Community\"\n}\n]\n},\n\"name\": \"GCAT Genomes for Life\"\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"cohortDataTypes\": [\n{\n\"id\": \"OGMS:0000015\",\n\"label\": \"clinical history\"\n},\n{\n\"id\": \"OBI:0000070\",\n\"label\": \"genotyping assay\"\n},\n{\n\"id\": \"OMIABIS:0000060\",\n\"label\": \"survey data\"\n}\n],\n\"cohortDesign\": {\n\"id\": \"OMIABIS:0001019\",\n\"label\": \"longitudinal study design\"\n},\n\"cohortSize\": 20000,\n\"cohortType\": \"study-defined\",\n\"collectionEvents\": [\n{\n\"eventDataTypes\": {\n\"availability\": true,\n\"distribution\": {\n\"dataTypes\": {\n\"blood collected from fasting subject\": 51,\n\"survey data\": 98\n}\n}\n},\n\"eventDate\": \"2019-04-23\",\n\"eventEthnicities\": {\n\"availability\": true,\n\"availabilityCount\": 101,\n\"distribution\": {\n\"ethnicities\": {\n\"African\": 3,\n\"European\": 90,\n\"Latin American\": 8\n}\n}\n},\n\"eventGenders\": {\n\"availability\": true,\n\"availabilityCount\": 101,\n\"distribution\": {\n\"genders\": {\n\"female\": 51,\n\"male\": 50\n}\n}\n},\n\"eventNum\": 1,\n\"eventSize\": 101\n}\n],\n\"id\": \"cohort0001\",\n\"inclusionCriteria\": {\n\"ageRange\": {\n\"end\": {\n\"iso8601duration\": \"P40Y\"\n},\n\"start\": {\n\"iso8601duration\": \"P18Y\"\n}\n},\n\"genders\": [\n{\n\"id\": \"NCIT:C16576\",\n\"label\": \"female\"\n},\n{\n\"id\": \"NCIT:C20197\",\n\"label\": \"male\"\n}\n],\n\"locations\": [\n{\n\"id\": \"GAZ:00004501\",\n\"label\": \"Catalonia Autonomous Community\"\n}\n]\n},\n\"name\": \"GCAT Genomes for Life\"\n}\n
"},{"location":"schemas-md/datasets_defaultSchema/","title":"datasets defaultSchema","text":"Term Description Type Properties Example Enum createDateTime The time the dataset was created (ISO 8601 format) string NA 2017-01-17T20:33:40Z NA dataUseConditions Data use conditions applying to this dataset. object duoDataUse NA NA description Description of the dataset string NA This dataset provides examples of the actual data in this Beacon instance. NA externalUrl URL to an external system providing more dataset information (RFC 3986 format). string NA example.org/wiki/Main_Page NA id Unique identifier of the dataset string NA ds01010101 NA info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional informations in this attribute instead of directly adding attributes at the same level than the others in order to avoid collision in the names of attributes in future versions of the specification. object NA NA NA name Name of the dataset string NA Dataset with synthetic data NA updateDateTime The time the dataset was updated in (ISO 8601 format) string NA 2017-01-17T20:33:40Z NA version Version of the dataset string NA v1.1 NA"},{"location":"schemas-md/datasets_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MAXMIN{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"createDateTime\": \"2017-01-17T20:33:40Z\",\n\"dataUseConditions\": {\n\"duoDataUse\": [\n{\n\"id\": \"DUO:0000007\",\n\"label\": \"disease specific research\",\n\"modifiers\": [\n{\n\"id\": \"EFO:0001645\",\n\"label\": \"coronary artery disease\"\n}\n],\n\"version\": \"17-07-2016\"\n}\n]\n},\n\"description\": \"This dataset provides examples of the actual data in this Beacon instance.\",\n\"externalUrl\": \"https://example.org/wiki/Main_Page\",\n\"id\": \"ds01010101\",\n\"name\": \"Dataset with synthetic data\",\n\"updateDateTime\": \"2017-01-17T20:33:40Z\",\n\"version\": \"v1.1\"\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"id\": \"ds01010101\",\n\"name\": \"Dataset with synthetic data\"\n}\n
"},{"location":"schemas-md/genomicVariations_defaultSchema/","title":"genomicVariations defaultSchema","text":"Term Description Type Properties Example Enum caseLevelData array alleleOrigin, analysisId, biosampleId, clinicalInterpretations, id, individualId, phenotypicEffects, runId, zygosity NA NA frequencyInPopulations NA array frequencies, source, sourceReference, version NA NA identifiers NA object clinvarVariantId, genomicHGVSId, proteinHGVSIds, transcriptHGVSIds, variantAlternativeIds NA NA molecularAttributes NA object aminoacidChanges, geneIds, genomicFeatures, molecularEffects NA NA variantInternalId Reference to the internal variant ID. This represents the primary key/identifier of that variant inside a given Beacon instance. Different Beacon instances may use identical id values, referring to unrelated variants. Public identifiers such as the GA4GH Variant Representation Id (VRSid) MUST be returned in the identifiers
section. A Beacon instance can, of course, use the VRSid as their own internal id but still MUST represent this then in the identifiers
section. string NA var00001, v110112 NA variantLevelData NA object clinicalInterpretations, phenotypicEffects NA NA variation NA oneOf LegacyVariation, MolecularVariation, SystemicVariation NA NA"},{"location":"schemas-md/genomicVariations_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMINMID{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"variantInternalId\": \"GRCh37-1-55505652-G-A\",\n\"variation\": {\n\"alternateBases\": \"A\",\n\"location\": {\n\"interval\": {\n\"end\": {\n\"type\": \"Number\",\n\"value\": 5505653\n},\n\"start\": {\n\"type\": \"Number\",\n\"value\": 5505652\n},\n\"type\": \"SequenceInterval\"\n},\n\"sequence_id\": \"refseq:NC_000001.10\",\n\"type\": \"SequenceLocation\"\n},\n\"variantType\": \"SNP\"\n}\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"variantInternalId\": \"GRCh37-1-55505652-G-A\",\n\"variation\": {\n\"location\": {\n\"interval\": {\n\"end\": {\n\"type\": \"Number\",\n\"value\": 5505653\n},\n\"start\": {\n\"type\": \"Number\",\n\"value\": 5505652\n},\n\"type\": \"SequenceInterval\"\n},\n\"sequence_id\": \"refseq:NC_000001.10\",\n\"type\": \"SequenceLocation\"\n},\n\"state\": {\n\"sequence\": \"A\",\n\"type\": \"SequenceState\"\n},\n\"type\": \"Allele\"\n}\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"frequencyInPopulations\": [\n{\n\"frequencies\": [\n{\n\"alleleFrequency\": 2.939e-05,\n\"population\": \"European (non-Finish)\"\n},\n{\n\"alleleFrequency\": 0,\n\"population\": \"Other\"\n}\n],\n\"source\": \"gnomaD Genomes\",\n\"sourceReference\": \"https://gnomad.broadinstitute.org/\",\n\"version\": \"v3.1.1\"\n},\n{\n\"frequencies\": [\n{\n\"alleleFrequency\": 9e-05,\n\"population\": \"Total\"\n},\n{\n\"alleleFrequency\": 6e-05,\n\"population\": \"European\"\n},\n{\n\"alleleFrequency\": 0,\n\"population\": \"African\"\n}\n],\n\"source\": \"ALFA\",\n\"sourceReference\": \"https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/\",\n\"version\": \"20201027095038\"\n}\n],\n\"identifiers\": {\n\"clinVarIds\": [\n\"434136\",\n\"VCV000440707.6\"\n],\n\"genomicHGVSId\": \"NC_000001.11:g.55039979G>A\",\n\"proteinHGVSIds\": [\n\"NP_777596.2:p.Glu48Lys\"\n],\n\"transcriptHGVSIds\": [\n\"NM_174936.4:c.142G>A\"\n],\n\"variantAlternativeIds\": [\n\"dbSNP:rs3975092470\",\n\"ClinGen: CA340482854\"\n]\n},\n\"molecularAttributes\": {\n\"aminoacidChanges\": [\n\"E48K\"\n],\n\"geneIds\": [\n\"PCSK9\",\n\"LRG_275\"\n],\n\"molecularEffects\": [\n{\n\"id\": \"ENSGLOSSARY:0000150\",\n\"label\": \"Missense variant\"\n}\n]\n},\n\"variantInternalId\": \"var123\",\n\"variantLevelData\": {\n\"clinicalInterpretations\": [\n{\n\"category\": {\n\"id\": \"MONDO:0000001\",\n\"label\": \"disease or disorder\"\n},\n\"clinicalRelevance\": \"pathogenic\",\n\"conditionId\": \"famchol1\",\n\"effect\": {\n\"id\": \"MONDO:0007750\",\n\"label\": \"Familial hypercholesterolemia 1\"\n}\n},\n{\n\"category\": {\n\"id\": \"MONDO:0000001\",\n\"label\": \"disease or disorder\"\n},\n\"clinicalRelevance\": \"uncertain significance\",\n\"conditionId\": \"famchol3\",\n\"effect\": {\n\"id\": \"MONDO:0011369\",\n\"label\": \"hypercholesterolemia, autosomal dominant, 3\"\n}\n}\n]\n},\n\"variation\": {\n\"alternateBases\": \"A\",\n\"location\": 
{\n\"interval\": {\n\"end\": {\n\"type\": \"Number\",\n\"value\": 55039980\n},\n\"start\": {\n\"type\": \"Number\",\n\"value\": 55039979\n},\n\"type\": \"SequenceInterval\"\n},\n\"sequence_id\": \"refseq:NC_000001.11\",\n\"type\": \"SequenceLocation\"\n},\n\"referenceBases\": \"G\",\n\"variantType\": \"SNP\"\n}\n}\n
"},{"location":"schemas-md/individuals_defaultSchema/","title":"individuals defaultSchema","text":"Term Description Type Properties Example Enum diseases Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease
array ageOfOnset, diseaseCode, familyHistory, notes, severity, stage NA NA ethnicity Ethnic background of the individual. Value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] object id, label [{\"id\": \"NCIT:C42331\", \"label\": \"African\"}, {\"id\": \"NCIT:C41260\", \"label\": \"Asian\"}, {\"id\": \"NCIT:C126535\", \"label\": \"Australian\"}, {\"id\": \"NCIT:C43851\", \"label\": \"European\"}, {\"id\": \"NCIT:C77812\", \"label\": \"North American\"}, {\"id\": \"NCIT:C126531\", \"label\": \"Latin American\"}, {\"id\": \"NCIT:C104495\", \"label\": \"Other race\"}]
NA exposures Exposures (lifestyle, behavioural exposures) occurred to individual, defined by exposure ID, date and age of onset, dose, and duration. array ageAtExposure, date, duration, exposureCode, unit, value NA NA geographicOrigin Individual's country or region of origin (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). object id, label [{\"id\": \"GAZ:00002955\", \"label\": \"Slovenia\"}, {\"id\": \"GAZ:00002459\", \"label\": \"United States of America\"}, {\"id\": \"GAZ:00316959\", \"label\": \"Municipality of El Masnou\"}, {\"id\": \"GAZ:00000460\", \"label\": \"Eurasia\"}]
NA id Individual identifier (internal ID). string NA P0001 NA info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional information in this attribute instead of directly adding attributes at the same level as the others, in order to avoid collisions in the names of attributes in future versions of the specification. object NA NA NA interventionsOrProcedures Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure
array ageAtProcedure, bodySite, dateOfProcedure, procedureCode NA NA karyotypicSex The chromosomal sex of an individual represented from a selection of options. string NA NA UNKNOWN_KARYOTYPE, XX, XY, XO, XXY, XXX, XXYY, XXXY, XXXX, XYY, OTHER_KARYOTYPE measures Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement
array assayCode, date, measurementValue, notes, observationMoment, procedure NA NA pedigrees Pedigree studies of which the individual is a part. array disease, id, members, numSubjects NA NA phenotypicFeatures Used to describe a phenotype that characterizes the subject or biosample. array evidence, excluded, featureType, modifiers, notes, onset, resolution, severity NA NA sex Sex of the individual. Value from NCIT General Qualifier (NCIT:C27993): 'unknown' (not assessed or not available) (NCIT:C17998), 'female' (NCIT:C16576), or 'male' (NCIT:C20197). object id, label [{\"id\": \"NCIT:C16576\", \"label\": \"female\"}, {\"id\": \"NCIT:C20197\", \"label\": \"male\"}, {\"id\": \"NCIT:C17998\", \"label\": \"unknown\"}]
NA treatments Treatment(s) prescribed/administered, defined by treatment ID, date and age of onset, dose, schedule and duration. array ageAtOnset, cumulativeDose, doseIntervals, routeOfAdministration, treatmentCode NA NA"},{"location":"schemas-md/individuals_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMID{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"id\": \"Ind001\",\n\"sex\": {\n\"id\": \"NCIT:C16576\",\n\"label\": \"female\"\n}\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"diseases\": [\n{\n\"ageOfOnset\": {\n\"ageGroup\": {\n\"id\": \"NCIT:C49685\",\n\"label\": \"Adult 18-65 Years Old\"\n}\n},\n\"diseaseCode\": {\n\"id\": \"OMIM:164400\",\n\"label\": \"Spinocerebellar ataxia 1\"\n},\n\"familyHistory\": false,\n\"severity\": {\n\"id\": \"HP:0012829\",\n\"label\": \"Profound\"\n},\n\"stage\": {\n\"id\": \"OGMS:0000119\",\n\"label\": \"acute onset\"\n}\n}\n],\n\"ethnicity\": {\n\"id\": \"NCIT:C43851\",\n\"label\": \"European\"\n},\n\"geographicOrigin\": {\n\"id\": \"GAZ:00002955\",\n\"label\": \"Slovenia\"\n},\n\"id\": \"Ind001\",\n\"measures\": [\n{\n\"assayCode\": {\n\"id\": \"LOINC:26515-7\",\n\"label\": \"Platelets [#/volume] in Blood\"\n},\n\"date\": \"2017-05-03\",\n\"measurementValue\": {\n\"units\": {\n\"id\": \"NCIT:C103452\",\n\"label\": \"Per Milliliter\"\n},\n\"value\": 55345\n},\n\"observationMoment\": {\n\"age\": {\n\"iso8601duration\": \"P55Y8M12D\"\n}\n}\n}\n],\n\"sex\": {\n\"id\": \"NCIT:C16576\",\n\"label\": \"female\"\n}\n}\n
"},{"location":"schemas-md/runs_defaultSchema/","title":"runs defaultSchema","text":"Term Description Type Properties Example Enum biosampleId Reference to the biosample ID. string NA 008dafdd-a3d1-4801-8c0a-8714e2b58e48 NA id Run ID. string NA SRR10903401 NA individualId Reference to the individual ID. string NA TCGA-AO-A0JJ NA info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional informations in this attribute instead of directly adding attributes at the same level than the others in order to avoid collision in the names of attributes in future versions of the specification. object NA NA NA libraryLayout Ontology value for the library layout e.g \"PAIRED\", \"SINGLE\" #todo add Ontology name? string NA NA PAIRED, SINGLE librarySelection Selection method for library preparation, e.g \"RANDOM\", \"RT-PCR\" string NA RANDOM, RT-PCR NA librarySource Ontology value for the source of the sequencing or hybridization library, e.g \"genomic source\", \"transcriptomic source\" object id, label [{\"id\": \"GENEPIO:0001966\", \"label\": \"genomic source\"}, {\"id\": \"GENEPIO:0001965\", \"label\": \"metagenomic source\"}]
NA libraryStrategy Library strategy, e.g. \"WGS\" string NA WGS NA platform General platform technology label. It SHOULD be a subset of the platformModel and used only for query convenience, e.g. \"return everything sequenced with Illumina\", where the specific model is not relevant. string NA Illumina, Oxford Nanopore, Affymetrix NA platformModel Ontology value for experimental platform or methodology used. For sequencing platforms the use of \"OBI:0400103 - DNA sequencer\" is suggested. object id, label [{\"id\": \"OBI:0002048\", \"label\": \"Illumina HiSeq 3000\"}, {\"id\": \"OBI:0002750\", \"label\": \"Oxford Nanopore MinION\"}, {\"id\": \"EFO:0010938\", \"label\": \"large-insert clone DNA microarray\"}]
NA runDate Date at which the experiment was performed. string NA 2021-10-18 NA"},{"location":"schemas-md/runs_defaultSchema/#examples","title":"Examples","text":"These are examples extracted directly from the GitHub repository.
MINMAX{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"biosampleId\": \"008dafdd-a3d1-4801-8c0a-8714e2b58e48\",\n\"id\": \"SRR10903401\",\n\"runDate\": \"2021-10-18\"\n}\n
{\n\"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n\"biosampleId\": \"008dafdd-a3d1-4801-8c0a-8714e2b58e48\",\n\"id\": \"SRR10903401\",\n\"individualId\": \"TCGA-AO-A0JJ\",\n\"libraryLayout\": \"PAIRED\",\n\"librarySelection\": \"RANDOM\",\n\"librarySource\": {\n\"id\": \"GENEPIO:0001966\",\n\"label\": \"genomic source\"\n},\n\"libraryStrategy\": \"WGS\",\n\"platform\": \"Illumina\",\n\"platformModel\": {\n\"id\": \"OBI:0002048\",\n\"label\": \"Illumina HiSeq 3000\"\n},\n\"runDate\": \"2021-10-18\"\n}\n
"},{"location":"schemas-md/obj/Age/","title":"Age","text":"Term Description Type Properties Example Enum Age Age value definition. Provenance: GA4GH Phenopackets v2 Age
object iso8601duration NA NA"},{"location":"schemas-md/obj/AgeRange/","title":"AgeRange","text":"Term Description Type Properties Example Enum AgeRange Age range definition. Provenance: GA4GH Phenopackets v2 AgeRange
object end, start NA NA"},{"location":"schemas-md/obj/Allele/","title":"Allele","text":"Term Description Type Properties Example Enum Allele The state of a molecule at a Location. object _id, location, state, type NA NA"},{"location":"schemas-md/obj/Complex%20Value/","title":"Complex Value","text":"Term Description Type Properties Example Enum Complex Value Definition of a complex value class. Provenance: GA4GH Phenopackets v2 TypedQuantity
object typedQuantities NA NA"},{"location":"schemas-md/obj/CopyNumber/","title":"CopyNumber","text":"Term Description Type Properties Example Enum CopyNumber NA allOf VRS definition for CopyNumber NA NA"},{"location":"schemas-md/obj/GestationalAge/","title":"GestationalAge","text":"Term Description Type Properties Example Enum GestationalAge Gestational age (or menstrual age) is the time elapsed between the first day of the last normal menstrual period and the day of delivery. The first day of the last menstrual period occurs approximately 2 weeks before ovulation and approximately 3 weeks before implantation of the blastocyst. Because most women know when their last period began but not when ovulation occurred, this definition traditionally has been used when estimating the expected date of delivery. In contrast, chronological age (or postnatal age) is the time elapsed after birth. Provenance: Phenopackets v2 object days, weeks NA NA"},{"location":"schemas-md/obj/Haplotype/","title":"Haplotype","text":"Term Description Type Properties Example Enum Haplotype A set of non-overlapping Allele members that co-occur on the same molecule. object _id, members, type NA NA"},{"location":"schemas-md/obj/LegacyVariation/","title":"LegacyVariation","text":"Term Description Type Properties Example Enum LegacyVariation NA object alternateBases, location, referenceBases, variantType NA NA"},{"location":"schemas-md/obj/MolecularVariation/","title":"MolecularVariation","text":"Term Description Type Properties Example Enum MolecularVariation NA oneOf Allele, Haplotype NA NA"},{"location":"schemas-md/obj/SystemicVariation/","title":"SystemicVariation","text":"Term Description Type Properties Example Enum SystemicVariation NA oneOf CopyNumber NA NA"},{"location":"schemas-md/obj/TimeInterval/","title":"TimeInterval","text":"Term Description Type Properties Example Enum TimeInterval Time interval with start and end defined as ISO8601 time stamps. 
object end, start NA NA"},{"location":"schemas-md/obj/Value/","title":"Value","text":"Term Description Type Properties Example Enum Value NA oneOf Quantity, ontologyTerm NA NA"},{"location":"schemas-md/obj/affected/","title":"Affected","text":"Term Description Type Properties Example Enum affected Is the individual affected by the disease in the pedigree? boolean NA NA NA"},{"location":"schemas-md/obj/ageAtExposure/","title":"ageAtExposure","text":"Term Description Type Properties Example Enum ageAtExposure Age value definition. Provenance: GA4GH Phenopackets v2 Age
object iso8601duration NA NA"},{"location":"schemas-md/obj/ageAtOnset/","title":"ageAtOnset","text":"Term Description Type Properties Example Enum ageAtOnset Age value definition. Provenance: GA4GH Phenopackets v2 Age
object iso8601duration NA NA"},{"location":"schemas-md/obj/ageAtProcedure/","title":"ageAtProcedure","text":"Term Description Type Properties Example Enum ageAtProcedure NA oneOf Age, AgeRange, GestationalAge, TimeInterval NA NA"},{"location":"schemas-md/obj/ageOfOnset/","title":"ageOfOnset","text":"Term Description Type Properties Example Enum ageOfOnset NA oneOf Age, AgeRange, GestationalAge, TimeInterval NA NA"},{"location":"schemas-md/obj/ageRange/","title":"ageRange","text":"Term Description Type Properties Example Enum ageRange Individual age range in cohort inclusion criteria object end, start NA NA"},{"location":"schemas-md/obj/aligner/","title":"Aligner","text":"Term Description Type Properties Example Enum aligner Reference to mapping/alignment software string NA bwa-0.7.8 NA"},{"location":"schemas-md/obj/alleleFrequency/","title":"alleleFrequency","text":"Term Description Type Properties Example Enum alleleFrequency Allele frequency between 0 and 1. number NA 3.186e-05 NA"},{"location":"schemas-md/obj/alleleOrigin/","title":"alleleOrigin","text":"Term Description Type Properties Example Enum alleleOrigin Ontology value for allele origin of variant in sample from the Variant Origin (SO:0001762). Categories are somatic variant
, germline variant
, maternal variant
, paternal variant
, de novo variant
, pedigree specific variant
, population specific variant
. Corresponds to Variant Inheritance in FHIR. object id, label [{\"id\": \"SO:0001777\", \"label\": \"somatic variant\"}, {\"id\": \"SO:0001778\", \"label\": \"germline variant\"}, {\"id\": \"SO:0001775\", \"label\": \"maternal variant\"}, {\"id\": \"SO:0001776\", \"label\": \"paternal variant\"}, {\"id\": \"SO:0001781\", \"label\": \"de novo variant\"}, {\"id\": \"SO:0001779\", \"label\": \"pedigree specific variant\"}, {\"id\": \"SO:0001780\", \"label\": \"population specific variant\"}]
NA"},{"location":"schemas-md/obj/alternateBases/","title":"alternateBases","text":"Term Description Type Properties Example Enum alternateBases Alternate bases for this variant (starting from start
). * Accepted values: IUPAC codes for nucleotides (e.g. https://www.bioinformatics.org/sms/iupac.html
). * N is a wildcard that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence. As an example, in a query of ANNT
the Ns can take any form of [ACGT] and will match ANNT
, ACNT
, ACCT
, ACGT
... and so forth. * An empty value is used in the case of deletions, with the maximally trimmed, deleted sequence being indicated in ReferenceBases
. * Categorical variant queries, i.e. those not represented through sequence and position, make use of the variantType
parameter. * Either alternateBases
or variantType
is required. string NA T, G, N, AG, NA"},{"location":"schemas-md/obj/aminoacidChanges/","title":"aminoacidChanges","text":"Term Description Type Properties Example Enum aminoacidChanges NA array NA [\"V304*\"]
NA"},{"location":"schemas-md/obj/analysisDate/","title":"analysisDate","text":"Term Description Type Properties Example Enum analysisDate Date at which analysis was performed. string NA 2021-10-17 NA"},{"location":"schemas-md/obj/analysisId/","title":"analysisId","text":"Term Description Type Properties Example Enum analysisId Reference to the bioinformatics analysis ID (analysis.id
) string NA pgxcs-kftvldsu NA"},{"location":"schemas-md/obj/annotatedWith/","title":"annotatedWith","text":"Term Description Type Properties Example Enum annotatedWith NA object toolName, toolReferences, version NA NA"},{"location":"schemas-md/obj/assayCode/","title":"assayCode","text":"Term Description Type Properties Example Enum assayCode Definition of an ontology term. object id, label id, label NA"},{"location":"schemas-md/obj/availability/","title":"Availability","text":"Term Description Type Properties Example Enum availability data availability boolean NA NA NA"},{"location":"schemas-md/obj/availabilityCount/","title":"availabilityCount","text":"Term Description Type Properties Example Enum availabilityCount Count of individuals with data available integer NA NA NA"},{"location":"schemas-md/obj/biosampleId/","title":"biosampleId","text":"Term Description Type Properties Example Enum biosampleId Reference to the biosample ID. string NA 008dafdd-a3d1-4801-8c0a-8714e2b58e48 NA"},{"location":"schemas-md/obj/biosampleStatus/","title":"biosampleStatus","text":"Term Description Type Properties Example Enum biosampleStatus Ontology value from Experimental Factor Ontology (EFO) Material Entity term (BFO:0000040). Classification of the sample in abnormal sample (EFO:0009655) or reference sample (EFO:0009654). object id, label [{\"id\": \"EFO:0009654\", \"label\": \"reference sample\"}, {\"id\": \"EFO:0009655\", \"label\": \"abnormal sample\"}, {\"id\": \"EFO:0009656\", \"label\": \"neoplastic sample\"}, {\"id\": \"EFO:0010941\", \"label\": \"metastasis sample\"}, {\"id\": \"EFO:0010942\", \"label\": \"primary tumor sample\"}, {\"id\": \"EFO:0010943\", \"label\": \"recurrent tumor sample\"}]
NA"},{"location":"schemas-md/obj/bodySite/","title":"bodySite","text":"Term Description Type Properties Example Enum bodySite Definition of an ontology term. object id, label [{\"id\": \"UBERON:0003403\", \"label\": \"Skin of forearm\"}, {\"id\": \"UBERON:0003214\", \"label\": \"mammary gland alveolus\"}]
NA"},{"location":"schemas-md/obj/caseLevelData/","title":"caseLevelData","text":"Term Description Type Properties Example Enum caseLevelData array alleleOrigin, analysisId, biosampleId, clinicalInterpretations, id, individualId, phenotypicEffects, runId, zygosity NA NA"},{"location":"schemas-md/obj/category/","title":"Category","text":"Term Description Type Properties Example Enum category Ontology term for the type of disease, condition, phenotypic measurement, etc. object id, label [{\"id\": \"MONDO:0000001\", \"label\": \"disease or disorder\"}, {\"id\": \"HP:0000118\", \"label\": \"phenotypic abnormality\"}]
NA"},{"location":"schemas-md/obj/clinicalInterpretations/","title":"clinicalInterpretations","text":"Term Description Type Properties Example Enum clinicalInterpretations List of annotated effects on disease or phenotypes. array annotatedWith, category, clinicalRelevance, conditionId, effect, evidenceType NA NA"},{"location":"schemas-md/obj/clinicalRelevance/","title":"clinicalRelevance","text":"Term Description Type Properties Example Enum clinicalRelevance Indication of the clinical relevance of the variant Recommended: A value from the five-tiered classification from the American College of Medical Genetics (ACMG) designed to describe the likelihood that a genomic sequence variant is causative of an inherited disease. (NCIT:C168798). string NA pathogenic benign, likely benign, uncertain significance, likely pathogenic, pathogenic"},{"location":"schemas-md/obj/clinvarVariantId/","title":"clinvarVariantId","text":"Term Description Type Properties Example Enum clinvarVariantId ClinVar variant id. Other id values used by ClinVar can be added to variantAlternativeIds
string NA clinvar:12345, 9325 NA"},{"location":"schemas-md/obj/cohortDataTypes/","title":"cohortDataTypes","text":"Term Description Type Properties Example Enum cohortDataTypes NA array id, label [{\"id\": \"OGMS:0000015\", \"label\": \"clinical history\"}, {\"id\": \"OBI:0000070\", \"label\": \"genotyping assay\"}, {\"id\": \"OMIABIS:0000060\", \"label\": \"survey data\"}]
NA"},{"location":"schemas-md/obj/cohortDesign/","title":"cohortDesign","text":"Term Description Type Properties Example Enum cohortDesign Cohort type by its design. A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of an investigation and is realized during a study design execution. Value from Ontologized MIABIS (OMIABIS) Study design ontology term tree (OBI:0500000). object id, label [{\"id\": \"OMIABIS:0001017\", \"label\": \"case control study design\"}, {\"id\": \"OMIABIS:0001019\", \"label\": \"longitudinal study design\"}, {\"id\": \"OMIABIS:0001024\", \"label\": \"twin study design\"}]
NA"},{"location":"schemas-md/obj/cohortSize/","title":"cohortSize","text":"Term Description Type Properties Example Enum cohortSize Count of unique Individuals in cohort (individuals meeting criteria for user-defined
cohorts). If not previously known, it could be calculated by counting the individuals in the cohort. integer NA 14765, 20000 NA"},{"location":"schemas-md/obj/cohortType/","title":"cohortType","text":"Term Description Type Properties Example Enum cohortType Cohort type by its definition. If a cohort is declared study-defined
or beacon-defined
criteria are to be entered in cohort_inclusion_criteria
; if a cohort is declared user-defined
cohort_inclusion_criteria
could be automatically populated from the parameters used to perform the query. string NA NA study-defined, beacon-defined, user-defined"},{"location":"schemas-md/obj/collectionDate/","title":"collectionDate","text":"Term Description Type Properties Example Enum collectionDate Date of biosample collection in ISO8601 format. string NA 2021-04-23 NA"},{"location":"schemas-md/obj/collectionEvents/","title":"collectionEvents","text":"Term Description Type Properties Example Enum collectionEvents TBD array eventAgeRange, eventCases, eventControls, eventDataTypes, eventDate, eventDiseases, eventEthnicities, eventGenders, eventLocations, eventNum, eventPhenotypes, eventSize, eventTimeline NA NA"},{"location":"schemas-md/obj/collectionMoment/","title":"collectionMoment","text":"Term Description Type Properties Example Enum collectionMoment Individual's or cell culture age at the time of sample collection in the ISO8601 duration format P[n]Y[n]M[n]DT[n]H[n]M[n]S
. string NA P32Y6M1D, P7D NA"},{"location":"schemas-md/obj/conditionId/","title":"conditionId","text":"Term Description Type Properties Example Enum conditionId Internal identifier of the phenotype or clinical effect. string NA disease1, phen2234 NA"},{"location":"schemas-md/obj/createDateTime/","title":"createDateTime","text":"Term Description Type Properties Example Enum createDateTime The time the dataset was created (ISO 8601 format) string NA 2017-01-17T20:33:40Z NA"},{"location":"schemas-md/obj/cumulativeDose/","title":"cumulativeDose","text":"Term Description Type Properties Example Enum cumulativeDose Definition of a quantity class. Provenance: GA4GH Phenopackets v2 Quantity
object referenceRange, unit, value NA NA"},{"location":"schemas-md/obj/dataUseConditions/","title":"dataUseConditions","text":"Term Description Type Properties Example Enum dataUseConditions Data use conditions applying to this dataset. object duoDataUse NA NA"},{"location":"schemas-md/obj/date/","title":"Date","text":"Term Description Type Properties Example Enum date Date of the exposure in ISO8601 format. string NA NA NA"},{"location":"schemas-md/obj/dateOfProcedure/","title":"dateOfProcedure","text":"Term Description Type Properties Example Enum dateOfProcedure Date of procedure, in ISO8601 format string NA 2010-07-10 NA"},{"location":"schemas-md/obj/description/","title":"Description","text":"Term Description Type Properties Example Enum description Description of the dataset string NA This dataset provides examples of the actual data in this Beacon instance. NA"},{"location":"schemas-md/obj/diagnosticMarkers/","title":"diagnosticMarkers","text":"Term Description Type Properties Example Enum diagnosticMarkers NA array id, label NA NA"},{"location":"schemas-md/obj/disease/","title":"Disease","text":"Term Description Type Properties Example Enum disease Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease
object ageOfOnset, diseaseCode, familyHistory, notes, severity, stage NA NA"},{"location":"schemas-md/obj/diseaseCode/","title":"diseaseCode","text":"Term Description Type Properties Example Enum diseaseCode Definition of an ontology term. object id, label [{\"id\": \"HP:0004789\", \"label\": \"lactose intolerance\"}, {\"id\": \"ICD10CM:E73\", \"label\": \"lactose intolerance\"}, {\"id\": \"OMIM:164400\", \"label\": \"Spinocerebellar ataxia 1\"}]
NA"},{"location":"schemas-md/obj/diseaseConditions/","title":"diseaseConditions","text":"Term Description Type Properties Example Enum diseaseConditions Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease
array ageOfOnset, diseaseCode, familyHistory, notes, severity, stage NA NA"},{"location":"schemas-md/obj/diseases/","title":"Diseases","text":"Term Description Type Properties Example Enum diseases Diseases diagnosed e.g. to an individual, defined by diseaseID, age of onset, stage, level of severity, outcome and the presence of family history. Similarities to GA4GH Phenopackets v2 Disease
array ageOfOnset, diseaseCode, familyHistory, notes, severity, stage NA NA"},{"location":"schemas-md/obj/distribution/","title":"Distribution","text":"Term Description Type Properties Example Enum distribution List of categories and results or counts for each category. object [{\"genders\": {\"female\": \"51\", \"male\": \"50\"}}]
NA"},{"location":"schemas-md/obj/doseIntervals/","title":"doseIntervals","text":"Term Description Type Properties Example Enum doseIntervals This element represents a block of time in which the dosage of a medication was constant. For example, to represent a period of 30 mg twice a day for an interval of 10 days, we would use a Quantity element to represent the individual 30 mg dose, and OntologyClass element to represent twice a day, and an Interval element to represent the 10-day interval. Provenance: Phenopackets v2 array interval, quantity, scheduleFrequency NA NA"},{"location":"schemas-md/obj/duoDataUse/","title":"duoDataUse","text":"Term Description Type Properties Example Enum duoDataUse Definition of an ontology term. array id, label, modifiers, version [{\"id\": \"DUO:0000007\", \"label\": \"disease specific research\", \"version\": \"17-07-2016\"}]
NA"},{"location":"schemas-md/obj/duration/","title":"Duration","text":"Term Description Type Properties Example Enum duration Exposure duration in ISO8601 format string NA P2Y6M1D NA"},{"location":"schemas-md/obj/effect/","title":"Effect","text":"Term Description Type Properties Example Enum effect Ontology term for the phenotypic or clinical effect object id, label [{\"id\": \"MONDO:0003582\", \"label\": \"hereditary breast ovarian cancer syndrome\"}, {\"id\": \"HP:0000256\", \"label\": \"macrocephaly\"}]
NA"},{"location":"schemas-md/obj/end/","title":"End","text":"Term Description Type Properties Example Enum end Represents age as an ISO8601 duration (e.g., P59Y). object iso8601duration NA NA"},{"location":"schemas-md/obj/ethnicities/","title":"Ethnicities","text":"Term Description Type Properties Example Enum ethnicities Ethnic background of the individual. Recommended is the use of a value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] array id, label NA NA"},{"location":"schemas-md/obj/ethnicity/","title":"Ethnicity","text":"Term Description Type Properties Example Enum ethnicity Ethnic background of the individual. Value from NCIT Race (NCIT:C17049) ontology term descendants, e.g. NCIT:C126531 (Latin American). A geographic ancestral origin category that is assigned to a population group based mainly on physical characteristics that are thought to be distinct and inherent. [ NCI ] object id, label [{\"id\": \"NCIT:C42331\", \"label\": \"African\"}, {\"id\": \"NCIT:C41260\", \"label\": \"Asian\"}, {\"id\": \"NCIT:C126535\", \"label\": \"Australian\"}, {\"id\": \"NCIT:C43851\", \"label\": \"European\"}, {\"id\": \"NCIT:C77812\", \"label\": \"North American\"}, {\"id\": \"NCIT:C126531\", \"label\": \"Latin American\"}, {\"id\": \"NCIT:C104495\", \"label\": \"Other race\"}]
NA"},{"location":"schemas-md/obj/eventAgeRange/","title":"eventAgeRange","text":"Term Description Type Properties Example Enum eventAgeRange Individual age range, obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventCases/","title":"eventCases","text":"Term Description Type Properties Example Enum eventCases number of cases integer NA 543, 20 NA"},{"location":"schemas-md/obj/eventControls/","title":"eventControls","text":"Term Description Type Properties Example Enum eventControls number of controls integer NA 1000, 22 NA"},{"location":"schemas-md/obj/eventDataTypes/","title":"eventDataTypes","text":"Term Description Type Properties Example Enum eventDataTypes Aggregated data type information available for each cohort data type as declared in cohortDataTypes
, and obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventDate/","title":"eventDate","text":"Term Description Type Properties Example Enum eventDate date of collection event/data point string NA 2018-10-01T13:23:45Z, 2019-04-23T09:11:13Z, 2017-01-17T20:33:40Z NA"},{"location":"schemas-md/obj/eventDiseases/","title":"eventDiseases","text":"Term Description Type Properties Example Enum eventDiseases Aggregated information of disease/condition(s) obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventEthnicities/","title":"eventEthnicities","text":"Term Description Type Properties Example Enum eventEthnicities Aggregated information of ethnicity obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventGenders/","title":"eventGenders","text":"Term Description Type Properties Example Enum eventGenders Aggregated information of gender(s) obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventLocations/","title":"eventLocations","text":"Term Description Type Properties Example Enum eventLocations Aggregated information of geographic location obtained from individual level info of the cohort members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventNum/","title":"eventNum","text":"Term Description Type Properties Example Enum eventNum cardinality of the collection event / data point in a series integer NA 1, 2, 3, 4 NA"},{"location":"schemas-md/obj/eventPhenotypes/","title":"eventPhenotypes","text":"Term Description Type Properties Example Enum eventPhenotypes Aggregated information of phenotype(s) obtained from individual level info of the cohort 
members object availability, availabilityCount, distribution NA NA"},{"location":"schemas-md/obj/eventSize/","title":"eventSize","text":"Term Description Type Properties Example Enum eventSize Count of individuals in cohort at data point (for \u00b4user-defined\u00b4 cohorts, this is individuals meeting criteria) obtained from individual level info in database. integer NA 1543, 42 NA"},{"location":"schemas-md/obj/eventTimeline/","title":"eventTimeline","text":"Term Description Type Properties Example Enum eventTimeline Aggregated information of dates of visit diagnostic inclusion in study obtained from individual level info of the cohort members. object end, start"},{"location":"schemas-md/obj/evidence/","title":"Evidence","text":"Term Description Type Properties Example Enum evidence The evidence for an assertion of the observation of a type. RECOMMENDED. object evidenceCode, reference NA NA"},{"location":"schemas-md/obj/evidenceCode/","title":"evidenceCode","text":"Term Description Type Properties Example Enum evidenceCode Definition of an ontology term. object id, label id, label NA"},{"location":"schemas-md/obj/evidenceType/","title":"evidenceType","text":"Term Description Type Properties Example Enum evidenceType Ontology term for the type of evidence supporting variant-disease association Recommended: values from the Evidence & Conclusion Ontology (ECO) object id, label [{\"id\": \"ECO:0000361\", \"label\": \"inferential evidence\"}, {\"id\": \"ECO:0000006\", \"label\": \"experimental evidence\"}]
NA"},{"location":"schemas-md/obj/excluded/","title":"Excluded","text":"Term Description Type Properties Example Enum excluded Flag to indicate whether the phenotypic feature was observed or not. Default is \u2018false\u2019, in other words the phenotype was observed. Therefore it is only used in cases where the phenotype was looked for but found to be absent. More formally, this modifier indicates the logical negation of the OntologyClass used in the featureType
field. CAUTION: It is imperative to check this field for correct interpretation of the phenotype! Source: Phenopackets v2 boolean NA NA NA"},{"location":"schemas-md/obj/exclusionCriteria/","title":"exclusionCriteria","text":"Term Description Type Properties Example Enum exclusionCriteria Exclusion criteria used for defining the cohort. It is assumed that NONE of the cohort participants will match such criteria. object ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions NA NA"},{"location":"schemas-md/obj/exposureCode/","title":"exposureCode","text":"Term Description Type Properties Example Enum exposureCode Definition of an ontology term. object id, label [{\"id\": \"CHEBI:46661\", \"label\": \"asbestos\"}, {\"id\": \"ENVO:21001217\", \"label\": \"X-ray radiation\"}]
NA"},{"location":"schemas-md/obj/exposures/","title":"Exposures","text":"Term Description Type Properties Example Enum exposures Exposures (lifestyle, behavioural exposures) occurred to individual, defined by exposure ID, date and age of onset, dose, and duration. array ageAtExposure, date, duration, exposureCode, unit, value NA NA"},{"location":"schemas-md/obj/externalUrl/","title":"externalUrl","text":"Term Description Type Properties Example Enum externalUrl URL to an external system providing more dataset information (RFC 3986 format). string NA example.org/wiki/Main_Page NA"},{"location":"schemas-md/obj/familyHistory/","title":"familyHistory","text":"Term Description Type Properties Example Enum familyHistory Boolean indicating determined or self-reported presence of family history of the disease. boolean NA 1 NA"},{"location":"schemas-md/obj/featureClass/","title":"featureClass","text":"Term Description Type Properties Example Enum featureClass Ontology term that describes the class of genomic feature affected by the variant. Values from SO (Sequence ontology) are recommended, e.g. SO:0001623: 5 prime UTR variant
object id, label [{\"id\": \"SO:0001623\", \"label\": \"5 prime UTR variant\"}]
NA"},{"location":"schemas-md/obj/featureID/","title":"featureID","text":"Term Description Type Properties Example Enum featureID Where applicable, ID/accession/name of genomic feature related to the featureClass
, preferably in CURIE format. If the value is a gene id or name, it points to the gene related to the featureClass
, e.g. the 5 prime UTR upstream of TP53
object id, label [{\"id\": \"HGNC:11998\", \"label\": \"TP53\"}]
NA"},{"location":"schemas-md/obj/featureType/","title":"featureType","text":"Term Description Type Properties Example Enum featureType Definition of an ontology term. object id, label [{\"id\": \"HP:0000002\", \"label\": \"Abnormality of body height\"}, {\"id\": \"HP:0002006\", \"label\": \"Facial cleft\"}, {\"id\": \"HP:0012469\", \"label\": \"Infantile spasms\"}]
NA"},{"location":"schemas-md/obj/frequencies/","title":"Frequencies","text":"Term Description Type Properties Example Enum frequencies NA array alleleFrequency, population NA NA"},{"location":"schemas-md/obj/frequencyInPopulations/","title":"frequencyInPopulations","text":"Term Description Type Properties Example Enum frequencyInPopulations NA array frequencies, source, sourceReference, version NA NA"},{"location":"schemas-md/obj/genders/","title":"Genders","text":"Term Description Type Properties Example Enum genders Sex of the individual. Recommended values from NCIT General Qualifier (NCIT:C27993): \"unknown\" (not assessed or not available) - NCIT:C17998; \"female\" - NCIT:C16576; \"male\" - NCIT:C20197 array id, label NA NA"},{"location":"schemas-md/obj/geneIds/","title":"geneIds","text":"Term Description Type Properties Example Enum geneIds NA array NA [\"ACE2\"]
,[\"BRCA1\"]
NA"},{"location":"schemas-md/obj/genomicFeatures/","title":"genomicFeatures","text":"Term Description Type Properties Example Enum genomicFeatures Genomic feature(s) related to the variant. NOTE: Although genes could also be referenced using these attributes, they have an independent section to allow direct queries. array featureClass, featureID NA NA"},{"location":"schemas-md/obj/genomicHGVSId/","title":"genomicHGVSId","text":"Term Description Type Properties Example Enum genomicHGVSId HGVSId descriptor. string NA NC_000017.11:g.43057063G>A NA"},{"location":"schemas-md/obj/geographicOrigin/","title":"geographicOrigin","text":"Term Description Type Properties Example Enum geographicOrigin Individual's country or region of origin (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). object id, label [{\"id\": \"GAZ:00002955\", \"label\": \"Slovenia\"}, {\"id\": \"GAZ:00002459\", \"label\": \"United States of America\"}, {\"id\": \"GAZ:00316959\", \"label\": \"Municipality of El Masnou\"}, {\"id\": \"GAZ:00000460\", \"label\": \"Eurasia\"}]
NA"},{"location":"schemas-md/obj/histologicalDiagnosis/","title":"histologicalDiagnosis","text":"Term Description Type Properties Example Enum histologicalDiagnosis Disease diagnosis that was inferred from the histological examination. RECOMMENDED. object id, label [{\"id\": \"NCIT:C3778\", \"label\": \"Serous Cystadenocarcinoma\"}]
NA"},{"location":"schemas-md/obj/id/","title":"Id","text":"Term Description Type Properties Example Enum id Run ID. string NA SRR10903401 NA"},{"location":"schemas-md/obj/identifiers/","title":"Identifiers","text":"Term Description Type Properties Example Enum identifiers NA object clinvarVariantId, genomicHGVSId, proteinHGVSIds, transcriptHGVSIds, variantAlternativeIds NA NA"},{"location":"schemas-md/obj/inclusionCriteria/","title":"inclusionCriteria","text":"Term Description Type Properties Example Enum inclusionCriteria Inclusion criteria used for defining the cohort. It is assumed that all cohort participants will match such criteria. object ageRange, diseaseConditions, ethnicities, genders, locations, phenotypicConditions NA NA"},{"location":"schemas-md/obj/individualId/","title":"individualId","text":"Term Description Type Properties Example Enum individualId Reference to the individual ID. string NA TCGA-AO-A0JJ NA"},{"location":"schemas-md/obj/info/","title":"Info","text":"Term Description Type Properties Example Enum info Placeholder to allow the Beacon to return any additional information that is necessary or could be of interest in relation to the query or the entry returned. It is recommended to encapsulate additional informations in this attribute instead of directly adding attributes at the same level than the others in order to avoid collision in the names of attributes in future versions of the specification. object NA NA NA"},{"location":"schemas-md/obj/interval/","title":"Interval","text":"Term Description Type Properties Example Enum interval Time interval with start and end defined as ISO8601 time stamps. object end, start [{\"end\": \"1967-11-18T12:00:00+01\", \"start\": \"1967-11-11T07:30:00+01\"}]
NA"},{"location":"schemas-md/obj/interventionsOrProcedures/","title":"interventionsOrProcedures","text":"Term Description Type Properties Example Enum interventionsOrProcedures Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure
array ageAtProcedure, bodySite, dateOfProcedure, procedureCode NA NA"},{"location":"schemas-md/obj/iso8601duration/","title":"Iso8601duration","text":"Term Description Type Properties Example Enum iso8601duration Represents age as an ISO8601 duration (e.g., P40Y10M05D). string NA P32Y6M1D NA"},{"location":"schemas-md/obj/karyotypicSex/","title":"karyotypicSex","text":"Term Description Type Properties Example Enum karyotypicSex The chromosomal sex of an individual represented from a selection of options. string NA NA UNKNOWN_KARYOTYPE, XX, XY, XO, XXY, XXX, XXYY, XXXY, XXXX, XYY, OTHER_KARYOTYPE"},{"location":"schemas-md/obj/label/","title":"Label","text":"Term Description Type Properties Example Enum label The text that describes the term. By default it could be the preferred text of the term, but it is acceptable to customize it for a clearer description and understanding of the term in a specific context. string NA NA NA"},{"location":"schemas-md/obj/libraryLayout/","title":"libraryLayout","text":"Term Description Type Properties Example Enum libraryLayout Ontology value for the library layout, e.g. \"PAIRED\", \"SINGLE\" #todo add Ontology name? string NA NA PAIRED, SINGLE"},{"location":"schemas-md/obj/librarySelection/","title":"librarySelection","text":"Term Description Type Properties Example Enum librarySelection Selection method for library preparation, e.g. \"RANDOM\", \"RT-PCR\" string NA RANDOM, RT-PCR NA"},{"location":"schemas-md/obj/librarySource/","title":"librarySource","text":"Term Description Type Properties Example Enum librarySource Ontology value for the source of the sequencing or hybridization library, e.g. \"genomic source\", \"transcriptomic source\" object id, label [{\"id\": \"GENEPIO:0001966\", \"label\": \"genomic source\"}, {\"id\": \"GENEPIO:0001965\", \"label\": \"metagenomic source\"}]
NA"},{"location":"schemas-md/obj/libraryStrategy/","title":"libraryStrategy","text":"Term Description Type Properties Example Enum libraryStrategy Library strategy, e.g. \"WGS\" string NA WGS NA"},{"location":"schemas-md/obj/location/","title":"Location","text":"Term Description Type Properties Example Enum location NA oneOf CURIE, Location NA NA"},{"location":"schemas-md/obj/locations/","title":"Locations","text":"Term Description Type Properties Example Enum locations Country or region of origin of the individual (birthplace or residence place regardless of ethnic origin). Value from GAZ Geographic Location ontology (GAZ:00000448), e.g. GAZ:00002459 (United States of America). array id, label NA NA"},{"location":"schemas-md/obj/measurementValue/","title":"measurementValue","text":"Term Description Type Properties Example Enum measurementValue NA oneOf Complex Value, Value NA NA"},{"location":"schemas-md/obj/measurements/","title":"Measurements","text":"Term Description Type Properties Example Enum measurements Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement
array assayCode, date, measurementValue, notes, observationMoment, procedure NA NA"},{"location":"schemas-md/obj/measures/","title":"Measures","text":"Term Description Type Properties Example Enum measures Definition of a measurement class. Provenance: GA4GH Phenopackets v2 Measurement
array assayCode, date, measurementValue, notes, observationMoment, procedure NA NA"},{"location":"schemas-md/obj/memberId/","title":"memberId","text":"Term Description Type Properties Example Enum memberId Identifier of the individual. The individual could be part of the same Beacon datasets or not, in which case the information here is meant to complete the pedigree. If the individual is also in the dataset, use that Individual ID. If it is not in the dataset, use a non-colliding ID, e.g. concatenating the Pedigree ID with a local ID, similarly to the example 'Pedigree1001-m1'. string NA Pedigree1001-m1, Ind0012122 NA"},{"location":"schemas-md/obj/members/","title":"Members","text":"Term Description Type Properties Example Enum members NA array affected, memberId, role NA NA"},{"location":"schemas-md/obj/modifiers/","title":"Modifiers","text":"Term Description Type Properties Example Enum modifiers Definition of an ontology term. array id, label [{\"id\": \"HP:0032500\", \"label\": \"Exacerbated by tobacco use\"}, {\"id\": \"HP:4000053\", \"label\": \"Displaced fracture\"}]
NA"},{"location":"schemas-md/obj/molecularAttributes/","title":"molecularAttributes","text":"Term Description Type Properties Example Enum molecularAttributes NA object aminoacidChanges, geneIds, genomicFeatures, molecularEffects NA NA"},{"location":"schemas-md/obj/molecularEffects/","title":"molecularEffects","text":"Term Description Type Properties Example Enum molecularEffects NA array id, label [{\"id\": \"SO:0002322\", \"label\": \"stop gained NMD escaping\"}, {\"id\": \"SO:0001583\", \"label\": \"missense variant\"}]
NA"},{"location":"schemas-md/obj/name/","title":"Name","text":"Term Description Type Properties Example Enum name Name of the dataset string NA Dataset with synthetic data NA"},{"location":"schemas-md/obj/notes/","title":"Notes","text":"Term Description Type Properties Example Enum notes Unstructured text to describe additional properties of this disease instance. string NA Some free text NA"},{"location":"schemas-md/obj/numSubjects/","title":"numSubjects","text":"Term Description Type Properties Example Enum numSubjects Total number of subjects in pedigree. integer NA 10 NA"},{"location":"schemas-md/obj/observationMoment/","title":"observationMoment","text":"Term Description Type Properties Example Enum observationMoment NA oneOf Age, AgeRange, GestationalAge, TimeInterval NA NA"},{"location":"schemas-md/obj/obtentionProcedure/","title":"obtentionProcedure","text":"Term Description Type Properties Example Enum obtentionProcedure Ontology value from NCIT Intervention or Procedure ontology term (NCIT:C25218) describing the procedure for sample obtention, e.g. NCIT:C15189 (biopsy). object ageAtProcedure, bodySite, dateOfProcedure, procedureCode [{\"code\": {\"id\": \"NCIT:C15189\", \"label\": \"biopsy\"}}, {\"code\": {\"id\": \"NCIT:C157179\", \"label\": \"FGFR1 Mutation Analysis\"}}]
NA"},{"location":"schemas-md/obj/onset/","title":"Onset","text":"Term Description Type Properties Example Enum onset NA oneOf Age, AgeRange, GestationalAge, TimeInterval NA NA"},{"location":"schemas-md/obj/pathologicalStage/","title":"pathologicalStage","text":"Term Description Type Properties Example Enum pathologicalStage Pathological stage, if applicable, preferably as subclass of NCIT:C28108 - Disease Stage Qualifier. RECOMMENDED. object id, label [{\"id\": \"NCIT:C27977\", \"label\": \"Stage IIIA\"}]
NA"},{"location":"schemas-md/obj/pathologicalTnmFinding/","title":"pathologicalTnmFinding","text":"Term Description Type Properties Example Enum pathologicalTnmFinding NA array id, label [{\"id\": \"NCIT:C48725\", \"label\": \"T2a Stage Finding\"}, {\"id\": \"NCIT:C48709\", \"label\": \"N1c Stage Finding\"}, {\"id\": \"NCIT:C48699\", \"label\": \"M0 Stage Finding\"}]
NA"},{"location":"schemas-md/obj/pedigrees/","title":"Pedigrees","text":"Term Description Type Properties Example Enum pedigrees Pedigree studies of which the individual is part. array disease, id, members, numSubjects NA NA"},{"location":"schemas-md/obj/phenotypicConditions/","title":"phenotypicConditions","text":"Term Description Type Properties Example Enum phenotypicConditions Used to describe a phenotype that characterizes the subject or biosample. array evidence, excluded, featureType, modifiers, notes, onset, resolution, severity NA NA"},{"location":"schemas-md/obj/phenotypicEffects/","title":"phenotypicEffects","text":"Term Description Type Properties Example Enum phenotypicEffects List of annotated effects on disease or phenotypes. array annotatedWith, category, clinicalRelevance, conditionId, effect, evidenceType NA NA"},{"location":"schemas-md/obj/phenotypicFeatures/","title":"phenotypicFeatures","text":"Term Description Type Properties Example Enum phenotypicFeatures Used to describe a phenotype that characterizes the subject or biosample. array evidence, excluded, featureType, modifiers, notes, onset, resolution, severity NA NA"},{"location":"schemas-md/obj/pipelineName/","title":"pipelineName","text":"Term Description Type Properties Example Enum pipelineName Analysis pipeline and version if a standardized pipeline was used string NA Pipeline-panel-0001-v1 NA"},{"location":"schemas-md/obj/pipelineRef/","title":"pipelineRef","text":"Term Description Type Properties Example Enum pipelineRef Link to Analysis pipeline resource string NA doi.org/10.48511/workflowhub.workflow.111.1 NA"},{"location":"schemas-md/obj/platform/","title":"Platform","text":"Term Description Type Properties Example Enum platform General platform technology label. It SHOULD be a subset of the platformModel and used only for query convenience, e.g. 
\"return everything sequenced with Illumina\", where the specific model is not relevant string NA Illumina, Oxford Nanopore, Affymetrix NA"},{"location":"schemas-md/obj/platformModel/","title":"platformModel","text":"Term Description Type Properties Example Enum platformModel Ontology value for experimental platform or methodology used. For sequencing platforms the use of \"OBI:0400103 - DNA sequencer\" is suggested. object id, label [{\"id\": \"OBI:0002048\", \"label\": \"Illumina HiSeq 3000\"}, {\"id\": \"OBI:0002750\", \"label\": \"Oxford Nanopore MinION\"}, {\"id\": \"EFO:0010938\", \"label\": \"large-insert clone DNA microarray\"}]
NA"},{"location":"schemas-md/obj/population/","title":"Population","text":"Term Description Type Properties Example Enum population A name for the population. A population could be an ethnic or geographical one, or just the members
of a study. string NA East Asian, ICGC Chronic Lymphocytic Leukemia-ES, Men, Children NA"},{"location":"schemas-md/obj/procedure/","title":"Procedure","text":"Term Description Type Properties Example Enum procedure Class describing a clinical procedure or intervention. Provenance: GA4GH Phenopackets v2 Procedure
object ageAtProcedure, bodySite, dateOfProcedure, procedureCode code NA"},{"location":"schemas-md/obj/procedureCode/","title":"procedureCode","text":"Term Description Type Properties Example Enum procedureCode Definition of an ontology term. object id, label [{\"id\": \"MAXO:0001175\", \"label\": \"liver transplantation\"}, {\"id\": \"MAXO:0000136\", \"label\": \"high-resolution microendoscopy\"}, {\"id\": \"OBI:0002654\", \"label\": \"needle biopsy\"}]
NA"},{"location":"schemas-md/obj/proteinHGVSIds/","title":"proteinHGVSIds","text":"Term Description Type Properties Example Enum proteinHGVSIds NA array NA [\"NP_009225.1:p.Glu1817Ter\"]
,[\"LRG 199p1:p.Val25Gly (preferred)\"]
NA"},{"location":"schemas-md/obj/quantity/","title":"Quantity","text":"Term Description Type Properties Example Enum quantity Definition of a quantity class. Provenance: GA4GH Phenopackets v2 Quantity
object referenceRange, unit, value NA NA"},{"location":"schemas-md/obj/reference/","title":"Reference","text":"Term Description Type Properties Example Enum reference Representation of the source of the evidence object id, notes, reference id, label NA"},{"location":"schemas-md/obj/referenceBases/","title":"referenceBases","text":"Term Description Type Properties Example Enum referenceBases Reference bases for this variant (starting from start
). * Accepted values: IUPAC codes for nucleotides (e.g. https://www.bioinformatics.org/sms/iupac.html
). * N is a wildcard, that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence. As example, a query of ANNT
the Ns can take any form of [ACGT]
and will match ANNT
, ACNT
, ACCT
, ACGT
... and so forth. * An empty value is used in the case of insertions, with the maximally trimmed inserted sequence being indicated in alternateBases
. NOTE: Beacon instances may not support IUPAC codes and it is not mandatory for them to do so. In such cases the use of [ACGTN] is mandated. string NA A, T, N, , ACG NA"},{"location":"schemas-md/obj/referenceRange/","title":"referenceRange","text":"Term Description Type Properties Example Enum referenceRange The normal range for the value object high, low, unit NA"},{"location":"schemas-md/obj/resolution/","title":"Resolution","text":"Term Description Type Properties Example Enum resolution NA oneOf Age, AgeRange, GestationalAge, TimeInterval NA NA"},{"location":"schemas-md/obj/role/","title":"Role","text":"Term Description Type Properties Example Enum role Definition of an ontology term. object id, label [{\"id\": \"NCIT:C64435\", \"label\": \"Proband\"}, {\"id\": \"NCIT:C96580\", \"label\": \"Biological Mother\"}, {\"id\": \"NCIT:C96572\", \"label\": \"Biological Father\"}, {\"id\": \"NCIT:C165848\", \"label\": \"Identical Twin Brother\"}]
NA"},{"location":"schemas-md/obj/routeOfAdministration/","title":"routeOfAdministration","text":"Term Description Type Properties Example Enum routeOfAdministration Definition of an ontology term. object id, label [{\"id\": \"NCIT:C38304\", \"label\": \"Topical\"}, {\"id\": \"NCIT:C78373\", \"label\": \"Dietary\"}]
NA"},{"location":"schemas-md/obj/runDate/","title":"runDate","text":"Term Description Type Properties Example Enum runDate Date at which the experiment was performed. string NA 2021-10-18 NA"},{"location":"schemas-md/obj/runId/","title":"runId","text":"Term Description Type Properties Example Enum runId Reference to the experimental run ID (run.id
) string NA SRR10903401 NA"},{"location":"schemas-md/obj/sampleOriginDetail/","title":"sampleOriginDetail","text":"Term Description Type Properties Example Enum sampleOriginDetail Tissue from which the sample was taken or sample origin matching the category set in 'sampleOriginType'. Value from Uber-anatomy ontology (UBERON) or BRENDA tissue / enzyme source (BTO), Ontology for Biomedical Investigations (OBI) or Cell Line Ontology (CLO), e.g. 'cerebellar vermis' (UBERON:0004720), 'HEK-293T cell' (BTO:0002181), 'nasopharyngeal swab specimen' (OBI:0002606), 'cerebrospinal fluid specimen' (OBI:0002502). object id, label [{\"id\": \"UBERON:0000474\", \"label\": \"female reproductive system\"}, {\"id\": \"BTO:0002181\", \"label\": \"HEK-293T cell\"}, {\"id\": \"OBI:0002606\", \"label\": \"nasopharyngeal swab specimen\"}]
NA"},{"location":"schemas-md/obj/sampleOriginType/","title":"sampleOriginType","text":"Term Description Type Properties Example Enum sampleOriginType Category of sample origin. Value from Ontology for Biomedical Investigations (OBI) material entity (BFO:0000040) ontology, e.g. 'specimen from organism' (OBI:0001479),'xenograft' (OBI:0100058), 'cell culture' (OBI:0001876) object id, label [{\"id\": \"OBI:0001479\", \"label\": \"specimen from organism\"}, {\"id\": \"OBI:0001876\", \"label\": \"cell culture\"}, {\"id\": \"OBI:0100058\", \"label\": \"xenograft\"}]
NA"},{"location":"schemas-md/obj/sampleProcessing/","title":"sampleProcessing","text":"Term Description Type Properties Example Enum sampleProcessing Status of how the specimen was processed,e.g. a child term of EFO:0009091. object id, label [{\"id\": \"EFO:0009129\", \"label\": \"mechanical dissociation\"}]
NA"},{"location":"schemas-md/obj/sampleStorage/","title":"sampleStorage","text":"Term Description Type Properties Example Enum sampleStorage Status of how the specimen was stored. object id, label NA"},{"location":"schemas-md/obj/scheduleFrequency/","title":"scheduleFrequency","text":"Term Description Type Properties Example Enum scheduleFrequency Definition of an ontology term. object id, label [{\"id\": \"NCIT:C64496\", \"label\": \"Twice Daily\"}]
NA"},{"location":"schemas-md/obj/severity/","title":"Severity","text":"Term Description Type Properties Example Enum severity Severity as applicable to phenotype or disease observed. Recommended are values from Human Phenotype Ontology (HP:0012824), e.g mild
. The intensity or degree of a manifestation. Source: Phenopackets v2 object id, label [{\"id\": \"HP:0012828\", \"label\": \"Severe\"}, {\"id\": \"HP:0012826\", \"label\": \"Moderate\"}]
NA"},{"location":"schemas-md/obj/sex/","title":"Sex","text":"Term Description Type Properties Example Enum sex Sex of the individual. Value from NCIT General Qualifier (NCIT:C27993): 'unknown' (not assessed or not available) (NCIT:C17998), 'female' (NCIT:C16576), or 'male' (NCIT:C20197). object id, label [{\"id\": \"NCIT:C16576\", \"label\": \"female\"}, {\"id\": \"NCIT:C20197\", \"label\": \"male\"}, {\"id\": \"NCIT:C17998\", \"label\": \"unknown\"}]
NA"},{"location":"schemas-md/obj/source/","title":"Source","text":"Term Description Type Properties Example Enum source The study string NA The Genome Aggregation Database (gnomAD), The European Genome-phenome Archive (EGA) NA"},{"location":"schemas-md/obj/sourceReference/","title":"sourceReference","text":"Term Description Type Properties Example Enum sourceReference A reference to further documentation or details. string NA gnomad.broadinstitute.org/, ega-archive.org/ NA"},{"location":"schemas-md/obj/stage/","title":"Stage","text":"Term Description Type Properties Example Enum stage Definition of an ontology term. object id, label [{\"id\": \"OGMS:0000119\", \"label\": \"acute onset\"}, {\"id\": \"OGMS:0000117\", \"label\": \"asymptomatic\"}, {\"id\": \"OGMS:0000106\", \"label\": \"remission\"}]
NA"},{"location":"schemas-md/obj/start/","title":"Start","text":"Term Description Type Properties Example Enum start Represents age as an ISO8601 duration (e.g., P18Y). object iso8601duration NA NA"},{"location":"schemas-md/obj/toolName/","title":"toolName","text":"Term Description Type Properties Example Enum toolName Name of the tool. string NA Ensembl Variant Effect Predictor (VEP) NA"},{"location":"schemas-md/obj/toolReferences/","title":"toolReferences","text":"Term Description Type Properties Example Enum toolReferences References to the tool object NA [{\"bio.toolsId\": \"https://bio.tools/vep\"}, {\"url\": \"https://www.ensembl.org/vep\"}]
NA"},{"location":"schemas-md/obj/transcriptHGVSIds/","title":"transcriptHGVSIds","text":"Term Description Type Properties Example Enum transcriptHGVSIds NA array NA [\"NC 000023.10(NM004006.2):c.357+1G\"]
NA"},{"location":"schemas-md/obj/treatmentCode/","title":"treatmentCode","text":"Term Description Type Properties Example Enum treatmentCode Definition of an ontology term. object id, label [{\"id\": \"NCIT:C287\", \"label\": \"Aspirin\"}, {\"id\": \"NCIT:C62078\", \"label\": \"Tamoxifen\"}]
NA"},{"location":"schemas-md/obj/treatments/","title":"Treatments","text":"Term Description Type Properties Example Enum treatments Treatment(s) prescribed/administered, defined by treatment ID, date and age of onset, dose, schedule and duration. array ageAtOnset, cumulativeDose, doseIntervals, routeOfAdministration, treatmentCode NA NA"},{"location":"schemas-md/obj/tumorGrade/","title":"tumorGrade","text":"Term Description Type Properties Example Enum tumorGrade Term representing the tumor grade. Child term of NCIT:C28076 (Disease Grade Qualifier) or equivalent. object id, label [{\"id\": \"NCIT:C28080\", \"label\": \"Grade 3a\"}]
NA"},{"location":"schemas-md/obj/tumorProgression/","title":"tumorProgression","text":"Term Description Type Properties Example Enum tumorProgression Tumor progression category indicating primary, metastatic or recurrent progression. Ontology value from Neoplasm by Special Category ontology (NCIT:C7062), e.g. NCIT:C84509 (Primary Malignant Neoplasm). object id, label [{\"id\": \"NCIT:C84509\", \"label\": \"Primary Malignant Neoplasm\"}, {\"id\": \"NCIT:C4813\", \"label\": \"Recurrent Malignant Neoplasm\"}]
NA"},{"location":"schemas-md/obj/unit/","title":"Unit","text":"Term Description Type Properties Example Enum unit The kind of unit. Recommended from NCIT Unit of Category ontology term (NCIT:C42568) descendants object id, label [{\"id\": \"NCIT:C70575\", \"label\": \"Roentgen\"}, {\"id\": \"NCIT:C28252\", \"label\": \"Kilogram\"}, {\"id\": \"NCIT:C28253\", \"label\": \"Milligram\"}]
NA"},{"location":"schemas-md/obj/updateDateTime/","title":"updateDateTime","text":"Term Description Type Properties Example Enum updateDateTime The time the dataset was updated in (ISO 8601 format) string NA 2017-01-17T20:33:40Z NA"},{"location":"schemas-md/obj/value/","title":"Value","text":"Term Description Type Properties Example Enum value The value of the quantity in the units number NA NA NA"},{"location":"schemas-md/obj/variantAlternativeIds/","title":"variantAlternativeIds","text":"Term Description Type Properties Example Enum variantAlternativeIds Definition of an external reference class. Provenance: GA4GH Phenopackets v2 ExternalReference
array id, notes, reference [{\"id\": \"dbSNP:rs587780345\", \"notes\": \"dbSNP id\", \"reference\": \"https://www.ncbi.nlm.nih.gov/snp/rs587780345\"}, {\"id\": \"ClinGen:CA152954\", \"notes\": \"ClinGen Allele Registry id\", \"reference\": \"https://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/by_caid?caid=CA152954\"}, {\"id\": \"UniProtKB:P35557#VAR_003699\", \"reference\": \"https://www.uniprot.org/uniprot/P35557#VAR_003699\"}]
,[{\"id\": \"OMIM:164757.0001\", \"reference\": \"https://www.omim.org/entry/164757#0001\"}]
NA"},{"location":"schemas-md/obj/variantCaller/","title":"variantCaller","text":"Term Description Type Properties Example Enum variantCaller Reference to variant calling software / pipeline string NA GATK4.0 NA"},{"location":"schemas-md/obj/variantInternalId/","title":"variantInternalId","text":"Term Description Type Properties Example Enum variantInternalId Reference to the internal variant ID. This represents the primary key/identifier of that variant inside a given Beacon instance. Different Beacon instances may use identical id values, referring to unrelated variants. Public identifiers such as the GA4GH Variant Representation Id (VRSid) MUST be returned in the identifiers
section. A Beacon instance can, of course, use the VRSid as their own internal id but still MUST represent this then in the identifiers
section. string NA var00001, v110112 NA"},{"location":"schemas-md/obj/variantLevelData/","title":"variantLevelData","text":"Term Description Type Properties Example Enum variantLevelData NA object clinicalInterpretations, phenotypicEffects NA NA"},{"location":"schemas-md/obj/variantType/","title":"variantType","text":"Term Description Type Properties Example Enum variantType The variantType
declares the nature of the variation in relation to a reference. In a response, it is used to describe the variation. In a request, it is used to declare the type of event the Beacon client is looking for. If in queries variants can not be defined through a sequence of one or more bases (precise
variants) it can be used standalone (i.e. without alternateBases
) together with positional parameters. Examples here are e.g. queries for structural variants such as DUP
(increased allelic count of material from the genomic region between start
and end
positions without assumption about the placement of the additional sequence) or DEL
(deletion of sequence following start
). Either alternateBases
or variantType
is required, with the exception of range queries (single start
and end
parameters). string NA SNP, DEL, DUP, BND NA"},{"location":"schemas-md/obj/variation/","title":"Variation","text":"Term Description Type Properties Example Enum variation NA oneOf LegacyVariation, MolecularVariation, SystemicVariation NA NA"},{"location":"schemas-md/obj/version/","title":"Version","text":"Term Description Type Properties Example Enum version version of the source data. string NA gnomAD v3.1.1 NA"},{"location":"schemas-md/obj/zygosity/","title":"Zygosity","text":"Term Description Type Properties Example Enum zygosity Ontology term for zygosity in which variant is present in the sample from the Zygosity Ontology (GENO:0000391) , e.g heterozygous
(GENO:0000135) object id, label [{\"id\": \"GENO:0000135\", \"label\": \"heterozygous\"}, {\"id\": \"GENO:0000136\", \"label\": \"homozygous\"}, {\"id\": \"GENO:0000604\", \"label\": \"hemizygous X-linked\"}]
NA"}]}
\ No newline at end of file
diff --git a/security/index.html b/security/index.html
new file mode 100644
index 000000000..9844f0d33
--- /dev/null
+++ b/security/index.html
@@ -0,0 +1,1105 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Disclaimer
A stand-alone regulatory and ethics review has been performed on the specification itself. +However, it is the responsibility of the implementer to ensure that appropriate measures are taken +to remove risks related to privacy, confidentiality, and/or security of data.
+ +The Beacon uses a 3-tiered access model: anonymous
, registered
, and controlled access
.
Technical Notes
For detailed information about the technical implementation of the different levels +of security please see the Framework documentation.
+ +For a Beacon to respond to a query at the registered tier, the user must identify themselves to the Beacon, for example by using an ELIXIR identity.
+For a Beacon to respond to a controlled access query, the user must have applied for, and been granted, access to the Beacon (or to data derived from one or more individuals within the Beacon) whose data is only accessible at specified tiers. This tiered access model allows the owner or controller of a Beacon to determine which responses are returned to whom depending on the query and the user who is making the request, for example to ensure the response respects the consent under which the data were collected.
+An anonymous Beacon can be accessed by any request.
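The three tiers described above can be read as a simple decision rule. The following is only a sketch under stated assumptions: the names `AccessTier` and `may_respond` are hypothetical, and the assumption that controlled access also requires an identified user is ours, not part of the specification. A real deployment should follow the Framework documentation.

```python
from enum import Enum


class AccessTier(Enum):
    """The three access tiers named in the text (enum itself is hypothetical)."""
    ANONYMOUS = "anonymous"
    REGISTERED = "registered"
    CONTROLLED = "controlled"


def may_respond(required: AccessTier, identified: bool, granted: bool) -> bool:
    """Decide whether a response at the required tier may be returned.

    identified: the user has identified themselves to the Beacon.
    granted:    the user applied for and was granted access (controlled tier).
    """
    if required is AccessTier.ANONYMOUS:
        return True  # any request is served
    if required is AccessTier.REGISTERED:
        return identified
    # Assumption: a controlled-access grant presupposes an identified user.
    return identified and granted
```

For example, `may_respond(AccessTier.REGISTERED, identified=True, granted=False)` evaluates to `True`, while the same user is refused at the controlled tier.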
+Synthetic data
+The use of synthetic data for testing is important in that it ensures that the full functionality of a Beacon can be tested and / or demonstrated without risk of exposing data from individuals. In addition to testing or demonstrating a deployment, synthetic data should be used for development, for example when adding new features.
+For querying of genomic variations Beacon v2 builds on and extends the options provided +by earlier versions.
+Sequence Queries query for the existence of a specified sequence at a given genomic +position. Such queries correspond to the original Beacon queries and are used to match +short, precisely defined genomic variants such as SNVs and INDELs.
+referenceName
start
(single value)
alternateBases
referenceBases
This is an example for a single base mutation (G>A
) at a specific position (GRCh38 chromosome 17 7577120
)
+in the EIF4A1 eukaryotic translation initiation factor 4A1.
?referenceName=NC_000017.11&start=7577120&referenceBases=G&alternateBases=A
+
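As an illustration, the GET query string above can be assembled programmatically. This is a minimal sketch: the base URL `https://beacon.example.org/api/g_variants` and the function name `sequence_query_url` are placeholders chosen for the example, not part of any particular Beacon deployment.

```python
from urllib.parse import urlencode


def sequence_query_url(base_url, reference_name, start, reference_bases, alternate_bases):
    """Build a GET URL for a Beacon sequence query from its four parameters."""
    params = {
        "referenceName": reference_name,
        "start": start,  # single value for sequence queries
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; replace with the address of an actual Beacon.
url = sequence_query_url(
    "https://beacon.example.org/api/g_variants",
    "NC_000017.11", 7577120, "G", "A",
)
print(url)
```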
datasetIds=__some-dataset-ids__
filters
...
{
+ "$schema":"beaconRequestBody.json",
+ "meta": {
+ "apiVersion": "2.0",
+ "requestedSchemas": [
+ {
+ "entityType": "genomicVariation",
+ "schema": "https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json"
+ }
+ ]
+ },
+ "query": {
+ "requestParameters": {
+ "g_variant": {
+ "referenceName": "NC_000017.11",
+ "start": [7577120],
+ "referenceBases": "G",
+ "alternateBases": "A"
+ }
+ }
+ },
+ "requestedGranularity": "record",
+ "pagination": {
+ "skip": 0,
+ "limit": 5
+ }
+}
+
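The same request can be built as a data structure before it is sent as a POST body. The sketch below covers only the `meta` and `query.requestParameters` parts of the example above; the function name `sequence_request_body` is hypothetical, and the optional fields (`requestedSchemas`, `requestedGranularity`, `pagination`) are left out deliberately.

```python
import json


def sequence_request_body(reference_name, start, reference_bases, alternate_bases):
    """Return a dict mirroring the core of the POST example for a sequence query."""
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {
                "g_variant": {
                    "referenceName": reference_name,
                    "start": [start],  # positions are lists in the POST form
                    "referenceBases": reference_bases,
                    "alternateBases": alternate_bases,
                }
            }
        },
    }


body = sequence_request_body("NC_000017.11", 7577120, "G", "A")
print(json.dumps(body, indent=2))
```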
There are optional parameters [datasetIds
, filters
...] and also the option to specify the response type
+(through requestedGranularity
) and returned data format (requestedSchemas
). Please follow this up in the
+framework documentation.
?assemblyId=GRCh38&referenceName=17&start=7577120&referenceBases=G&alternateBases=A
+
datasetIds=__some-dataset-ids__
?ref=GRCh38&chrom=17&pos=7577121&referenceAllele=C&allele=A
+
beacon=__some-beacon-id__
Before Beacon v0.4 a 1-based coordinate system was being used.
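The change of coordinate system explains why the v0.3 example uses `pos=7577121` while the later examples use `start=7577120`. A minimal conversion sketch follows; the function name is hypothetical.

```python
def one_based_pos_to_start(pos: int) -> int:
    """Convert a 1-based position (Beacon before v0.4) to a 0-based start."""
    if pos < 1:
        raise ValueError("1-based positions start at 1")
    return pos - 1


start = one_based_pos_to_start(7577121)
print(start)
```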
+Beacon Range Queries are supposed to return matches of any variant with at least
+partial overlap of the sequence range specified by referenceName
, start
and end
+parameters.
referenceName
start
(single value)
end
(single value)
variantType
OR alternateBases
OR aminoacidChange
variantMinLength
variantMaxLength
Use of start
and end
Range queries require the use of single start
and end
parameters, in contrast
+to Bracket Queries.
?assemblyId=GRCh38&referenceName=17&start=7572837&end=7578641
+
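The "at least partial overlap" rule for range queries can be sketched as an interval test. This assumes 0-based, half-open intervals (`start` inclusive, `end` exclusive) for both the stored variant and the query; the function name `overlaps_range` is hypothetical, and an actual Beacon may implement the matching differently.

```python
def overlaps_range(variant_start, variant_end, query_start, query_end):
    """True if the variant shares at least one base with the queried range.

    All coordinates are assumed to be 0-based and half-open.
    """
    return variant_start < query_end and variant_end > query_start


# Query range taken from the GET example above.
QUERY = (7572837, 7578641)

print(overlaps_range(7577120, 7577121, *QUERY))  # SNV inside the range
print(overlaps_range(7570000, 7573000, *QUERY))  # variant extending past the range start
print(overlaps_range(7578641, 7578700, *QUERY))  # variant beginning at the range end
```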
{
+ "$schema":"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/requests/beaconRequestBody.json",
+ "meta": {
+ "apiVersion": "2.0",
+ "requestedSchemas": [
+ {
+ "entityType": "genomicVariation",
+ "schema": "https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json"
+ }
+ ]
+ },
+ "query": {
+ "requestParameters": {
+ "g_variant": {
+ "referenceName": "NC_000017.11",
+ "start": [ 7572837 ],
+ "end": [ 7578641 ]
+ }
+ }
+ },
+ "requestedGranularity": "record",
+ "pagination": {
+ "skip": 0,
+ "limit": 5
+ }
+}
+
Range Queries are new to Beacon v2
+GeneId Queries are in essence a variation of Range Queries in which the coordinates +are replaced by the HGNC gene symbol. It is left to the +implementation if the matching is done on variants annotated for the gene symbol or if +a positional translation is being applied.
+geneId
variantType
OR alternateBases
OR aminoacidChange
variantMinLength
variantMaxLength
?geneId=EIF4A1&variantMaxLength=1000000&variantType=DEL
+
Bracket Queries allow the specification of sequence ranges for both start and end +positions of a genomic variation. The typical example here is the query for similar +structural variants - particularly CNVs - affecting a genomic region but potentially +differing in their exact base extents.
+ +referenceName
start
(min) and start
(max) - i.e. 2 start parameters
end
(min) and end
(max) - i.e. 2 end parameters
variantType
(optional)
Use of start
and end
Bracket queries require the use of two start
and end
parameters, in contrast
+to Range Queries.
List Parameters in GET Requests
+Since the direct interpretation of list parameters in queries is not supported by
+some server environments (e.g. PHP, Go…), list parameters such as start
and end
+should be provided as comma-concatenated strings when using them in GET requests.
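A client can apply this recommendation by joining list values with commas before URL-encoding, and a server can split them again. The sketch below uses hypothetical helper names (`to_get_query`, `parse_positions`); `safe=","` is passed to `urlencode` so that the commas stay literal rather than being percent-encoded.

```python
from urllib.parse import urlencode


def to_get_query(params):
    """Serialise parameters, joining list values into comma-concatenated strings."""
    flat = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            flat[key] = ",".join(str(item) for item in value)
        else:
            flat[key] = value
    # safe="," keeps commas unescaped in the resulting query string
    return urlencode(flat, safe=",")


def parse_positions(raw):
    """Server side: turn '5000000,7676592' back into a list of integers."""
    return [int(part) for part in raw.split(",") if part]


query = to_get_query({
    "referenceName": "NC_000017.11",
    "variantType": "DEL",
    "start": [5000000, 7676592],
    "end": [7669607, 10000000],
})
print(query)
```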
The following example shows a "bracket query" for focal deletions of the TP53 gene locus:
+This leads to matching of deletion CNVs which have at least some base overlap with the gene locus but are not +larger than approx. 5Mb (operational definitions of focality vary between 1 and 5Mb).
+?datasetIds=TEST&referenceName=NC_000017.11&variantType=DEL&start=5000000,7676592&end=7669607,10000000
+
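The matching logic behind such a bracket query can be sketched as two interval checks, one for the variant's start and one for its end. The inclusive bounds and the function name `matches_bracket` are assumptions made for this illustration; implementations may treat the boundaries differently.

```python
def matches_bracket(variant_start, variant_end, start_range, end_range):
    """True if the variant starts within start_range and ends within end_range.

    start_range and end_range are (min, max) pairs, treated here as inclusive.
    """
    start_min, start_max = start_range
    end_min, end_max = end_range
    return (start_min <= variant_start <= start_max
            and end_min <= variant_end <= end_max)


# Brackets from the TP53 example above.
START_RANGE = (5000000, 7676592)
END_RANGE = (7669607, 10000000)

print(matches_bracket(7000000, 8000000, START_RANGE, END_RANGE))  # focal deletion spanning the locus
print(matches_bracket(4000000, 8000000, START_RANGE, END_RANGE))  # starts too far upstream
print(matches_bracket(7000000, 7600000, START_RANGE, END_RANGE))  # ends before reaching the locus
```

With these brackets, any matching deletion starts at or before the end of the gene locus and ends at or after its start, which yields the "at least some base overlap" behaviour described above while bounding the total size.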
datasetIds=__some-dataset-ids__
filters
...
{
+ "$schema":"https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/requests/beaconRequestBody.json",
+ "meta": {
+ "apiVersion": "2.0",
+ "requestedSchemas": [
+ {
+ "entityType": "genomicVariation",
+ "schema": "https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/models/json/beacon-v2-default-model/genomicVariations/defaultSchema.json"
+ }
+ ]
+ },
+ "query": {
+ "requestParameters": {
+ "g_variant": {
+ "referenceName": "NC_000017.11",
+ "start": [ 5000000, 7676592 ],
+ "end": [ 7669607, 10000000 ],
+ "variantType": "DEL"
+ }
+ }
+ },
+ "requestedGranularity": "record",
+ "pagination": {
+ "skip": 0,
+ "limit": 5
+ }
+}
+
There are optional parameters [datasetIds
, filters
...] and also the option to specify the response type
+(through requestedGranularity
) and returned data format (requestedSchemas
). Please follow this up in the
+framework documentation.
?assemblyId=GRCh38&referenceName=17&variantType=DEL&start=5000000,7676592&end=7669607,10000000
+
datasetIds=__some-dataset-ids__
CNV query options were only implemented with Beacon v0.4, based on Beacon+ prototyping.
+TBD
+?allele=NM_004006.2:c.4375C>T
+
to be completed
+TBD
+?aminoacidChange=V600E
+
to be completed
+variantType
Parameter Interpretation
The variantType
parameter is essential for scoping queries beyond precise sequence
+queries. While versions of Beacon before v2 had demonstrated the use of a few, VCF
+derived values (particularly for CNV queries using DUP
or DEL
), the relation of these
+values to underlying genomic variations had not been precisely defined.
Implementation of variantType
in Beacon Instances
The current Beacon query model does not limit the use of values for variantType
since
+at this time no single specification provides unanimous definitions
+of genomic variation categories.
variantType
parameter use
While, for legacy reasons and because of the widespread use of VCFs as input source, Beacon v2 documents +the use of VCF-like terms, in principle other variant terms can be used (though with possibly negative +implications in federated settings). The field of structural genomic variant annotations is rapidly +developing, with more specific terms now becoming available e.g. through the +Experimental Factor Ontology or the GA4GH Variant Representation Standard VRS +(which aligns with the main EFO terms).
+This table is maintained in parallel with the hCNV community documentation.
| EFO | Beacon | VCF | SO | GA4GH VRS ⇒ VRS proposal¹ | Notes |
|---|---|---|---|---|---|
| EFO:0030070 copy number gain | DUP² or EFO:0030070 | DUP SVCLAIM=D³ | SO:0001742 copy_number_gain | low-level gain (implicit) ⇒ EFO:0030070 copy number gain | a sequence alteration whereby the copy number of a given genomic region is greater than the reference sequence |
| EFO:0030071 low-level copy number gain | DUP² or EFO:0030071 | DUP SVCLAIM=D³ | SO:0001742 copy_number_gain | low-level gain ⇒ EFO:0030071 low-level copy number gain | |
| EFO:0030072 high-level copy number gain | DUP² or EFO:0030072 | DUP SVCLAIM=D³ | SO:0001742 copy_number_gain | high-level gain ⇒ EFO:0030072 high-level copy number gain | commonly but not consistently used for >=5 copies on a bi-allelic genome region |
| EFO:0030073 focal genome amplification | DUP² or EFO:0030073 | DUP SVCLAIM=D³ | SO:0001742 copy_number_gain | high-level gain ⇒ EFO:0030073 focal genome amplification | commonly but not consistently used for >=5 copies on a bi-allelic genome region, of limited size (operationally max. 1-5Mb) |
| EFO:0030067 copy number loss | DEL² or EFO:0030067 | DEL SVCLAIM=D³ | SO:0001743 copy_number_loss | partial loss (implicit) ⇒ EFO:0030067 copy number loss | a sequence alteration whereby the copy number of a given genomic region is smaller than the reference sequence |
| EFO:0030068 low-level copy number loss | DEL² or EFO:0030068 | DEL SVCLAIM=D³ | SO:0001743 copy_number_loss | partial loss ⇒ EFO:0030068 low-level copy number loss | |
| EFO:0020073 high-level copy number loss | DEL² or EFO:0020073 | DEL SVCLAIM=D³ | SO:0001743 copy_number_loss | partial loss ⇒ EFO:0020073 high-level copy number loss | a loss of several copies; also used in cases where a complete genomic deletion cannot be asserted |
| EFO:0030069 complete genomic deletion | DEL² or EFO:0030069 | DEL SVCLAIM=D³ | SO:0001743 copy_number_loss | complete loss ⇒ EFO:0030069 complete genomic deletion | complete genomic deletion (e.g. homozygous deletion on a bi-allelic genome region) |
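One way to support both legacy VCF-like values and the more specific EFO terms from the table above is a term expansion at query time. The following is a sketch only: the mapping is read directly from the table, while the names `LEGACY_TO_EFO` and `expand_variant_type` are hypothetical and do not describe how any existing Beacon implements this.

```python
# Legacy VCF-like values mapped to the EFO terms listed in the table above.
LEGACY_TO_EFO = {
    "DUP": ["EFO:0030070", "EFO:0030071", "EFO:0030072", "EFO:0030073"],
    "DEL": ["EFO:0030067", "EFO:0030068", "EFO:0020073", "EFO:0030069"],
}


def expand_variant_type(value):
    """Return the list of stored terms a queried variantType should match.

    Legacy DUP/DEL values expand to all corresponding EFO terms; any other
    value is passed through unchanged.
    """
    return list(LEGACY_TO_EFO.get(value, [value]))


print(expand_variant_type("DEL"))
print(expand_variant_type("EFO:0030073"))
```

A query for `DEL` would then match variants annotated with any of the four copy number loss terms, whereas a query for a specific EFO term matches only that term.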
assemblyId parameter
referenceBases, alternateBases, variantType ...) may be used to scope the range query
aminoacidChange
geneId
variantMinLength, variantMaxLength
start and end positions when querying multi-base variants allows for "fuzzy" CNV queries
variantType parameter to specify e.g. CNV queries (DUP, DEL)
variantType is not required for precise queries with specified referenceBases and alternateBases
1. The VRS annotations refer to the status at v1.2 (2022). The GA4GH VRS team
+is currently (Spring 2023) preparing an updated specification which will introduce
+the new class CopyNumberChange (discussion...) with the use of the EFO terms (including a new term
+for high level deletion (EFO:0020073) in the April 2023 EFO release).
2. While the use of VCF derived (DUP, DEL) values had been introduced with
+Beacon v1, usage of these terms has always been a recommendation rather than an integral part
+of the API. We now encourage the support of more specific terms (particularly EFO)
+by Beacon developers. As an example, the Progenetix Beacon API uses EFO terms but
+provides an internal term expansion for legacy DUP, DEL support.
3. VCFv4.4 introduces an SVCLAIM field to disambiguate between in situ events (such as
+tandem duplications; known adjacency / break junction: SVCLAIM=J) and events where e.g. only the
+change in abundance / read depth (SVCLAIM=D) has been determined. Both J and D flags can be combined.