Commit

Merge pull request #122 from mrueda/website-docs
Fix file naming conflict error in schemas-md on macOS APFS (case-insensitive)
mbaudis authored Mar 26, 2024
2 parents d3c2692 + f2f08ca commit 9aefbed
Showing 25 changed files with 62 additions and 50 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -2,4 +2,3 @@ site
.DS_Store
models/.DS_Store
/.vs
docs/schemas-md
2 changes: 1 addition & 1 deletion bin/SCHEMAS2MD.md
@@ -132,7 +132,7 @@ _NB:_ The script was built to work with the Beacon v2 Model schemas and the auth

_NB:_ The decision to take YAMLs (and not JSON) as an input is deliberate and made by the author.

_NB:_ The script only processes the `Terms` nested **up to 3 degrees of hierarchy**. Before Adoption of VRS/PHX that limit was OK.
_NB:_ The script only processes the `Terms` nested **up to 3 degrees of hierarchy**. Before Adoption of VRS/PXF that limit was OK.

_NB:_ The script also includes the Beacon v2 Models examples from [beacon-v2 repo](https://github.com/ga4gh-beacon/beacon-v2) in JSON format.

File renamed without changes.
60 changes: 36 additions & 24 deletions bin/beacon_yaml2md.pl
@@ -2,11 +2,11 @@
#
# Script to convert Beacon v2 Models schemas to Markdown tables
#
# Last Modified: May/05/2022
# Last Modified: Mar/26/2024
#
# Version 2.0.0
#
# Copyright (C) 2021-2022 Manuel Rueda (manuel.rueda@crg.eu)
# Copyright (C) 2021-2024 Manuel Rueda (manuel.rueda@cnag.eu)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -236,6 +236,10 @@ sub yaml2md_obj {
# We parse $yaml to get paths and more...
my ( $base, $dir, $ext ) = fileparse( $yaml, '.yaml' );
$ext =~ s/\.yaml/.md/;

# Ad hoc fix for two files that have the same name except for uc/lc
# AgeRange == ageRange and Value == value on macOS APFS (case-insensitive)
$base = $base . '_PXF' if ( $base eq 'AgeRange' || $base eq 'Value' );
my $file = catfile( $mo_dir, $base . $ext ); # Note -> $base.$ext
write_file( $file, $out_str );

@@ -278,11 +282,11 @@ sub yaml_slicer {
# one YAML file for each property and then re-use code from the 'main' schema

##########################################
# **** Note about VRS / PHX adoption *** #
# **** Note about VRS / PXF adoption *** #
##########################################

# The adoption of those standards had technical implications. The script expects objects to have
# <key> for the object and then <properties>. VRS/PHX follow JSON schemas that include /oneOf allOf anyOf/
# <key> for the object and then <properties>. VRS/PXF follow JSON schemas that include /oneOf allOf anyOf/
# plus other complex instructions such as <if:> <else:>.
# This becomes a real challenge with $ref as, for instance, in <g_v.variation> we cannot find the key for
# 'MolecularVariation', 'SystemicVariation', 'LegacyVariation'
@@ -352,7 +356,7 @@ sub yaml_slicer {
sub table_content {

my ( $yaml_properties, $ra_properties, $headers, $obj, $link ) = @_;
my @lc_headers = map { lc } @$headers; # Copy array uc to avoid modifying original $ref
my @lc_headers = map { lc } @$headers; # Copy array uc to avoid modifying original $ref
my $out_str = '';

#---------------------------------------------------------|
@@ -394,10 +398,10 @@ sub table_content {
if $header eq 'example';

# Slice differently if $object->{type} eq 'array'
if ($object->{type} eq 'array' ) {
for ('description', 'properties'){
$value_header = $object->{items}{$_} if $header eq $_;
}
if ( $object->{type} eq 'array' ) {
for ( 'description', 'properties' ) {
$value_header = $object->{items}{$_} if $header eq $_;
}
}

# Now convert data structure to string
@@ -454,7 +458,7 @@ sub ref2str {

# string or undef
else {
$out_str = defined $data->[0] ? join ', ', @$data : 'NA'; # Note ', ' to allow HTML column rendering
$out_str = defined $data->[0] ? join ', ', @$data : 'NA'; # Note ', ' to allow HTML column rendering
}
}
elsif ( ref $data eq 'HASH' ) {
@@ -480,15 +484,20 @@ sub add_external_links {
my ( $tmp_str, $key ) = @_;

# Note: This is an ad hoc solution to fix errors with deeply-nested data
my @phx = qw( typedQuantities days weeks Quantity high low);
my @vrs = qw(_id state type CURIE Location);
my @pxf = qw( typedQuantities days weeks Quantity high low);
my @vrs = qw(_id state type CURIE Location);
my @framework = ("ontologyTerm");
return ( any { ( $_ eq $key ) } @phx )

return ( any { ( $_ eq $key ) } @pxf )
? "[$key](https://phenopacket-schema.readthedocs.io/en/latest/building-blocks.html)"
: ( any { ( $_ eq $key ) } @vrs )
? "[$key](https://vrs.ga4gh.org/en/stable/terms_and_model.html#$key)"
: ( any { ( $_ eq $key ) } @framework )
? "[$key](https://github.com/ga4gh-beacon/beacon-v2/blob/main/framework/src/common/$key.yaml)"
: ( any { ( $_ eq $key ) } @framework )
? "[$key](https://github.com/ga4gh-beacon/beacon-v2/blob/main/framework/src/common/$key.yaml)"

# NB: Ad hoc solution for properties having equal name (lc)
: ( $key eq 'AgeRange' || $key eq 'Value' )
? "[$key]($tmp_str/${key}_PXF.md)"
: "[$key]($tmp_str/$key.md)";
}

@@ -588,7 +597,7 @@ sub create_str_yaml {

## ontologyTerm.yaml is needed due to a bug with jsonref2json.js that overrode the "parent" <description> field

my $str_ontologyTerm = <<EOF;
my $str_ontologyTerm = <<EOF;
---
additionalProperties: true
description: Definition of an ontology term.
@@ -676,10 +685,10 @@ sub parse_json_keywords {
'variation' =>
[ 'MolecularVariation', 'SystemicVariation', 'LegacyVariation' ],
'SystemicVariation' => ['CopyNumber'],
'MolecularVariation' => [ 'Allele', 'Haplotype' ],
'location' => [ 'CURIE', 'Location' ],
'MolecularVariation' => [ 'Allele', 'Haplotype' ],
'location' => [ 'CURIE', 'Location' ],
'state' => [ 'SequenceState', 'SequenceExpression' ],
'Value' => [ 'Quantity', 'ontologyTerm' ]
'Value' => [ 'Quantity', 'ontologyTerm' ]
};

# We'll be checking <oneOf allOf anyOf>
@@ -699,14 +708,17 @@ sub parse_json_keywords {
# my $const = $pointer->get("/$keyword/$property/$count/properties/type/const");
# $tmp_hash->{properties}{$const} = $elements;
#} else{
my $tmp_term = ( $pointer->contains("/$keyword/$count/title") && $pointer->get("/$keyword/$count/title") ne 'Ontology Term' )
my $tmp_term =
( $pointer->contains("/$keyword/$count/title")
&& $pointer->get("/$keyword/$count/title") ne
'Ontology Term' )
? $pointer->get("/$keyword/$count/title")
: @{ $terms->{$property} }[$count];
$tmp_hash->{properties}{$tmp_term} = $elements if $tmp_term; # Ad-hoc some terms appear duplicated and come empty....
#}
$tmp_hash->{properties}{$tmp_term} = $elements if $tmp_term; # Ad-hoc some terms appear duplicated and come empty....
#}
$count++;
}
$data = $tmp_hash; # Adding new reference
$data = $tmp_hash; # Adding new reference
}
}
return $data;
@@ -872,7 +884,7 @@ =head1 HOW TO RUN BEACON_YAML2MD
I<NB:> The decision to take YAMLs (and not JSON) as an input is deliberate and made by the author.
I<NB:> The script only processes the C<Terms> nested B<up to 3 degrees of hierarchy>. Before Adoption of VRS/PHX that limit was OK.
I<NB:> The script only processes the C<Terms> nested B<up to 3 degrees of hierarchy>. Before Adoption of VRS/PXF that limit was OK.
I<NB:> The script also includes the Beacon v2 Models examples from L<beacon-v2 repo|https://github.com/ga4gh-beacon/beacon-v2> in JSON format.
11 changes: 6 additions & 5 deletions bin/transform_json2md.sh
@@ -2,11 +2,11 @@
#
# Script to convert Beacon v2 Models to Markdown
#
# Last Modified: Jul/20/2022
# Last Modified: Mar/26/2024
#
# Version 2.0.0
#
# Copyright (C) 2021-2022 Manuel Rueda (manuel.rueda@crg.eu)
# Copyright (C) 2021-2024 Manuel Rueda (manuel.rueda@cnag.eu)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -26,11 +26,12 @@
set -eu
mod_dir=../models/json/beacon-v2-default-model
fwk_dir=../framework/json
adhoc_url='https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/bin/adhoc'
#adhoc_url='https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/bin/adhoc'
adhoc_url='https://raw.githubusercontent.com/mrueda/beacon-v2/main/bin/adhoc'
out_dir=./deref_schemas
jsonref='node ./jsonref2json.js'
yaml2md=./beacon_yaml2md.pl
yaml2json='perl -MYAML -MJSON -0777 -wnl -e'
yaml2json='perl -MYAML::XS -MJSON::XS -0777 -wnl -e'

mkdir -p $out_dir/obj

@@ -75,7 +76,7 @@ do
rm $out_dir/$schema/defaultSchema.mod.json

echo "Transforming $schema JSON to YAML ..."
$yaml2json 'print YAML::Dump(decode_json($_))' $out_dir/$schema/defaultSchema.json | perl -pe 's/ \*(\d+)$/ $1/' > $out_dir/$schema/defaultSchema.yaml
$yaml2json 'print YAML::XS::Dump(decode_json($_))' $out_dir/$schema/defaultSchema.json | perl -pe 's/ \*(\d+)$/ $1/' > $out_dir/$schema/defaultSchema.yaml

echo "---"
done
4 changes: 2 additions & 2 deletions docs/schemas-md/beacon_terms.md
@@ -5,7 +5,7 @@
* [ageAtProcedure](./obj/ageAtProcedure.md)
* [ageOfOnset](./obj/ageOfOnset.md)
* [ageRange](./obj/ageRange.md)
* [AgeRange](./obj/AgeRange.md)
* [AgeRange_PXF](./obj/AgeRange_PXF.md)
* [aligner](./obj/aligner.md)
* [Allele](./obj/Allele.md)
* [alleleFrequency](./obj/alleleFrequency.md)
@@ -167,8 +167,8 @@
* [tumorProgression](./obj/tumorProgression.md)
* [unit](./obj/unit.md)
* [updateDateTime](./obj/updateDateTime.md)
* [Value](./obj/Value.md)
* [value](./obj/value.md)
* [Value_PXF](./obj/Value_PXF.md)
* [variantAlternativeIds](./obj/variantAlternativeIds.md)
* [variantCaller](./obj/variantCaller.md)
* [variantInternalId](./obj/variantInternalId.md)
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/Complex Value.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| Complex Value | Definition of a complex value class. Provenance: GA4GH Phenopackets v2 `TypedQuantity` | object | [typedQuantities](https://phenopacket-schema.readthedocs.io/en/latest/building-blocks.html) | NA | NA|
| Complex Value | Definition of a complex value class. Provenance: GA4GH Phenopackets v2 `TypedQuantity` | object | [required](./required.md), [typedQuantities](https://phenopacket-schema.readthedocs.io/en/latest/building-blocks.html) | NA | NA|
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/ageAtProcedure.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| ageAtProcedure | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
| ageAtProcedure | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange_PXF.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/ageOfOnset.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| ageOfOnset | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
| ageOfOnset | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange_PXF.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/alternateBases.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| alternateBases | Alternate bases for this variant (starting from `start`). * Accepted values: IUPAC codes for nucleotides (e.g. `https://www.bioinformatics.org/sms/iupac.html`). * N is a wildcard, that denotes the position of any base, and can beused as a standalone base of any type or within a partially knownsequence. As example, a query of `ANNT` the Ns can take take any form of[ACGT] and will match `ANNT`, `ACNT`, `ACCT`, `ACGT` ... and so forth.* an *empty value* is used in the case of deletions with the maximally trimmed, deleted sequence being indicated in `ReferenceBases`* Categorical variant queries, e.g. such *not* being represented through sequence & position, make use of the `variantType` parameter.* Either `alternateBases` or `variantType` is required.' | string | NA | T, G, N, AG, | NA|
| alternateBases | Alternate bases for this variant (starting from `start`). * Accepted values: IUPAC codes for nucleotides (e.g. `https://www.bioinformatics.org/sms/iupac.html`). * N is a wildcard, that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence.* an *empty value* is used in the case of deletions with the maximally trimmed, deleted sequence being indicated in `ReferenceBases` | string | NA | T, G, N, AG, | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/date.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| date | Date of the exposure in ISO8601 format. | string | NA | NA | NA|
| date | Date of measurement. Addition compared to Phenopackets model. | string | NA | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/geneIds.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| geneIds | NA | array | NA | `["ACE2"]`,<br />`["BRCA1"]` | NA|
| geneIds | NA | array | NA | `["ACE2"]`,<br />`["BRCA1", "ENSG00000012048"]` | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/id.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| id | Run ID. | string | NA | SRR10903401 | NA|
| id | A CURIE identifier, e.g. as `id` for an ontology term. | string | NA | ga4gh:GA.01234abcde, DUO:0000004, orcid:0000-0003-3463-0775, PMID:15254584 | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/measurementValue.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| measurementValue | NA | oneOf | [Complex Value](./Complex Value.md), [Value](./Value.md) | NA | NA|
| measurementValue | NA | oneOf | [Complex Value](./Complex Value.md), [Value](./Value_PXF.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/notes.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| notes | Unstructured text to describe additional properties of this disease instance. | string | NA | Some free text | NA|
| notes | Unstructured text to describe this measurement. Addition compared to Phenopackets model. | string | NA | Some free text | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/observationMoment.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| observationMoment | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
| observationMoment | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange_PXF.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/onset.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| onset | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
| onset | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange_PXF.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/population.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| population | A name for the population. A population could an ethnic, geographical one or just the `members`of a study. | string | NA | East Asian, ICGC Chronic Lymphocytic Leukemia-ES, Men, Children | NA|
| population | A name for the population. A population could be an ethnic or geographical one, or just the members of a study. | string | NA | East Asian, ICGC Chronic Lymphocytic Leukemia-ES, Men, Children | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/referenceBases.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| referenceBases | Reference bases for this variant (starting from `start`). * Accepted values: IUPAC codes for nucleotides (e.g. `https://www.bioinformatics.org/sms/iupac.html`). * N is a wildcard, that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence. As example, a query of `ANNT` the Ns can take take any form of `[ACGT]` and will match `ANNT`, `ACNT`, `ACCT`, `ACGT` ... and so forth.* an *empty value* is used in the case of insertions with the maximally trimmed, inserted sequence being indicated in `AlternateBases`.NOTE: Beacon instances may not support UIPAC codes and it is not mandatory for them to do so. In such cases the use of [ACGTN] is mandated. | string | NA | A, T, N, , ACG | NA|
| referenceBases | Reference bases for this variant (starting from `start`). * Accepted values: IUPAC codes for nucleotides (e.g. `https://www.bioinformatics.org/sms/iupac.html`). * N is a wildcard, that denotes the position of any base, and can be used as a standalone base of any type or within a partially known sequence.* an *empty value* is used in the case of insertions with the maximally trimmed, inserted sequence being indicated in `AlternateBases`. | string | NA | A, T, N, , ACG | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/resolution.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| resolution | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
| resolution | NA | oneOf | [Age](./Age.md), [AgeRange](./AgeRange_PXF.md), [GestationalAge](./GestationalAge.md), [TimeInterval](./TimeInterval.md) | NA | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/unit.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| unit | The kind of unit. Recommended from NCIT Unit of Category ontology term (NCIT:C42568) descendants | object | [id](./id.md), [label](./label.md) | `[{"id": "NCIT:C70575", "label": "Roentgen"}, {"id": "NCIT:C28252", "label": "Kilogram"}, {"id": "NCIT:C28253", "label": "Milligram"}]` | NA|
| unit | Unit of the exposure. Recommended from NCIT Unit of Category ontology term (NCIT:C42568) descendants. | object | [id](./id.md), [label](./label.md) | `[{"id": "NCIT:C70575", "label": "Roentgen"}, {"id": "NCIT:C28252", "label": "Kilogram"}, {"id": "NCIT:C28253", "label": "Milligram"}]` | NA|
2 changes: 1 addition & 1 deletion docs/schemas-md/obj/value.md
@@ -1,3 +1,3 @@
|Term | Description | Type | Properties | Example | Enum|
| ---| ---| ---| ---| ---| --- |
| value | The value of the quantity in the units | number | NA | NA | NA|
| value | Quantification of the exposure. | number | NA | NA | NA|
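
The naming conflict this commit works around (`AgeRange.md` vs `ageRange.md`, `Value.md` vs `value.md`) could be caught before generated Markdown is committed. The following is only a sketch and is not part of the commit: the function name `case_collisions` is hypothetical, and it reads names from standard input so that it behaves the same on case-sensitive and case-insensitive filesystems.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print every name that occurs
# more than once after lowercasing, i.e. names that would map to the same file
# on a case-insensitive filesystem such as the default macOS APFS.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example with names taken from docs/schemas-md/obj before the rename.
printf '%s\n' AgeRange.md ageRange.md Value.md value.md unit.md | case_collisions
```

With the pre-rename names above, the helper reports `agerange.md` and `value.md`. It could presumably be fed with `ls docs/schemas-md/obj` on a case-sensitive checkout, where an empty result would mean no collisions remain.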