dcat:distribution, model fix, inspect API updates #911

canwaf · 2024-04-12T13:11:19Z

In csvcubed 0.5.0 (since yanked), we adopted the following change to the object model:

<4g-coverage.csv#dataset> <http://purl.org/dc/terms/description> "4G coverage in the UK by geographic area" ;
	<http://purl.org/dc/terms/title> "4G Coverage in the UK" ;
	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Attachable>, <http://purl.org/linked-data/cube#DataSet>, <http://www.w3.org/2000/01/rdf-schema#Resource>, <http://www.w3.org/ns/dcat#Distribution>, <http://www.w3.org/ns/dcat#Resource> .

This affects csvcubed's inspect command, which runs the query at https://github.com/GSS-Cogs/csvcubed/blob/main/src/csvcubed/inspect/sparql_handler/sparql_queries/select_catalog_metadata.sparql. That query primarily looks for a dcat:Dataset:

        SELECT DISTINCT ?dataset
        WHERE {
            GRAPH ?someGraph {
                ?dataset a dcat:Dataset.
            }
        }

The dcat:Dataset type is no longer present, but it should be. Consider the application profile in which the CSV-W is the distribution. This leads us to the following:

<4g-coverage.csv#csvqb> a <http://purl.org/linked-data/cube#Attachable>, <http://purl.org/linked-data/cube#DataSet>, <http://www.w3.org/2000/01/rdf-schema#Resource>, <http://www.w3.org/ns/dcat#Distribution>, <http://www.w3.org/ns/dcat#Resource> ;
    <http://www.w3.org/ns/dcat#isDistributionOf> <4g-coverage.csv#dataset> .
<4g-coverage.csv#dataset> <http://purl.org/dc/terms/description> "4G coverage in the UK by geographic area" ;
	<http://purl.org/dc/terms/title> "4G Coverage in the UK" .

So the catalogue metadata is attached to the dataset, but the CSV-W's primary subject is now the qb:Attachable, qb:DataSet, etc.

This should allow the SPARQL query to remain unchanged.
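As a rough illustration of why the query can stay as it is, the sketch below models the proposed graph as plain Python tuples and performs the same two lookups by hand. It is a stand-in for the SPARQL query, not csvcubed code. One assumption is worth flagging: the Turtle above does not show an explicit dcat:Dataset type on `#dataset`, but the query needs one to match, so the sketch includes it.

```python
# Plain-Python stand-in for the SPARQL lookup; not csvcubed code.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
QB = "http://purl.org/linked-data/cube#"
DCTERMS = "http://purl.org/dc/terms/"

# A subset of the triples in the proposed model.
# ASSUMPTION: #dataset is explicitly typed as dcat:Dataset.
triples = {
    ("4g-coverage.csv#csvqb", RDF_TYPE, QB + "DataSet"),
    ("4g-coverage.csv#csvqb", RDF_TYPE, DCAT + "Distribution"),
    ("4g-coverage.csv#csvqb", DCAT + "isDistributionOf", "4g-coverage.csv#dataset"),
    ("4g-coverage.csv#dataset", RDF_TYPE, DCAT + "Dataset"),
    ("4g-coverage.csv#dataset", DCTERMS + "title", "4G Coverage in the UK"),
}


def subjects_of_type(graph, type_uri):
    """Equivalent of `?s a <type_uri>`: every subject with that rdf:type."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == type_uri)


def distributions_of(graph, dataset_uri):
    """Every subject that is dcat:isDistributionOf the given dataset."""
    predicate = DCAT + "isDistributionOf"
    return sorted(s for (s, p, o) in graph if p == predicate and o == dataset_uri)


datasets = subjects_of_type(triples, DCAT + "Dataset")
distributions = distributions_of(triples, datasets[0])
print(datasets, distributions)
```

The dataset lookup is unchanged from the old model; only the second hop, from the dataset to its distribution, is new.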

The metadata attached to the dcat:Distribution should be at most the following. (Note: these are not requirements. We should add whatever we can fill in from information we already have; nothing new, please.)

classDiagram

class Distribution["Distribution a dcat:Distribution"] {
    +dcterms:identifier ∋ rdfs:Literal as xsd:string
    +dcterms:created ∋ rdfs:Literal as xsd:dateTime
    +dcterms:creator ∋ foaf:Agent
    +dcterms:issued ∋ rdfs:Literal as xsd:dateTime
    +prov:wasDerivedFrom ∋ [prov:Entity]
    +prov:wasGeneratedBy ∋ prov:Activity
    +dcat:downloadURL ∋ rdfs:Resource
    +dcat:byteSize ∋ rdfs:Literal as xsd:nonNegativeInteger
    +dcat:mediaType ∋ dcterms:MediaType
    +wdrs:describedBy ∋ rdfs:Resource
    +spdx:checksum ∋ spdx:Checksum
}

tl;dr: the main subject of the CSV-W metadata file should be <dataset.csv#csvqb>, which is dcat:isDistributionOf the dcat:Dataset. The dcat:Dataset is the one that should have the catalogue metadata attached to it.

SarahJohnsonONS · 2024-06-03T13:57:34Z

Currently, cubes that have been built using csvcubed v0.4.10 or lower cannot be inspected using csvcubed v0.5.0 or greater, as the primary identifier has changed from some-dataset.csv#dataset to some-dataset.csv#csvqb. In order to facilitate this change, a new distribution_uri property has been added to the CatalogMetadata class, and the select_catalog_metadata SPARQL query has been updated to extract the value of this property, if it is present.

Additional information on the version of csvcubed used to build the cube is also now available in the metadata JSON file, which may also be leveraged to determine how the cube should be inspected.

The distribution_uri value is not present in cubes built using older versions of csvcubed, so inspecting such a cube with a newer version of csvcubed fails. This is because the MetadataPrinter class now uses the distribution_uri in the get_primary_csv_url() method via DataCubeRepository.get_cube_identifiers_for_dataset(). There will be other places where there is a discrepancy, but this is where I would start.

Possible solutions:

* Use the csvcubed-build-activity information to extract the version of csvcubed used to build the cube, and use this to implement different versions of the inspect command. Build activity information available in different versions of csvcubed is below.
* Use the presence or absence of distribution_uri in the select_catalog_metadata SPARQL results to implement different versions of the inspect command.
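The second option could look roughly like the sketch below. `CatalogRow` and `primary_identifier` are hypothetical names used only for illustration; they are not the real CatalogMetadata class or any existing csvcubed function. The sketch assumes that a result row for an older cube simply has no distribution_uri value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogRow:
    """Hypothetical stand-in for one select_catalog_metadata result row."""

    dataset_uri: str
    # Absent (None) for cubes built with csvcubed < 0.5.0.
    distribution_uri: Optional[str] = None


def primary_identifier(row: CatalogRow) -> str:
    """Pick the identifier that the inspect command should treat as primary."""
    if row.distribution_uri is not None:
        # Newer model: the distribution is the CSV-W's primary subject.
        return row.distribution_uri
    # Older model: fall back to the dataset identifier.
    return row.dataset_uri


old_cube = CatalogRow(dataset_uri="some-dataset.csv#dataset")
new_cube = CatalogRow(
    dataset_uri="some-dataset.csv#dataset",
    distribution_uri="some-dataset.csv#csvqb",
)
print(primary_identifier(old_cube), primary_identifier(new_cube))
```

This keeps the branching in one place, so the rest of the inspect code would not need to know which model a cube was built with.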

Build activity information

csvcubed version < 0.5.0

...
{
    "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#dataset",
    "http://www.w3.org/ns/prov#wasGeneratedBy": [
        {
            "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity"
        }
    ]
}
...
{
    "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity",
    "@type": [
        "http://www.w3.org/2000/01/rdf-schema#Resource",
        "http://www.w3.org/ns/prov#Activity"
    ],
    "http://www.w3.org/ns/prov#used": [
        {
            "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.4.10"
        }
    ]
}
...

csvcubed version >= 0.5.0

...
{
    "@id": "some-title.csv#csvqb",
    "http://www.w3.org/ns/prov#wasDerivedFrom": [
        {
            "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0"
        }
    ],
    "http://www.w3.org/ns/prov#wasGeneratedBy": [
        {
            "@id": "some-title.csv#csvcubed-build-activity"
        }
    ]
}
...
{
    "@id": "some-title.csv#csvcubed-build-activity",
    "@type": [
        "http://www.w3.org/ns/prov#Activity",
        "http://www.w3.org/2000/01/rdf-schema#Resource"
    ],
    "http://www.w3.org/ns/prov#used": [
        {
            "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0"
        }
    ]
},
{
    "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0",
    "@type": [
        "http://www.w3.org/ns/prov#Entity",
        "http://www.w3.org/2000/01/rdf-schema#Resource"
    ],
    "http://purl.org/dc/terms/title": [
        {
            "@language": "en",
            "@value": "csvcubed v0.5.0"
        }
    ],
    "http://www.w3.org/ns/prov#hasPrimarySource": [
        {
            "@id": "https://pypi.org/project/csvcubed/0.5.0"
        }
    ],
    "http://www.w3.org/ns/prov#wasGeneratedBy": [
        {
            "@id": "some-title.csv#csvcubed-build-activity"
        }
    ]
}
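The first option relies on the `prov:used` release tag, which appears in the build activity node for both versions shown above. A minimal sketch of reading it follows. The function names are hypothetical, and it assumes the JSON-LD nodes have already been collected into a flat list, wherever they sit in the real metadata file.

```python
import json
import re

PROV_ACTIVITY = "http://www.w3.org/ns/prov#Activity"
PROV_USED = "http://www.w3.org/ns/prov#used"
# Matches release tag URLs such as .../releases/tag/v0.4.10
RELEASE_TAG = re.compile(r"/releases/tag/v(\d+)\.(\d+)\.(\d+)")


def csvcubed_build_version(nodes):
    """Return (major, minor, patch) from the build activity node, or None."""
    for node in nodes:
        if not node.get("@id", "").endswith("#csvcubed-build-activity"):
            continue
        if PROV_ACTIVITY not in node.get("@type", []):
            continue
        for used in node.get(PROV_USED, []):
            match = RELEASE_TAG.search(used.get("@id", ""))
            if match:
                return tuple(int(part) for part in match.groups())
    return None


def built_with_distribution_model(nodes):
    """True when the cube was built with csvcubed 0.5.0 or later."""
    version = csvcubed_build_version(nodes)
    return version is not None and version >= (0, 5, 0)


# Build activity node copied from the "< 0.5.0" excerpt above.
legacy_nodes = json.loads("""
[
  {
    "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity",
    "@type": [
      "http://www.w3.org/2000/01/rdf-schema#Resource",
      "http://www.w3.org/ns/prov#Activity"
    ],
    "http://www.w3.org/ns/prov#used": [
      {"@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.4.10"}
    ]
  }
]
""")

print(csvcubed_build_version(legacy_nodes))
print(built_with_distribution_model(legacy_nodes))
```

Comparing version tuples rather than strings avoids ordering "0.4.10" after "0.5.0". A cube with no recognisable build activity is treated as the older model here, which is a design choice rather than anything the issue specifies.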

…-updates

* Updating the release version in pyproject.toml
* test commit
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* Tidy up
* tidy up
* Working
* Added comments
* fixed pyright errors
* more pyright
* Changed #csvqb to #qbDataSet
* PR comments addressed
* poetry lock
* poetry lock
* oops
* small change

Co-authored-by: Auto-version-incrementer

SarahJohnsonONS mentioned this issue Jun 6, 2024

#911 dcat distribution model fix inspect api updates #912

Merged

SarahJohnsonONS added a commit that referenced this issue Jun 20, 2024

Merge branch 'main' into #911-dcat-distribution-model-fix-inspect-API…

c3093c7

…-updates

SarahJohnsonONS added a commit that referenced this issue Jun 21, 2024

Merge branch 'main' into #911-dcat-distribution-model-fix-inspect-API…

80fefdd

…-updates
