DCAT - Easy RDF Data Management

Retrieving, publishing and loading data in a single DCAT-centric tool.

One could say that DCAT is to datasets what the POM (pom.xml) is to Java software projects.

In a nutshell

Question: How many commands does it take to load the 50+ files of this CKAN record into a Virtuoso triple store (with default port and credentials)?

Answer: 2

dcat import ckan --url=http://ckan.qrowd.aksw.org --dataset=org-linkedgeodata-osm-bremen-2018-04-04 > /tmp/dcat.nt
dcat deploy virtuoso --allowed=/writeable/dir/readable/by/virtuoso /tmp/dcat.nt

Note: This also works for the DCAT-based DBpedia DataID datasets:

dcat show http://downloads.dbpedia.org/2016-10/core-i18n/en/2016-10_dataid_en.ttl > /tmp/dcat.ttl

Question: And how do I create a graph group so I can view all these files as a single graph?

Answer: You don't have to; the dcat deploy virtuoso command above already created it.

Question: So I have this DCAT file with dcat:downloadURL pointing to local files. How can I publish it to CKAN?

Answer: Like this:

dcat deploy ckan --url=http://ckan.example.org --apikey=my-ckan-api-key dcat.nt

Install Latest Release

Installing as root performs a global install into the folders /usr/local/share/dcat-suite and /usr/local/bin. For non-root users, the folders are ~/Downloads/dcat-suite and ~/bin. Run setup-latest-release.sh uninstall to remove the downloaded and generated files.

via curl

bash -c "$(curl -fsSL https://raw.githubusercontent.com/SmartDataAnalytics/dcat-suite/develop/setup-latest-release.sh)"

via wget

bash -c "$(wget -O- https://raw.githubusercontent.com/SmartDataAnalytics/dcat-suite/develop/setup-latest-release.sh)"

Feature overview

API                        DCAT retrieval   Deploy RDF   Deploy non-RDF
CKAN                       X                X            x
Virtuoso RDF Bulk Loader   .                X            n/a
Generic SPARQL             .                .
URL to DCAT resource       X                n/a          n/a

X = supported, . = future work, n/a = not applicable

DCAT Example

Here is a short example of a DCAT dataset description to give you an impression of what such a description looks like.

@prefix eg: <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

eg:myDataset
    a dcat:Dataset ;
    dct:identifier "my-dataset" ;
    dct:title "My Dataset" ;
    dct:description "Really useful dataset" ;
    dcat:distribution eg:myFirstDistribution-of-myDataset ;
    .

eg:myFirstDistribution-of-myDataset
    a dcat:Distribution ;
    dct:title "My Distribution" ;
    dct:description "Download of my distribution" ;
    dcat:accessURL <a/relative/path/or/a/url/of/a/web/resource/or/a/named/graph> ;
    .
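The CLI examples below operate on N-Triples files such as my-dcat.nt. As a rough sketch that is not part of dcat-suite, the following standard-library Python snippet writes the same dataset description as N-Triples. N-Triples only allows absolute IRIs, so the relative accessURL from the Turtle example is replaced by a hypothetical http://example.org/data/my-dataset.nt; the helper names are made up for illustration.

```python
# Sketch: serialize the DCAT example above as N-Triples (stdlib only).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
EG = "http://example.org/"


def iri(value: str) -> str:
    """Render an absolute IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(s: str, p: str, o: str) -> str:
    return f"{s} {p} {o} ."


def example_dcat():
    """Return the dataset and distribution from the Turtle example as N-Triples lines."""
    ds = iri(EG + "myDataset")
    dist = iri(EG + "myFirstDistribution-of-myDataset")
    return [
        triple(ds, iri(RDF_TYPE), iri(DCAT + "Dataset")),
        triple(ds, iri(DCT + "identifier"), literal("my-dataset")),
        triple(ds, iri(DCT + "title"), literal("My Dataset")),
        triple(ds, iri(DCT + "description"), literal("Really useful dataset")),
        triple(ds, iri(DCAT + "distribution"), dist),
        triple(dist, iri(RDF_TYPE), iri(DCAT + "Distribution")),
        triple(dist, iri(DCT + "title"), literal("My Distribution")),
        triple(dist, iri(DCT + "description"), literal("Download of my distribution")),
        # Hypothetical absolute IRI standing in for the relative path of the example.
        triple(dist, iri(DCAT + "accessURL"), iri("http://example.org/data/my-dataset.nt")),
    ]


if __name__ == "__main__":
    print("\n".join(example_dcat()))
```

Redirecting the output (python make_dcat.py > my-dcat.nt, with a hypothetical script name) yields a file in the shape the commands below expect.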

CLI Quick Usage

Show help

dcat --help

Show all DCAT-related information from an RDF URI or filename

dcat show my-dcat.nt

Deploy datasets based on a DCAT description to CKAN

dcat deploy ckan --apikey=yourApiKey --url=yourCkanUrl my-dcat.nt

This creates a copy of the input DCAT file at target/ckan/deploy-dcat.nt in which the dcat:accessURL values are replaced by the URLs of the corresponding CKAN resources. If you host this file anywhere on the Web, it will provide working download links - neat!
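Conceptually, the rewritten copy is the input description with each dcat:accessURL object swapped for the URL of the uploaded CKAN resource. The Python sketch below illustrates that idea on N-Triples lines. It is not dcat-suite's implementation: the function name is hypothetical, and the mapping from original URLs to CKAN resource URLs (which the real tool obtains from CKAN after uploading) is simply passed in as a dict.

```python
import re

ACCESS_URL = "<http://www.w3.org/ns/dcat#accessURL>"
# Matches "<subject> <dcat:accessURL> <object> ." on one N-Triples line (IRI objects only).
LINE = re.compile(r"^(\S+)\s+" + re.escape(ACCESS_URL) + r"\s+<([^>]*)>\s*\.\s*$")


def rewrite_access_urls(nt_lines, uploaded):
    """Return new N-Triples lines in which every dcat:accessURL IRI that is a key
    of `uploaded` (original IRI -> CKAN resource URL) is replaced by its value.
    All other lines are returned unchanged."""
    result = []
    for line in nt_lines:
        m = LINE.match(line)
        if m and m.group(2) in uploaded:
            line = f"{m.group(1)} {ACCESS_URL} <{uploaded[m.group(2)]}> ."
        result.append(line)
    return result
```

Lines whose accessURL has no entry in the mapping are left untouched, so a partially deployed description keeps its original links.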

Deploy a self-describing dataset (see below) to CKAN

dcat deploy ckan --apikey=yourApiKey --url=yourCkanUrl mySelfDescribingDataset.nq

Expand the graphs of a self-describing dataset to individual files based on its contained DCAT description

dcat expand mySelfDescribingDataset.nq

# Now you can also deploy the expanded form:
cd target/dcat/mySelfDescribingDataset
dcat deploy ckan dcat.nt --url=yourCkanUrl --apikey=yourSecretKey

Building

mvn clean install

Installing the Debian package (requires root)

After the build, run

./reinstall-debs.sh

What is a self-describing dataset (SDD)?

An SDD is simply a quad-based dataset that contains DCAT dataset and distribution information in its default graph.
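To make the definition concrete: in N-Quads, statements without a fourth term belong to the default graph (where an SDD keeps its DCAT description), while statements with a graph label belong to named graphs (which hold the data). The minimal Python sketch below groups N-Quads lines by graph. Its regular expression covers IRIs, blank nodes and literals with an optional language tag or datatype, but it is not a complete N-Quads parser, and the function name and sample IRIs are hypothetical.

```python
import re
from collections import defaultdict

# One RDF term: IRI, blank node, or literal with optional language tag / datatype.
TERM = r'(?:<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
QUAD = re.compile(rf"^\s*({TERM})\s+({TERM})\s+({TERM})(?:\s+({TERM}))?\s*\.\s*$")

DEFAULT_GRAPH = None  # key used for statements without a graph term


def split_by_graph(nq_lines):
    """Group N-Quads statements into {graph IRI or None: [(s, p, o), ...]}."""
    graphs = defaultdict(list)
    for line in nq_lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = QUAD.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Quads line: {line!r}")
        s, p, o, g = m.groups()
        if g is None:
            key = DEFAULT_GRAPH
        elif g.startswith("<"):
            key = g[1:-1]  # strip the angle brackets from the graph IRI
        else:
            key = g  # blank-node graph label
        graphs[key].append((s, p, o))
    return dict(graphs)
```

In an SDD, the None entry would hold the dcat:Dataset and dcat:Distribution statements, and the other keys are the graph IRIs that accessURLs can refer to.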

Extracting datasets from an SDD

The dcat:accessURL attribute of distributions is interpreted as follows:

If at least one of the given accessURLs matches the IRI of a graph within the SDD, ckan-deploy will deploy an RDF file to CKAN that is the union of all graphs denoted by the accessURLs. An error will be raised if any other accessURL points to a non-existent graph.
Otherwise, if there is at most one accessURL, a CKAN resource will be created, with its URL attribute set to the accessURL if present.
In all other cases, an error is raised.
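The three rules above can be summarized as a small decision function. This Python sketch only restates the documented behaviour and is not the code dcat-suite runs; the names resolve_distribution, Deployment and SddError are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class SddError(ValueError):
    """Raised when a distribution's accessURLs cannot be interpreted."""


@dataclass(frozen=True)
class Deployment:
    kind: str                  # "graph-union" or "plain-resource"
    graphs: tuple = ()         # graph IRIs to merge into one RDF file
    url: Optional[str] = None  # URL attribute of a plain CKAN resource


def resolve_distribution(access_urls, graph_iris):
    """Apply the accessURL rules to one distribution of an SDD."""
    urls = list(dict.fromkeys(access_urls))  # de-duplicate, keep order
    matching = [u for u in urls if u in graph_iris]
    if matching:
        # Rule 1: at least one accessURL names a graph, so every one must.
        missing = [u for u in urls if u not in graph_iris]
        if missing:
            raise SddError(f"accessURL(s) without a graph in the SDD: {missing}")
        return Deployment("graph-union", graphs=tuple(matching))
    if len(urls) <= 1:
        # Rule 2: zero or one accessURL yields a plain CKAN resource.
        return Deployment("plain-resource", url=urls[0] if urls else None)
    # Rule 3: several accessURLs, none of which names a graph.
    raise SddError("several accessURLs, none of which names a graph in the SDD")
```

For example, a distribution whose accessURLs are http://example.org/g1 and http://example.org/g2, both graphs in the SDD, resolves to a single merged RDF upload, while a lone http://example.org/file.csv becomes a plain resource.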

How to create self-describing datasets?

You can use your favourite RDF tool.

Shameless self-advertisement: Sparql Integrate is a tool that expresses data integration workflows as a sequence of SPARQL queries that make use of function extensions for XML, CSV and JSON processing. This makes it fairly easy to create quad-based datasets. You only need to design your workflow such that it outputs appropriate DCAT descriptions.

Example

This example assumes that the Debian packages of ckan-deploy and sparql-integrate are installed.

cd /tmp

git clone https://github.com/QROWD/QROWD-RDF-Data-Integration.git qrowd-rdf-data-integration
cd qrowd-rdf-data-integration/datasets/1046-1051

sparql-integrate workloads.sparql process.sparql emit.sparql > dataset.nq
dcat deploy ckan --url=yourCkanInstance --apikey=yourApiKey dataset.nq

The resulting dataset entry on our CKAN instance: http://ckan.qrowd.aksw.org/dataset/trento-railway-time-tables

For explanations about the transformations using the *.sparql files, please refer to this page.

CLI Roadmap

These commands are not yet implemented, but appear to be useful. These descriptions are not final.

Generate a meta DCAT file that treats another DCAT file as a dataset. The meta file can be used to deploy the described file.

dcat meta my-datasets.dcat.nt > meta.dcat.nt

Upload an RDF file via SPARQL Update

dcat deploy sparql --user=dba --pass=dba --url=http://example.org/sparql dcat.nt
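Since this command is only planned, here is a rough Python sketch of what such an upload could look like under the SPARQL 1.1 Protocol: the N-Triples content is wrapped in an INSERT DATA update and POSTed with the media type application/sparql-update. The function name, the target graph IRI and the use of HTTP Basic authentication are assumptions; some stores, Virtuoso among them, may require a different authentication scheme or endpoint. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_insert_request(endpoint, graph_iri, nt_data, user, password):
    """Build (but do not send) a SPARQL 1.1 Update request that inserts
    N-Triples data into a named graph."""
    # In the common case, N-Triples statements are also valid triples inside INSERT DATA.
    update = f"INSERT DATA {{ GRAPH <{graph_iri}> {{\n{nt_data}\n}} }}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sparql-update; charset=UTF-8",
            "Authorization": f"Basic {token}",  # assumption: the endpoint accepts Basic auth
        },
    )
```

Passing the returned request to urllib.request.urlopen would actually send the update; for large files, splitting the data into several smaller INSERT DATA requests is likely to be more robust than one huge request.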

TODOs

Add support for a user-agent field on upload
Possibly add support for profiles that bundle commonly needed information, such as the API key and user agent
