Virtuoso Setup Guide

Junjun Cao edited this page Jun 21, 2024 · 7 revisions

Current Staging Instance

The staging server (https://virtuoso.staging.simssa.ca) was set up according to the instructions below. For information on the server itself, see the DDMAL internal Wiki.

Set up docker

(official Virtuoso Docker setup guide here)

  1. Pull the docker image (line 1) and check the image version (optional, line 2).

    sudo docker pull openlink/virtuoso-opensource-7
    sudo docker run openlink/virtuoso-opensource-7 version
  2. Start a docker container.

    sudo mkdir my_virtdb
    cd my_virtdb
    sudo docker run \
        --name my_virtdb \
        --interactive \
        --tty \
        --env DBA_PASSWORD=mysecret \
        --publish 1111:1111 \
        --publish 8890:8890 \
        --volume `pwd`:/database \
        openlink/virtuoso-opensource-7:latest

This creates a new Virtuoso database in the my_virtdb subdirectory and starts a Virtuoso instance with the HTTP server listening on port 8890 and the ISQL data server listening on port 1111.

Note that you should change the DBA_PASSWORD to the desired password.
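
Once the container is running, it can help to confirm that both published ports accept connections before going further. The following is a minimal sketch, not part of the official setup: port_open and check_virtuoso_ports are hypothetical helper names, and the check relies on bash's /dev/tcp feature, so it assumes bash rather than plain sh.

```shell
# Hypothetical helper: succeed if a TCP port accepts connections.
# Uses bash's /dev/tcp pseudo-device (an assumption: bash is the shell in use).
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Report the state of the two ports published by the docker run command:
# 1111 (ISQL data server) and 8890 (HTTP server).
check_virtuoso_ports() {
    local host="${1:-localhost}" port status=0
    for port in 1111 8890; do
        if port_open "$host" "$port"; then
            echo "port ${port}: open"
        else
            echo "port ${port}: closed"
            status=1
        fi
    done
    return "$status"
}
```

Running check_virtuoso_ports localhost prints one line per port and returns a non-zero status if either port is closed.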

Add data to the local instance

This can be done before or after the configuration.

  1. Get into the local instance.

    docker exec -it <docker id> bash

The <docker id> can be retrieved by running docker ps.

  2. Download the data in compact JSON-LD form into the local instance.

    #!/bin/bash

    # Download and rename files from different URLs
    wget -O simssadb.jsonld https://raw.githubusercontent.com/DDMAL/linkedmusic-datalake/main/simssadb/jsonld/compact.jsonld
    wget -O cantusdb.jsonld https://raw.githubusercontent.com/DDMAL/linkedmusic-datalake/main/cantusdb/jsonld/compact.jsonld

  3. Upload data.

Open the isql CLI (use the correct username and password)

isql -U dba -P mysecret

Then load the JSON-LD files (see details of rdf_load_json_ld() here):

rdf_load_json_ld (file_to_string('simssadb.jsonld'),'', 'urn:simssadb');
rdf_load_json_ld (file_to_string('cantusdb.jsonld'),'', 'urn:cantusdb');
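
When more datasets are added, typing one statement per file gets tedious. The sketch below generates the same kind of statements from a list of dataset names. The helper names jsonld_load_sql and jsonld_load_script are hypothetical, and the sketch assumes the naming pattern used above (file <name>.jsonld, graph urn:<name>).

```shell
# Print the ISQL statement that loads one JSON-LD file into a named graph.
jsonld_load_sql() {
    local file="$1" graph="$2"
    printf "rdf_load_json_ld (file_to_string('%s'), '', '%s');\n" "$file" "$graph"
}

# Print one load statement per dataset name, assuming the convention above:
# <name>.jsonld is loaded into the graph urn:<name>.
jsonld_load_script() {
    local name
    for name in "$@"; do
        jsonld_load_sql "${name}.jsonld" "urn:${name}"
    done
}
```

For example, jsonld_load_script simssadb cantusdb prints the two statements shown above. The output can be pasted into the isql prompt, or piped to isql -U dba -P mysecret if your isql build reads statements from standard input.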

Add packages to Virtuoso

  1. Go to the local server at http://localhost:8890/ and log into Conductor using:

    username: dba
    password: mysecret (or the DBA_PASSWORD you set)

  2. Go to System Admin > Packages and install conductor, fct, iSPARQL, and rdf_mappers (download rdf_mappers [here](http://download3.openlinksw.com/uda/vad-vos-packages/7.2/rdf_mappers_dav.vad) and install it from upload). You can find the rest of the packages here if not previously installed.

  3. Check that faceted search works at http://localhost:8890/fct/. Try SPARQL at http://localhost:8890/sparql/.

  4. Configure data and permissions.

    Open the ISQL CLI:

    -- Permission for Sponging (optional)
    -- see https://github.com/openlink/virtuoso-opensource/issues/1180
    
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('SPARQL', 7); 
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('nobody', 7); 
    
    -- Post Installation Setup for Virtuoso Faceted Browser
    -- see: https://vos.openlinksw.com/owiki/wiki/VOS/VirtFacetBrowserInstallConfig#Post%20Installation
    RDF_OBJ_FT_RULE_ADD (null, null, 'All');
    VT_INC_INDEX_DB_DBA_RDF_OBJ ();
    urilbl_ac_init_db();
    s_rank();
    
    -- For federated SPARQL query search, see https://community.openlinksw.com/t/sparql-federated-query/4162/4
    grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
    grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
    
    -- Grant privileges to user "SPARQL", might not be needed?? 
    -- TODO: See if this is actually needed
    grant SPARQL_SELECT to "SPARQL";
    grant SPARQL_UPDATE to "SPARQL";
    grant SPARQL_SPONGE to "SPARQL";
    
    

Note: Rerun the following lines after loading a new JSON-LD file to refresh the text index and the entity label table:

    VT_INC_INDEX_DB_DBA_RDF_OBJ ();
    urilbl_ac_init_db();

Other configurations:

1. Add namespace prefixes to simplify SPARQL queries

  • Set prefixes under "Conductor" > "Namespaces" (for example, set "wd" and "wdt" as Wikidata prefixes to assist SPARQL querying):

wd:http://www.wikidata.org/entity/
wdt:http://www.wikidata.org/prop/direct/
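
With the prefixes registered, a query can use wd: and wdt: without declaring them. The sketch below builds such a query and sends it to the local endpoint with curl. It is illustrative only: wikidata_query and run_sparql are hypothetical helper names, the property wdt:P31 is an arbitrary example that may not occur in the loaded data, and SPARQL_ENDPOINT is an assumed environment variable that defaults to the local URL used in this guide.

```shell
# Build a query that relies on the wdt: prefix registered in Conductor.
# P31 and the default limit are illustrative choices, not taken from the data.
wikidata_query() {
    local limit="${1:-10}"
    printf 'SELECT ?s ?o WHERE { ?s wdt:P31 ?o } LIMIT %s\n' "$limit"
}

# Send a query to the SPARQL endpoint and ask for JSON results.
run_sparql() {
    curl --silent --get \
        --header 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$1" \
        "${SPARQL_ENDPOINT:-http://localhost:8890/sparql}"
}
```

A typical call would be run_sparql "$(wikidata_query 5)". If the endpoint rejects the prefix, the namespace was probably not saved in Conductor.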

2. Sponger

Optional: Sponge URLs within the JSON-LD

Note: The current Virtuoso staging instance doesn't sponge external information. This documentation is here in case we decide to do so in the future.

Sponging retrieves external RDF data that can be reached from the loaded JSON-LD (e.g. Wikidata RDF). After discussing with Ich, this might or might not be what we want.

(See more about sponging here)

In interactive SQL (ISQL), run the following, adjusting grab-depth and grab-limit as needed:

SPARQL
define input:grab-all "yes"
define input:grab-depth 2
define input:grab-limit 100
SELECT * 
FROM NAMED <urn:test>
WHERE { GRAPH ?g { ?s ?p ?o } };

Explanation of the code above:

After execution, new named graphs (NNG) may appear in your local Virtuoso instance. These graphs are named after instances in the <urn:test> graph. As long as an instance is an accessible URL (call it A), i.e. a web page that can be visited, the Sponger can also fetch the URLs (B1, B2, ...) linked with A and convert them into RDF in the NNG.
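
One way to see which named graphs a sponging run created is to list the graphs before and after the run and compare the two listings. The sketch below assumes each listing has been saved to a text file with one graph name per line, so raw isql output may need trimming first. The helper names list_graphs_sql and new_graphs, and the file names in the usage note, are hypothetical.

```shell
# Print an ISQL statement that lists every named graph in the store.
list_graphs_sql() {
    printf 'SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } };\n'
}

# Print the lines of the second file that are absent from the first.
# Both arguments are files holding one graph name per line.
# Note: grep returns a non-zero status when there is nothing new to print.
new_graphs() {
    local before="$1" after="$2"
    grep -F -x -v -f "$before" "$after"
}
```

For example, save one listing to before.txt, run the sponging query, save a second listing to after.txt, then run new_graphs before.txt after.txt. Listing every graph can be slow on a large store.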

To focus on sponging Wikidata fields:

SPARQL
define input:grab-all "yes"
define input:grab-depth 5
define input:grab-limit 20

SELECT ?s ?p ?o
FROM NAMED <urn:test>
WHERE {
  GRAPH ?g {
    ?s ?p ?o .
    FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/"))
  }
};