Using template variables to construct new values
Tim L edited this page May 29, 2014 · 86 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)
Global variables:
- [/] - base_uri/
- [/s] - base_uri/source/source_identifier
- [/sd] - base_uri/source/source_identifier/dataset/dataset_identifier
- [/sdv] - base_uri/source/source_identifier/dataset/dataset_identifier/version/version_identifier
- [v] - the dataset's version_identifier
- [e] - the dataset's enhancement_identifier
- [D] - the dataset's subject_discriminator
- [/sD] - base_uri/source/source_identifier/dataset/dataset_identifier[/subjectDiscriminator]
- [/sDv] - base_uri/source/source_identifier/dataset/dataset_identifier[/subjectDiscriminator]/version/dataset_version
- [uuid] - provide a UUID (Note: this should only be used in extreme cases; try to construct your URIs from the data itself so that reconversion will produce the same URIs)
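The expansions listed above can be sketched in a few lines of Python. This is only an illustration, not the converter's code: the function and parameter names are hypothetical, the example identifiers are made up, and the expansions follow the list literally (only `[/]` ends with a slash; the real converter may differ).

```python
def global_variables(base_uri, source_id, dataset_id, version_id,
                     subject_discriminator=None):
    """Expand the namespace-style variables from the list above.

    Hypothetical helper: parameter names are illustrative, and the
    expansions follow the list literally (only [/] ends with a slash).
    """
    base = base_uri.rstrip("/")
    s = f"{base}/source/{source_id}"
    sd = f"{s}/dataset/{dataset_id}"
    # [/sD] adds the optional /subjectDiscriminator segment when one is set.
    sD = f"{sd}/{subject_discriminator}" if subject_discriminator else sd
    return {
        "[/]": f"{base}/",
        "[/s]": s,
        "[/sd]": sd,
        "[/sdv]": f"{sd}/version/{version_id}",
        "[v]": version_id,
        "[D]": subject_discriminator or "",
        "[/sD]": sD,
        "[/sDv]": f"{sD}/version/{version_id}",
    }


def fill(template, variables):
    """Replace each bracketed variable in the template with its expansion."""
    for name, value in variables.items():
        template = template.replace(name, value)
    return template


# Made-up identifiers, purely for illustration.
variables = global_variables("http://example.org", "census-gov",
                             "population", "2014-May-29")
print(fill("[/s]/about", variables))
```

Because every key includes its closing bracket, `[/sd]` cannot accidentally match inside `[/sdv]`, so the replacement order does not matter here.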
Contextual:
- [@] - the local name of this property
- [r] - the row of this value
- [c] - the column of this value
- [.] - the value of the cell; an empty value proceeds without special processing or omission. (implemented as [+])
- [+] - the value of the cell; if empty, provide a unique non-empty value. (only partially implemented)
- [!] - the value of the cell; if empty, omit any triple using this template variable.
- [#N] - the value of this row at column N.
  - [#N] - if empty, behaves like [+] by providing a unique non-empty value.
  - [#N/] - if empty, behaves like [.] by proceeding without special processing or omission.
- [@PROPERTY_NAME] - the value of this row at the column whose output property will be PROPERTY_NAME (undefined when multiple columns are consolidated to a single predicate).
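To make the empty-cell rules above concrete, here is a minimal Python sketch of how `[.]`, `[!]`, `[r]`, `[c]`, `[@]`, `[#N]` and `[#N/]` might be filled for one cell. It is not the converter's implementation: the function name, the 1-based row and column numbering, and the use of a random UUID as the "unique non-empty value" are all assumptions of this sketch, and `[+]` and `[@PROPERTY_NAME]` are left out.

```python
import re
import uuid

# Matches [.], [!], [r], [c], [@], [#N] and [#N/].
TOKEN = re.compile(r"\[(\.|!|r|c|@|#(\d+)(/?))\]")


def fill_contextual(template, row, r, c, local_name="",
                    unique=lambda: uuid.uuid4().hex):
    """Fill row-contextual variables for the cell at row r, column c.

    Returns None when the triple should be omitted. Rows and columns are
    1-based here, which is an assumption of this sketch.
    """
    cell = row[c - 1]
    if "[!]" in template and cell == "":
        return None  # [!]: omit any triple that uses this template

    def expand(match):
        token, n, slash = match.group(1), match.group(2), match.group(3)
        if token in (".", "!"):
            return cell          # [.]: an empty value simply passes through
        if token == "r":
            return str(r)
        if token == "c":
            return str(c)
        if token == "@":
            return local_name
        value = row[int(n) - 1]  # [#N] or [#N/]
        if value == "" and slash != "/":
            return unique()      # [#N]: stand in a unique non-empty value
        return value             # [#N/]: proceed with the empty value

    return TOKEN.sub(expand, template)


row = ["CT", "", "Hartford"]
print(fill_contextual("thing_[r]/[#1]/[.]", row, r=7, c=3))
print(fill_contextual("city/[!]", row, r=7, c=2))
```

Doing the substitution in a single regex pass means a cell value that happens to contain text like `[r]` is not expanded a second time.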
Operators:
- ^ - upper case a value; e.g. [^.^] and [^#1^]
- _ - lower case a value; e.g. [_._] and [_#1_]
- [^.-] - capitalize the first letter of the value; leave the rest the same.
- [^._] - e.g. "HARTFORD HOSPITAL" into "Hartford Hospital"
- >< - apply replaceAll("[^a-zA-Z_0-9\\-]","_") and while(gsub(/__/,"_")) (Note: this is [done by default](On Identity) when constructing URIs and will be implemented if we need to relax that assumption). This was added for literals to help conversion:object_search, but only trims spaces.
- xsd:decimal([#4])
- e.g. "[/]id/url/md5/md5([#1])"
- e.g. "http://lod.hackerceo.org/VIVO2DOI/bundle/increment([#1])"
- e.g. "domain([#1])"
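The string transformations named above can be approximated in Python as follows. This is a sketch under assumptions, not the converter's code: the function names are hypothetical, `uri_safe` mirrors the `replaceAll` / `gsub` pair described for `><` (not the space-trimming behavior currently applied to literals), and treating `[^._]` as per-word capitalization is inferred only from the "HARTFORD HOSPITAL" example.

```python
import re


def uri_safe(value):
    """Mirror replaceAll("[^a-zA-Z_0-9\\-]", "_") followed by collapsing "__"."""
    value = re.sub(r"[^a-zA-Z_0-9\-]", "_", value)
    while "__" in value:
        value = value.replace("__", "_")
    return value


def capitalize_first(value):
    """[^.-]: capitalize the first letter and leave the rest unchanged."""
    return value[:1].upper() + value[1:]


def title_case(value):
    """[^._]: assumed to capitalize each word and lower-case the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in value.split(" "))


print(uri_safe("Hartford  Hospital (CT)"))
print(capitalize_first("hartford HOSPITAL"))
print(title_case("HARTFORD HOSPITAL"))
```

The `^` and `_` operators correspond to plain `str.upper()` and `str.lower()`, so they are not repeated here.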
Regex Contextual:
- [\\1] - first capture group of the conversion:regex on a conversion:object_search. (Note: it is actually [\1], but backslashes need to be escaped in Turtle.)
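As an illustration of the capture-group variable, the Python sketch below substitutes `[\1]` in a template with the first group matched by a regular expression. The function name is hypothetical, and both the use of an unanchored search and the generalization to `[\2]`, `[\3]`, and so on are assumptions of this sketch rather than documented converter behavior.

```python
import re

# Matches [\1], [\2], ... in a template (a literal backslash, then digits).
CAPTURE = re.compile(r"\[\\(\d+)\]")


def fill_captures(template, pattern, cell_value):
    """Replace [\\N] in the template with capture group N of the pattern.

    Returns None when the pattern does not match the cell value.
    """
    match = re.search(pattern, cell_value)
    if match is None:
        return None
    # A group that did not participate in the match becomes an empty string.
    return CAPTURE.sub(lambda m: match.group(int(m.group(1))) or "", template)


# Raw strings keep the single backslash; in Turtle it would be written [\\1].
print(fill_captures(r"year/[\1]", r"(\d{4})-\d{2}", "2014-05"))
```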
Partially implemented:
- [H] - the original header
- [L] - the conversion:label of this property
Contextual (not implemented yet):
- [D] - domain of this property
- [R] - range of this property
Experimental:
- [#H+1] - the value of the cell one below the header of the current column (i.e., [c])
- [#H+2] - the value of the cell two below the header of the current column (i.e., [c])
(informative)
- [/sdv]thing_[r] - the URI for the row
- [/sd]value-of/[@]/[.] - the URI for a predicate-scoped promoted cell
- [/sd]typed/[R]/[.] - the URI for a type-promoted cell
Datasets in http://logd.tw.rpi.edu/sparql that use templates (results):
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix ov:         <http://open.vocab.org/terms/>
select ?p ?o (count(?o) as ?count)
where {
  graph <http://purl.org/twc/vocab/conversion/ConversionProcess> {
    ?s ov:csvCol ?col; ?p ?o .
    filter (?p != (conversion:label))           # templates to name predicate are not recognized.
    filter (?p != (conversion:comment))         # templates in predicate comments are not recognized.
    filter (?p != (conversion:delimits_object)) # delimits_object specifies a pattern, not template.
    filter (?p != (conversion:key_template))    # key_template is DEPRECATED; replaced by domain_template.
    filter regex(str(?p), "^http://purl.org/twc/vocab/conversion/.*")
    filter regex(?o, ".*\\[.*\\]")              # NOTE: This string is not correctly rendered.
  }
} group by ?p ?o order by ?p ?o desc(?count)
- conversion:domain_template (will become conversion:subject_template)
- conversion:range_template (will become conversion:object_template)
See also Patterns versus Templates.
The comments on edu.rpi.tw.data.csv.impl.CSVRecordTemplateFiller list the template variables and their behavior.
The following methods implement the template filling:
- edu.rpi.tw.data.csv.impl.DefaultEnrichmentParameters#fillTemplate fills the namespace-type variables [/] etc.
- edu.rpi.tw.data.csv.impl.CSVRecordTemplateFiller#fillTemplate(String) fills the row-contextual variables [@p], [#3], [r], [c], etc.
- edu.rpi.tw.data.csv.valuehandlers.LiteralCodebookValueHandler fills the regex capture groups on its own.
https://github.com/timrdf/csv2rdf4lod-automation/issues/issue/13