Join between non-RDF and RDF data on the subject position #12

Open
LorenzBuehmann opened this issue Feb 18, 2020 · 1 comment

@LorenzBuehmann
Member

Currently, RDF data is parsed as URIs and put into a DataFrame whose column names and values are shortened URIs (in the example below, the http:// scheme is dropped).

Consider the N-Triples

<http://example.org/a1> <http://example.org/p1> <http://example.org/b1> .
<http://example.org/a1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/A> .

and the mapping

<#AMapping>
	rml:logicalSource [
		rml:source "/tmp/datalake-test/a.nt";
		nosql:store nosql:rdf
	];
	rr:subjectMap [
		rr:template "{id}";
		rr:class ex:A
	];

	rr:predicateObjectMap [
		rr:predicate ex:p1;
		rr:objectMap [rml:reference "example.org/p1"]
	] .

the data will be converted to this DataFrame

root
 |-- id: string (nullable = true)
 |-- example.org/p1: string (nullable = true)

+--------------+--------------+
|id            |example.org/p1|
+--------------+--------------+
|example.org/a1|example.org/b1|
+--------------+--------------+

The problem is that any other data is handled only as the plain values contained in the corresponding data source, i.e. it is never handled internally as URIs, as one would expect from the RML mappings.

Consider the CSV file

nr,p2
b1,c1
b2,c2
b3,c3

and the mapping

<#BMapping>
	rml:logicalSource [
		rml:source "/tmp/datalake-test/b.csv";
		nosql:store nosql:csv
	];
	rr:subjectMap [
		rr:template "http://example.org/{nr}";
		rr:class ex:B
	];

	rr:predicateObjectMap [
		rr:predicate ex:p2;
		rr:objectMap [rml:reference "p2"]
	] .

the DataFrame will just be

root
 |-- nr: string (nullable = true)
 |-- p2: string (nullable = true)

+---+---+
|nr |p2 |
+---+---+
|b1 |c1 |
|b2 |c2 |
|b3 |c3 |
+---+---+

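Clearly, any join between the two would fail and result in an empty DataFrame, because the RDF side holds example.org/b1 while the CSV side holds b1. For example: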

prefix ex: <http://example.org/>

select * where {
  ?s a ex:A ;
     ex:p1 ?o .
  ?o ex:p2 ?o1 .
}
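
To make the mismatch concrete, here is a minimal sketch in plain Python (not Squerall or Spark code) that performs the equi-join implied by the query on the two DataFrames shown above. The helper inner_join is hypothetical and only illustrates exact-match joining; the row values are copied from the tables in this issue.

```python
# Rows exactly as they appear in the two DataFrames shown in this issue.
rdf_rows = [{"id": "example.org/a1", "example.org/p1": "example.org/b1"}]
csv_rows = [
    {"nr": "b1", "p2": "c1"},
    {"nr": "b2", "p2": "c2"},
    {"nr": "b3", "p2": "c3"},
]


def inner_join(left, right, left_key, right_key):
    """Hypothetical helper: naive equi-join on exactly equal key values."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# ?o is the object of ex:p1 on the RDF side and the subject (nr) on the CSV side.
joined = inner_join(rdf_rows, csv_rows, "example.org/p1", "nr")
print(joined)  # [] because "example.org/b1" != "b1"
```

The join key on the RDF side is a shortened URI, while the CSV side only has the raw column value, so no pair of rows ever compares equal.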
@mnmami changed the title from "RDF support broken" to "Join on RDF subjects not working" on Feb 18, 2020
@mnmami
Collaborator

mnmami commented May 1, 2020

Hi Patrick,

Thanks for creating an issue for this interesting situation, in which one needs to query RDF and non-RDF data in the same query. Since the subject (in RDF) is always a URI while the non-RDF data holds plain values, one cannot currently join non-RDF data with RDF data on the subject position.

For this to work, we need to incorporate a way (both in the RML mappings and in the Squerall code) to either extract a plain value from the URI in the RDF data, or create a URI from a non-RDF plain value (by attaching a full namespace).
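
The two options could be sketched as follows. This is only an illustration in plain Python, not the Squerall implementation: the function names are hypothetical, the namespace is taken from the rr:template of #BMapping, and the assumption that shortening means dropping the scheme is inferred from the example DataFrame in this issue.

```python
# Namespace taken from rr:template "http://example.org/{nr}" in #BMapping.
NAMESPACE = "http://example.org/"


def strip_scheme(uri):
    """Assumption: the shortened form seen in the RDF DataFrame drops only the scheme."""
    return uri.split("://", 1)[-1]


def to_plain_value(shortened_uri, namespace=NAMESPACE):
    """Option 1: extract the plain value from a (shortened) URI."""
    prefix = strip_scheme(namespace)
    if shortened_uri.startswith(prefix):
        return shortened_uri[len(prefix):]
    return shortened_uri  # not in this namespace, leave untouched


def to_shortened_uri(value, namespace=NAMESPACE):
    """Option 2: build a URI from a plain value by attaching the namespace."""
    return strip_scheme(namespace + value)


rdf_object = "example.org/b1"   # value of example.org/p1 in the RDF DataFrame
csv_keys = ["b1", "b2", "b3"]   # values of nr in the CSV DataFrame

# Either normalization makes the join keys comparable.
matches_plain = [k for k in csv_keys if k == to_plain_value(rdf_object)]
matches_uri = [k for k in csv_keys if to_shortened_uri(k) == rdf_object]
print(matches_plain, matches_uri)
```

Whichever direction is chosen, it has to be applied consistently to both sides of the join, which is why the mapping needs to carry the namespace information.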

@mnmami changed the title from "Join on RDF subjects not working" to "Join between non-RDF and RDF data on the subject position" on May 1, 2020
@JensLehmann added this to the 0.9 milestone on Jun 26, 2020