Is there a good way of doing bulk updates w/ SPARQLUpdateStore? #423

pudo · 2014-08-23T20:58:12Z

I want to speed up some imports, so I've just made this update buffer. Is there a less insane way of doing this?

uholzer · 2014-08-25T18:45:19Z

Not really. There are solutions for special cases, however:

When you just want to add triples, use addN.

If you need to create/update/delete whole graphs, check whether your endpoint supports the SPARQL 1.1 Graph Store HTTP Protocol.
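For the whole-graph case, here is a rough sketch of what a Graph Store HTTP Protocol request could look like using only the standard library. The endpoint URL (a Fuseki-style `/ds/data` path), the graph IRI, and the helper name `graph_store_put` are assumptions made for illustration; check your endpoint's documentation for the real URL. The request is only built, not sent, so the snippet runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request


def graph_store_put(endpoint: str, graph_uri: str, turtle: str) -> Request:
    """Build (but do not send) a request that replaces one named graph.

    Hypothetical helper, not part of rdflib.
    """
    # Indirect graph identification: the graph IRI goes in the ?graph= parameter.
    url = endpoint + "?" + urlencode({"graph": graph_uri})
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="PUT",  # PUT replaces the graph; POST would merge into it
    )


req = graph_store_put(
    "http://localhost:3030/ds/data",       # assumed Fuseki-style endpoint URL
    "http://example.org/graphs/import-1",  # hypothetical graph IRI
    '<http://example.org/s> <http://example.org/p> "o" .',
)
# urllib.request.urlopen(req) would actually send it.
```

One request carries the entire graph as a Turtle document, so the number of triples does not affect the number of round trips.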

uholzer · 2014-08-26T20:53:47Z

By the way, it just occurred to me that there is indeed a transactional interface (Store.commit and Store.rollback, called by Graph.commit and Graph.rollback). So maybe it would be better to implement this interface in a subclass of SPARQLUpdateStore. Of course, the implementation would look exactly like your solution.
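To make the commit/rollback idea concrete, here is a minimal sketch of the buffering logic on its own. It deliberately does not subclass SPARQLUpdateStore: `BufferedUpdates`, its `send` callback, and the example IRIs are all hypothetical names, and terms are assumed to be already serialized in SPARQL syntax. A real implementation would put the same logic behind the store's add, commit and rollback methods.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # terms already serialized, e.g. "<http://...>"


class BufferedUpdates:
    """Hypothetical helper: collect triples, send one INSERT DATA on commit."""

    def __init__(self, send: Callable[[str], None], graph_iri: str):
        self._send = send  # e.g. a function that POSTs to the update endpoint
        self._graph_iri = graph_iri
        self._pending: List[Triple] = []

    def add(self, triple: Triple) -> None:
        self._pending.append(triple)  # no network traffic yet

    def commit(self) -> None:
        if not self._pending:
            return
        body = "\n".join(f"{s} {p} {o} ." for s, p, o in self._pending)
        self._send(f"INSERT DATA {{ GRAPH <{self._graph_iri}> {{\n{body}\n}} }}")
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()  # drop everything not yet sent


sent: List[str] = []  # stands in for the remote endpoint
buf = BufferedUpdates(sent.append, "http://example.org/g")
for i in range(3):
    buf.add((f"<http://example.org/s{i}>", "<http://example.org/p>", f'"{i}"'))
buf.commit()    # one update request for three triples
buf.add(("<http://example.org/x>", "<http://example.org/p>", '"dropped"'))
buf.rollback()  # discarded, nothing sent
buf.commit()    # nothing pending, nothing sent
```

Note that this only batches requests; it gives no isolation, and a rollback cannot undo updates that an earlier commit already sent.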

pudo · 2014-08-27T08:05:01Z

@uholzer on a related note, is there any write-up on which triplestores actually work with rdflib, and what dance one has to dance to make that happen? Fuseki has worked for me but is really slow, Virtuoso and Stardog don't seem to get along with RDFLib --

gromgull · 2014-08-27T08:09:27Z

I've not touched this code in a while, but when I wrote it I tested against Fuseki (and only fuseki :)

It will always be kind of slow as long as you are using the SPO store interface (add/remove/slicing/subjects/etc.). Serializing and deserializing everything over HTTP eats pretty much any advantage you gain from using a faster non-Python-based store.
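A toy illustration of why the call-by-call style adds up, using a stand-in class that only counts round trips. Nothing here talks to a real endpoint or uses rdflib; `CountingEndpoint` and its methods are made up for this sketch, and the single-query method just simulates a join that a real endpoint would evaluate itself.

```python
class CountingEndpoint:
    """Stand-in for a remote SPARQL endpoint that counts HTTP round trips."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.requests = 0

    def match(self, s=None, p=None, o=None):
        self.requests += 1  # each triple-pattern lookup would be one request
        return [t for t in self.triples
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]

    def select_names(self):
        self.requests += 1  # one SELECT; the join happens on the endpoint
        people = {t[0] for t in self.triples if t[1:] == ("type", "Person")}
        return {t[0]: t[2] for t in self.triples
                if t[1] == "name" and t[0] in people}


data = [(f"person{i}", "name", f"Name {i}") for i in range(100)]
data += [(f"person{i}", "type", "Person") for i in range(100)]

# Style 1: walk the graph one pattern at a time.
ep = CountingEndpoint(data)
names = {}
for s, _, _ in ep.match(p="type", o="Person"):
    names[s] = ep.match(s=s, p="name")[0][2]
per_call = ep.requests

# Style 2: ask for everything in a single query.
ep2 = CountingEndpoint(data)
names_single = ep2.select_names()
single = ep2.requests

print(per_call, single)
```

Both styles return the same mapping; the difference is one request per subject plus one for the initial lookup, versus a single request overall.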

pudo · 2014-08-27T08:12:04Z

@gromgull Oh, I'm not actually so concerned about write speed, but their SPARQL interface just doesn't seem to scale at all. I'm doing a reasonably complex graph query and it takes 14s to come back - which just makes it not an option for a production web application.

wwaites · 2014-08-27T08:19:39Z

Ages ago I wrote some bindings via pyodbc for Virtuoso,

 https://bitbucket.org/ww/virtuoso

I seem to remember it was very picky about its idea of transaction
isolation and locking -- much more so than any other database that I
have used. If you can get past that, there might be some mileage in
it. I think it presented itself as an rdflib store...

There might also be a way to compose your query differently for Fuseki
and Jena to make it run faster. I've worked with Dave Reynolds (@der)
before who knows the internal details and may be able to help.

Best,
-w

pudo · 2014-08-27T09:21:38Z

Yeah, I've seen the package but a) I'm scared of things that haven't been maintained in more than three years (as you've seen I've already had to get into telescope much more than I wanted); and b) I just want to keep the deployment process reasonably simple - HTTP helps a lot there, while ODBC just seems like an unnecessary hurdle.

I really need to work on these queries, but my sense is that it's just incredibly easy to slow the whole thing down to a crawl. Perhaps I'm doing some stuff fundamentally wrong, though.

Here's more discussion on the subject: uf6/design#6

uholzer · 2014-08-27T09:41:22Z

As @gromgull said, using SPARQLStore's SPO interface is slow as it has to do a query for every operation. If a single SPARQL query is executed slowly on a SPARQL endpoint, that is entirely the endpoint's fault (or you wrote a difficult query). All you can do is try different endpoints. Also try endpoints you may not have heard of yet. I think SWI Prolog also has one which has good performance when using the in-memory store. I never did a proper comparison, and don't know of one, though.

pudo · 2014-08-27T10:21:30Z

@uholzer thanks for your advice. I understand that the speed of SPARQL query execution is a backend (or query implementation) issue; that's why I was trying out different servers before I realized that rdflib only really connects to Fuseki (from what I gather).

On the whole, my sense is that this entire ecosystem is more tailored to the demands of an academic environment rather than user-facing web apps; so I should probably just go with Neo4J for graph storage like everybody else :)

pudo closed this as completed Aug 26, 2014
