Is there a good way of doing bulk updates w/ SPARQLUpdateStore? #423
Not really. There are solutions for special cases, however: when you just want to add triples, use . If you need to create/update/delete whole graphs, check whether your endpoint supports the SPARQL Graph Store HTTP Protocol.
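For the bulk-add case the thread mentions, one approach (a sketch, not rdflib's own API: the helper name and the N-Triples term formatting are assumptions here) is to collect many triples into a single SPARQL 1.1 `INSERT DATA` update, so the endpoint sees one HTTP request instead of one per triple:

```python
def build_insert_data(triples, graph_uri=None):
    """Build one SPARQL 1.1 INSERT DATA update from many triples.

    `triples` is an iterable of (s, p, o) strings that are already valid
    N-Triples terms, e.g. '<http://example.org/s>' or '"a literal"'.
    (Hypothetical helper -- not part of rdflib.)
    """
    body = " .\n".join("{} {} {}".format(s, p, o) for s, p, o in triples)
    if graph_uri is not None:
        # Target a named graph when the data should not go to the default graph.
        return "INSERT DATA {{ GRAPH <{}> {{\n{} .\n}} }}".format(graph_uri, body)
    return "INSERT DATA {{\n{} .\n}}".format(body)
```

The resulting string could then be sent once, for example via `SPARQLUpdateStore.update()` or a plain HTTP POST to the update endpoint, instead of calling `add()` triple by triple.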
By the way, it just occurred to me that there is indeed a transactional interface (
@uholzer on a related note, is there any write-up on which triplestores actually work with rdflib, and what dance one has to dance to make that happen? Fuseki has worked for me but is really slow; Virtuoso and Stardog don't seem to get along with RDFLib --
I've not touched this code in a while, but when I wrote it I tested against Fuseki (and only Fuseki :) It will always be kind of slow as long as you are using the SPO store interface (add/remove/slicing/subjects/etc.). Serializing and deserializing everything over HTTP eats pretty much any advantage you gain from using a faster non-Python store.
@gromgull Oh, I'm not actually so concerned about write speed, but their SPARQL interface just doesn't seem to scale at all. I'm doing a reasonably complex graph query and it takes 14s to come back - which just makes it not an option for a production web application.
Ages ago I wrote some bindings via pyodbc for Virtuoso. I seem to remember it was very picky about its idea of transactions. There might also be a way to compose your query differently for Fuseki.
Best,
Yeah, I've seen the package, but a) I'm scared of things that haven't been maintained in more than three years (as you've seen, I've already had to get into telescope much more than I wanted); and b) I just want to keep the deployment process reasonably simple - HTTP helps a lot there, while ODBC just seems like an unnecessary hurdle. I really need to work on these queries, but my sense is that it's just incredibly easy to slow the whole thing down to a crawl. Perhaps I'm doing some stuff fundamentally wrong, though. Here's more discussion on the subject: uf6/design#6
As @gromgull said, using SPARQLStore's SPO interface is slow because it has to do a query for every operation. That a single SPARQL query executes slowly on a SPARQL endpoint is entirely the endpoint's fault (or you wrote a difficult query). All you can do is try different endpoints - including endpoints you maybe have never heard of yet. I think SWI-Prolog also has one, which has good performance when using the in-memory store. I never did a proper comparison, though, and don't know of one.
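To make the per-operation cost concrete: every call through the SPO interface has to be translated into its own query and sent over HTTP. A rough illustration of that translation (this is a hypothetical sketch, not rdflib's actual internals):

```python
def pattern_to_select(s=None, p=None, o=None):
    """Roughly the kind of SELECT a remote store must issue, over HTTP,
    for every single triples((s, p, o)) call -- so iterating a graph
    pattern-by-pattern means one network round trip per pattern.
    Unbound positions become variables. (Illustrative sketch only.)
    """
    subj = s if s is not None else "?s"
    pred = p if p is not None else "?p"
    obj = o if o is not None else "?o"
    return "SELECT ?s ?p ?o WHERE {{ {} {} {} }}".format(subj, pred, obj)
```

This is why writing one larger SPARQL query by hand usually beats many small SPO-level calls against a remote endpoint, regardless of how fast the backing store itself is.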
@uholzer thanks for your advice. I understand that the speed of SPARQL query execution is a backend (or query implementation) issue; that's why I was trying out different servers before I realized that rdflib only really connects to Fuseki (from what I gather). On the whole, my sense is that this entire ecosystem is more tailored to the demands of an academic environment than to user-facing web apps, so I should probably just go with Neo4j for graph storage like everybody else :)
I want to speed up some imports, so I've just made this update buffer. Is there a less insane way of doing this?
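The buffer snippet itself did not survive the page scrape, but one way such an update buffer might look (a sketch under assumptions: `run_update` stands for any callable that sends a SPARQL update string, e.g. a bound `SPARQLUpdateStore.update`; the class name and default size are made up):

```python
class UpdateBuffer:
    """Collect triples and flush them as a single INSERT DATA update,
    instead of issuing one HTTP request per added triple.
    Triples are (s, p, o) tuples of already-serialized N-Triples terms.
    (Hypothetical sketch, not part of rdflib.)
    """

    def __init__(self, run_update, max_size=1000):
        self.run_update = run_update  # callable taking a SPARQL update string
        self.max_size = max_size
        self.pending = []

    def add(self, s, p, o):
        self.pending.append("{} {} {} .".format(s, p, o))
        if len(self.pending) >= self.max_size:
            self.flush()  # auto-flush once the batch is full

    def flush(self):
        if self.pending:
            self.run_update("INSERT DATA {\n%s\n}" % "\n".join(self.pending))
            self.pending = []
```

Remember to call `flush()` once at the end of the import so a partial batch isn't silently dropped.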