Consistent/reproducible TTL and TRIG formatting with Jena? #2672
-
Firstly - be careful what you ask for! Setting Jena up to process blank node labels as given is fragile and is not RDF. Processing RDF as a graph is not editing the file.

If a file is read in twice, in the absence of any other information, RDF requires the blank nodes to be kept apart. That's why the parsers create large unique ids.

One approach is to output blank nodes as ...

Another approach: the parsers can run in a non-compliant mode whereby blank node labels are preserved. Coupled with a custom writer, formatting might be preserved. The "might" is because adding triples may make a major change to the internal indexing, which uses hash maps. The triples to be output may come out in a very different order for a small change to the graph. The writer is going to have to slurp the whole graph and output in its own defined order, i.e. sort the data.

What are you going to do about ...

See also #2549.

Example code:

    import org.apache.jena.atlas.io.AWriter;
    import org.apache.jena.atlas.io.IO;
    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.RDFParser;
    import org.apache.jena.riot.lang.LabelToNode;
    import org.apache.jena.riot.out.NodeFormatter;
    import org.apache.jena.riot.out.NodeFormatterNT;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.riot.system.StreamRDFOps;
    import org.apache.jena.riot.writer.WriterStreamRDFPlain;

    // Parse in the non-compliant mode: keep blank node labels exactly as written in the file.
    Graph graph = RDFParser.source("D.ttl").labelToNode(LabelToNode.createUseLabelAsGiven()).toGraph();
    // Formatter that prints the blank node label unchanged instead of encoding it.
    NodeFormatter fmt = new NodeFormatterNT() {
        @Override
        public void formatBNode(AWriter w, String label) {
            w.print("_:");
            //String lab = NodeFmtLib.encodeBNodeLabel(label);
            // w.print(lab);
            w.print(label);
        }
    };
    AWriter out = IO.wrapUTF8(System.out);
    StreamRDF stream = new WriterStreamRDFPlain(out, fmt);
    StreamRDFOps.graphToStream(graph, stream);
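To illustrate the point about the writer needing its own defined order, here is a minimal sketch that does not use Jena at all. It assumes the triples are already available as three N-Triples-style strings each, with blank node labels preserved as given. The class name `SortedTripleOutput` and the helper `toSortedLines` are made up for this illustration and are not part of any library. The triples sit in a `HashSet`, whose iteration order is unspecified, and the output is made reproducible by sorting the rendered lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SortedTripleOutput {

    /**
     * Render each triple (subject, predicate, object as already-formatted
     * strings) as one N-Triples-style line, then sort the lines so the
     * result does not depend on the iteration order of the input.
     */
    static List<String> toSortedLines(Collection<List<String>> triples) {
        List<String> lines = new ArrayList<>();
        for (List<String> t : triples) {
            lines.add(t.get(0) + " " + t.get(1) + " " + t.get(2) + " .");
        }
        Collections.sort(lines); // plain lexicographic String order
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for a graph: a hash-based set with no defined iteration order.
        Set<List<String>> graph = new HashSet<>();
        graph.add(Arrays.asList("_:b1", "<http://example/p>", "\"x\""));
        graph.add(Arrays.asList("<http://example/s>", "<http://example/q>", "_:b1"));
        graph.add(Arrays.asList("<http://example/a>", "<http://example/p>", "\"y\""));

        for (String line : toSortedLines(graph)) {
            System.out.println(line);
        }
    }
}
```

With sorting, adding one triple adds one line to the output instead of reshuffling the rest, which is what keeps a git diff small. This only holds while the blank node labels themselves stay stable, so the fragility described above still applies.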
-
True.
Unlabelled blank nodes become ...
Not sure about RDF Dataset Canonicalization - it might help, but it is choosing the blank node hash. As you note, it's only stable if no changes happen, which might or might not suit your situation.
-
My approach is to extend turtle-formatter
-
I am looking into a way of working with RDF files (TTL, mostly) in a git repo. The data can contain blank nodes. I would like to run a formatter via a build tool so that the files are nicely human-readable and always formatted the same way, and a git diff shows only changes to the actual data. The only problem I see is the ordering of blank nodes. Is there a way to achieve consistent ordering of blank nodes in TTL output with Jena?
... Edit: also asking for a friend: atextor/turtle-formatter#8