YAML Streams and JSON Sequences #63
Comments
Thank you for pointing this out. This approach is also known as:
It seems that converting a JSON Lines stream to a YAML stream, and vice versa, is straightforward. I have tested yq. I have not been able to convert a JSON Sequence file into a YAML Stream using that tool. Documented it here: mikefarah/yq#1279. The reverse direction is functional though:
$ cat sample.yaml
foo: bar
---
baz: kaboom
$ yq -o=json -I=0 . sample.yaml
{"foo":"bar"}
{"baz":"kaboom"}
So, at least partially, JSON Sequence to YAML Stream mapping is kind of supported in the ecosystem already. Would be interesting to draw a comparison matrix for other tools and libraries. |
Edited: as @TallTed points out, it's application/json-seq. @anatoly-scherbakov could you please integrate the examples above showing the 0x1E and 0x0A characters when they are present? I have not understood whether the output is a compliant JSON Sequence (mediatype Producing a seq+json might need to define a new media type (e.g. ld+json-seq :DDD). This is another interesting thread. Personally, I don't know if |
Regarding media types with multiple plus signs, see Draft RFC Media Types with Multiple Suffixes. They're not yet officially permitted nor sanctioned by IANA nor IETF, but at least one such media type registration is pending. |
Note that NDJSON/JSONL doesn't make use of the
They both seem generally related to the concept of a YAML Stream, but differ from the treatment of multiple JSON script tags in HTML.
Actually As @TallTed notes, there is a proposal for multiple If we constrain ourself to the JSON-LD internal representation, we don't have a target for a document stream. Of course, we could introduce such by extension, but it doesn't really seem to relate to the -LD use case, without also extending into some notion of multiple graphs, which aren't treated as a Dataset. |
It turns out that |
Thank you for pointing out the difference between the spec and NDJSON. Indeed, they use different separator characters. I have encountered usages of NDJSON (say, you can process that with jq) and support for that format in commercial systems (say in https://data.world, which is by the way RDF oriented and supports SPARQL). I haven't seen usages for JSON Sequences RFC before. |
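To make the separator difference concrete, here is a minimal Python sketch (standard library only) that serializes the two records from the yq example both ways. The function names `to_ndjson` and `to_json_seq` are illustrative and not taken from any library; the json-seq framing follows RFC 7464, where each JSON text is preceded by RS (0x1E) and followed by LF (0x0A).

```python
import json

RS = b"\x1e"  # ASCII Record Separator, starts each RFC 7464 record
LF = b"\x0a"  # line feed, ends each record in both formats


def to_ndjson(records):
    """Serialize records as NDJSON / JSON Lines: one compact JSON text per line."""
    return b"".join(
        json.dumps(r, separators=(",", ":")).encode("utf-8") + LF for r in records
    )


def to_json_seq(records):
    """Serialize records as an RFC 7464 JSON text sequence: RS + JSON text + LF."""
    return b"".join(
        RS + json.dumps(r, separators=(",", ":")).encode("utf-8") + LF
        for r in records
    )


records = [{"foo": "bar"}, {"baz": "kaboom"}]
print(to_ndjson(records))    # LF-terminated lines only
print(to_json_seq(records))  # every record additionally starts with 0x1E
```

Printing the raw bytes makes the 0x1E and 0x0A characters visible, which a terminal would otherwise hide.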
@anatoly-scherbakov if you could write a one-pager presenting the various JSON seq alternatives on the market, I will ask the media type folks whether standardizing a different format for JSON is advisable. This is clearly unrelated to this spec, and this issue should probably be addressed in the YAML media type document. Wdyt? |
@ioggstream I started to write a little memo about it but found this page in Wikipedia which, it seems, just about does the job. Does it? |
@anatoly-scherbakov I think we can say
WDYT? |
If we say this, then we can't really have any tests involving multi-document streams, which is fine. If the concept of multi-document streams is important, it would be for JSON-LD as well, and should probably be taken up there. However, I don't really see how it fits with our data model, which already has the concept of named graphs which is a closely related way of dealing with this in the RDF world. |
Yes, certainly. |
Can you clarify the relation between a YAML-LD document and a named graph? iiuc a YAML stream can contain multiple documents, and each one can contain multiple named graphs. |
Named Graphs are often used to describe the content of some particular RDF source, particularly when used in SPARQL. That is why I said that the closest analog to multiple files from the RDF world is probably named graphs. Generally, each separate document would be considered an "RDF Source", and the only system that I can think of that deals with more than one RDF Source is SPARQL. For example:
Where http://example.org/alice and http://example.org/bob represent different endpoints/RDF Sources. (Example taken from 13.2.3 Combining FROM and FROM NAMED in SPARQL 1.1 Query.) In this view, each document in a YAML-LD stream would be a different RDF Source, although the analogy breaks down as IIUC there is no way to name separate documents in a YAML stream, but we might define a convention, if naming individual documents is deemed important. |
How many graphs does the following yaml document from https://w3c.github.io/json-ld-syntax/#example-referencing-named-graphs-using-an-id-map-with-none contain?
---
- "@id": http://example.org/foaf-graph
  http://www.w3.org/ns/prov#generatedAtTime:
  - "@value": 2012-04-09T00:00:00
    "@type": http://www.w3.org/2001/XMLSchema#dateTime
  http://example.org/graphMap:
  - "@graph":
    - "@id": http://manu.sporny.org/about#manu
      "@type":
      - http://xmlns.com/foaf/0.1/Person
      http://xmlns.com/foaf/0.1/name:
      - "@value": Manu Sporny
      http://xmlns.com/foaf/0.1/knows:
      - "@id": https://greggkellogg.net/foaf#me
  - "@graph":
    - "@id": https://greggkellogg.net/foaf#me
      "@type":
      - http://xmlns.com/foaf/0.1/Person
      http://xmlns.com/foaf/0.1/name:
      - "@value": Gregg Kellogg
      http://xmlns.com/foaf/0.1/knows:
      - "@id": http://manu.sporny.org/about#manu
... |
If you click on the "TriG" tab, you'll see it in RDF. It contains two anonymous graphs:
Except at the very top |
So iiuc I have a single YAML-LD document containing two graphs, right? |
Yes (you can try it in my distiller), but I needed to quote the datetime value. |
@gkellogg I think the simplest thing to do is to just map a YAML-LD stream to a sequence of JSON-LD files. If there's no analogy with JSON-LD, I don't think we should force an RDF structure on YAML streams. |
We certainly need a case for when a stream contains just a single document for the API methods to operate upon. We don't have a model for how to run API methods over multiple documents in a single go. In the For the other cases, we could say that the API is run on each document, in turn, and the result is a stream containing the result of processing each document. It's really the case of the stream being turned into JSON, or the results of processing being rendered as JSON where we don't have a stream model. If we are to cover this case, without introducing something like NDJSON-LD, the expected result would probably look like how JSON-LD handles multiple HTML script elements containing JSON-LD, or it is simply left unspecified. |
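As a rough illustration of the script-element analogy, the sketch below merges the documents of a stream into one array: array documents contribute their items, and anything else is appended whole. This is loosely modeled on how JSON-LD combines multiple script elements and is not the normative algorithm. The name `merge_documents` is hypothetical, the input is assumed to be already-parsed documents, and the `carol` IRI is made up for the example.

```python
def merge_documents(documents):
    """Merge parsed documents into a single array.

    Array documents contribute their items; any other document is appended whole.
    """
    merged = []
    for doc in documents:
        if isinstance(doc, list):
            merged.extend(doc)
        else:
            merged.append(doc)
    return merged


# Two documents of a hypothetical stream: one object, one array of objects.
stream = [
    {"@id": "http://example.org/alice"},
    [{"@id": "http://example.org/bob"}, {"@id": "http://example.org/carol"}],
]
print(merge_documents(stream))
```

Note that merging discards document boundaries, so the result can no longer be turned back into the original stream.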
I think it's a reasonable choice since we don't currently have user feedback. We could provide further specifications on that. Q: is a file containing a JSON-LD Frame, a .jsonld file? Do json-ld frames have a specific media type/file extension? |
Yes, that’s the convention. IIRC, there is an HTTP profile parameter that can be used to identify a frame document, but that never happens in practice. |
|
Currently, we don't really deal with streams. An extension for something like NDJSON-LD might be an interesting topic for TPAC. |
Generally feeling that this may be useful beyond JSON-LD and YAML-LD, and something like an "LD Streams" framework might be useful which this could fit into.
Gregg Kellogg: Touched on this earlier, JSON-LD doesn't have concept of multiple documents. How did we deal w/ YAML streams? Treat each document in there as its own JSON-LD document and process accordingly. ✪
Gregg Kellogg: JSON-LD defined as API, might need sequences API calls and recompose possibly, YAML-LD, compact things in stream, do them in sequence? Seems that this needs to bounce back up to JSON-LD. Is there an analog? ✪
Pierre-Antoine Champin: My concern regarding that, I can see a number of use cases, sensor use case earlier. ✪
Pierre-Antoine Champin: I'm not sure if we can come up with a unique way of dealing with those things. That might be just a lack of imagination. ✪
Benjamin Young: Yes, would like to see this happen, not expressly YAML related... YAML's origin is out of mime documents and email containers, where you were sending a bundle that was all interrelated. First document was foundational, other documents were attachments. ✪
Anatoly Scherbakov: Thank you very much all, I will unfortunately have to leave. It was quite interesting to participate, thank you again! ✪
Benjamin Young: Newline delimited JSON and JSON -- server sent events, event notifications, JSON - what's coming next... YAML multidoc is understood as a unit together. Context documents being sent together in stream. For most newline delimited JSON, interesting things to explore here, what's been done for Link header for example on bare JSON documents. ✪
Benjamin Young: For example, if you start w/ context, maybe that context applies to everything in the stream. Where should these happen, where shouldn't they happen, this isn't only about YAML. ✪
Gregg Kellogg: There is a broader concept of LD streams, could have applications in JSON-LD, fits in nicely with YAML... but why not other formats, why not NTriple streams? One could argue that NTriples are another multidocument mechanism since all statements stand on their own. ✪
Gregg Kellogg: There might be a notion of stream documents, each element of stream could have its own format. Does each have its own location? Even though this issue isn't about YAML streams, it begs for additional work for LD Streams and until that happens, with regard to this issue... there are two ways forward, 1) YAML-LD is only defined for streams in a single document, or 2) YAML-LD streams are treated the way multiple script elements are treated, ✪
That algorithm creates an array, merges content to it, that's the only way you can really do this until something bigger/better happens. ✪
Phil Archer: Linked Data fragments was similar. ✪
Gregg Kellogg: Yes, it was, had more to do with SPARQL querying... ✪
Gregg Kellogg: There was work done on JSON-LD streaming, but specifically took into consideration open pipe on which you were continuously interpreting. ✪
Gregg Kellogg: These are all references that should be considered. ✪
Gregg Kellogg: We can leave this for a future meeting noting this discussion. ✪
Benjamin Young: JSON-LD Streaming note was about parsing a *single* JSON document as it was streamed into the parser (which is very different from a stream of individual JSON-LD docs)--just to clarify relationships. ✪
|
@gkellogg sorry for the delay. Will the streams topic be left as unspecified and deferred to further documents/wg? |
A necessary start for this is a description of NDJSON-LD, for which we've setup a repo (thanks @pchampin) https://github.com/json-ld/ndjson-ld. @niklasl has done some related work and volunteered to get a start on a specification. Given that, we can refer to it from YAML-LD. |
Ok. We are in WGLC for YAML media types, and we are waiting for IETF media type feedback. I really hope that when YAML is registered (e.g. hopefully before 2023/06) we are ready to file the .yamlld registration - even if with a preliminary work that allows us to enable its basic usage for IDEs and content-negotiation. |
We created https://github.com/json-ld/ndjson-ld to work on a specification for NDJSON-LD; @niklasl volunteered to get it started. It should be fairly straightforward, just delegating the various API calls to each line from the NDJSON document, and imposing a serialization requirement on the result that each line be serialized without additional whitespace. This could, perhaps, just use JCS, but that might be overkill. The stumbling blocks in the current spec could then defer and update the API methods from NDJSON-LD. We may need to consider provisions for operating on YAML as a stream or a document, as is the case for most existing YAML libraries. |
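A minimal sketch of that delegation, assuming a hypothetical `process` callable that stands in for a JSON-LD API method (expand, compact, and so on). `process_ndjson` and `wrap_in_array` are illustrative names, not part of any spec or library. Each non-empty input line is parsed, processed, and re-serialized without additional whitespace.

```python
import json


def process_ndjson(text, process):
    """Apply `process` to every line of an NDJSON document.

    `process` is a placeholder for a JSON-LD API call; it takes and returns
    a parsed JSON value.
    """
    out_lines = []
    # Split on LF only: str.splitlines() would also break on characters such
    # as U+2028, which may legally appear inside JSON strings.
    for line in text.split("\n"):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        result = process(json.loads(line))
        # Compact separators: no whitespace between tokens, so no raw newlines.
        out_lines.append(json.dumps(result, separators=(",", ":")))
    return "".join(line + "\n" for line in out_lines)


def wrap_in_array(doc):
    # Toy stand-in for a real API method.
    return [doc]


source = '{"foo": "bar"}\n{"baz": "kaboom"}\n'
print(process_ndjson(source, wrap_in_array), end="")
```

This covers only the "closed" case where the whole document is available up front; JCS-style canonicalization is deliberately left out.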
The NDJSON issue ndjson/ndjson.github.io#1 goes into some of the areas of divergence. It also notes LDJSON, but IMO, LDJSON-LD would be a bit too much :) Basically, NDJSON purports to be a living spec, while JSON Lines does not. But, looking through the comments, RFC7464 may actually be the better fit, as it does not restrict the use of newlines within individual JSON records, as RS is not otherwise valid within JSON. We can dog-shed the naming issue later. We should put many of these issues into the ndjson-ld repo (or whatever we eventually decide to name it). cc/ @lrosenthol |
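To illustrate why RS framing tolerates newlines inside records, here is a small sketch of a JSON text sequence reader. It assumes the whole input is available as bytes and ignores the truncation-recovery rules of RFC 7464; `parse_json_seq` is an illustrative name.

```python
import json

RS = b"\x1e"  # ASCII Record Separator; never valid unescaped inside a JSON text


def parse_json_seq(data):
    """Split on RS and parse each non-empty chunk as one JSON text."""
    records = []
    for chunk in data.split(RS):
        if chunk.strip():
            records.append(json.loads(chunk.decode("utf-8")))
    return records


# The first record is pretty-printed across several lines, which would
# break a line-oriented NDJSON reader but is fine here.
data = b'\x1e{\n  "foo": "bar"\n}\n\x1e{"baz":"kaboom"}\n'
print(parse_json_seq(data))
```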
This issue was discussed in the 2022-10-12 meeting.
Subtopic: YAML Streams and JSON Sequences yaml-ld#63 ✪
Gregg Kellogg: That's what pushed NDJSON-LD ✪
Gregg Kellogg: Roberto proposes to map a YAML-LD to a sequence of JSON-LD files ✪
Gregg Kellogg: Proposing to update the spec with a hypothetical mapping to NDJSON-LD so that we can start to flesh out the missing components of the spec right now. I will spend some time on that. ✪
Leonard Rosenthol: Does this only apply to streams, or also for a YAML-LD file that contains multiple documents? ✪
Gregg Kellogg: In YAML, stream is a sequence of documents separated by "---". This has a well defined meaning within YAML. In YAML-LD spec, part of the process is to convert YAML-LD into Internal Representation, which includes splitting stream into individual documents. ✪
Gregg Kellogg: What if a stream contains a single document? Does it yield that document, or a stream with that document? For NDJSON-LD probably that's the latter, and for YAML-LD this might depend upon HTTP media type or an API method perhaps (different methods for streams vs documents). This is a subject of consideration. ✪
Leonard Rosenthol: Makes sense. I am thinking of this in respect to having physical files more than something else. ✪
Gregg Kellogg: In file representation or, say, in a multipart/MIME email, or in a stream where you process records as they come through, — this can be hard in API sense. API endpoints create promises and you might expect the promise to fulfill only once the entire stream is processed. Might be not adequate for a real time stream. But we might just focus on the "closed" use case and leave the "open stream" use case for later. ✪
Gregg Kellogg: We need to list use cases for both and look at the other W3C work on realtime processing and open data streams to see if we can find any relevance. ✪
|
Question
I've been pointed to JSON Sequences https://datatracker.ietf.org/doc/html/rfc7464
Maybe this can be a related work for converting multi-document YAML Streams to JSON-LD.
WDYT?
@gkellogg @VladimirAlexiev