Digital Article Git Tree (DAGT) #46

castedo · 2022-05-14T02:04:07Z

Some folks, especially @tarleb, might be interested in some possibilities I've recently learned about and plan to explore.

The lovely Software Heritage is already archiving JATS XML files and directories with pandoc markdown. Furthermore, they can be identified and retrieved via cryptographic hashes. Just by publicly using GitLab and GitHub, authors and some publishers are archiving them with persistent IDs (the cryptographic hashes). Most of them probably don't even know. For example, one of the submitters to the Journal of Open Source Software has their submitted directory with pandoc markdown persistently identified by this standard URI:

swh:1:dir:5743b6828e5c7bcc7e5e50968f01e0447825d6e9

No DOI needed. And a copy of the archived data can be retrieved via:

https://archive.softwareheritage.org/swh:1:dir:5743b6828e5c7bcc7e5e50968f01e0447825d6e9

5743b6828e5c7bcc7e5e50968f01e0447825d6e9 persistently identifies the contents of an article independent of how it gets presented (HTML vs PDF) or where. The directory is encoded and hashed in a way such that the same directory contents result in the same cryptographic hash identifier. The encoding of the directory is the git tree object. It is worth clarifying that the git tree does not include all the history and commits in a git repository. It's only the contents of the directory.
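To make the "same contents, same hash" point concrete, here is a minimal Python sketch of how a git tree hash can be computed for a directory. It is an illustration under stated assumptions, not the code Software Heritage runs: the function names (`git_object_id`, `tree_id`, `swhid_for_directory`) are made up for this example, and it only handles regular files and subdirectories (no symlinks or submodules, and it keeps empty subdirectories, which git itself would not track).

```python
import hashlib
import os
from pathlib import Path


def git_object_id(kind: str, body: bytes) -> bytes:
    """Return the 20-byte SHA-1 git assigns to an object: sha1("<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(directory: Path) -> bytes:
    """Hash a directory as a git tree object (simplified: files and subdirectories only)."""
    entries = []
    for child in directory.iterdir():
        name = os.fsencode(child.name)
        if child.is_dir():
            if child.name == ".git":
                continue  # repository metadata is never part of a tree
            # git sorts a directory entry as if its name ended with "/"
            entries.append((name + b"/", b"40000", name, tree_id(child)))
        else:
            mode = b"100755" if os.access(child, os.X_OK) else b"100644"
            blob = git_object_id("blob", child.read_bytes())
            entries.append((name, mode, name, blob))
    entries.sort(key=lambda entry: entry[0])
    # Each entry is "<mode> <name>\0" followed by the raw 20-byte object ID
    body = b"".join(
        mode + b" " + name + b"\0" + oid for _, mode, name, oid in entries
    )
    return git_object_id("tree", body)


def swhid_for_directory(directory: Path) -> str:
    """Format the tree hash in the swh:1:dir:<hex> form shown above."""
    return "swh:1:dir:" + tree_id(directory).hex()
```

Because only file names, modes, and contents go into the hash, two copies of the same directory produce the same identifier regardless of where they live or what commit history surrounds them.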

For lack of a better term, I'm thinking of using the term "Digital Article Git Tree" (DAGT) for a standardized directory layout that can be processed much as some publishers process JATS XML as their single source. I'm exploring this now because I'm realizing it's going to take me too much time to do something similar with JATS XML. Generating HTML and PDF is going to be easier, and doing diffs is going to be MUCH easier. Interop with docx will be easier too. I suspect that, long term, a JATS XML file is a better, more interoperable encoding of article source. But for now, I think I'm going to make faster progress with this DAGT approach.

Has anybody seen similar approaches? Any feedback, warnings, potential issues, advice?

I suspect there are some great uses for using cryptographic hashes as "instant DOIs" to scholarly articles encoded as DAGT and/or JATS XML.

I should clarify that my idea for DAGT is NOT that this is the original source that authors write. The DAGT would almost always be generated from source in separate git repositories or from user-friendly applications. I'm thinking of DAGT, like JATS XML, as an interoperability format, not the true original source that generates the article content.

tarleb · 2022-05-20T20:17:52Z

Maintainer

Thanks for the info @castedo! I love the approach.

1 reply

mrchristian Oct 8, 2022
Maintainer

Hi @castedo, can you post the link here to the demo of the crypto ID example that you showed at the community week a while back? Sorry, I mislaid the link :-) I really like the idea and I'd like to try it out. Thanks

castedo · 2022-05-20T22:30:30Z

Author

Quick update. I've got my website demonstrating the presentation of web and pdf manifestations/renderings that are generated from the JATS XML persistently identified by a cryptographic hash.

Currently, you can see an example here: https://castedo.com/osa/137

I'm now thinking this DAGT idea is not such a good idea as a long-term interoperability format. I've got all of my documents persistently identified by cryptographic hash via JATS XML except one. That one exception has embedded images which are not identified inside the JATS XML. So the generated/rendered web and pdf outputs don't really have content identified by the cryptographic hash. But JATS XML seems to have enough hooks to be able to reasonably do this.

castedo · 2022-10-08T14:11:49Z

Author

@mrchristian, I'm happy to share. I have a few more examples to give than I did a month ago.

Here's a document describing the benefits of Digital Succession Identifiers (DSI) from the perspective of an author:

https://perm.pub/aEBkfZe1f4ooWcgt2Qs9gjtmkFo/0

Here's a partial draft specification of DSI:

https://perm.pub/ji2STto1mZ3i2BmnGxbkebejKH4/0

Here's an example user experience of reaching an out-of-date edition:

https://popgen.es/czdv8PyJKF7LneTnaVT6pgAKyh8/0.1

All of the above are digital successions whose digital object editions are directories containing a JATS XML file (which popgen.es will then render in both HTML and PDF). The data records are stored in both archive.softwareheritage.org and gitlab.com.

Here's an example of a document that is actually about population genetics research, and that I should be updating rather than coding DSI tech 😄 :

https://popgen.es/DZFCt68peNNajZ34WtZni9VYxzo/0.3

Here's an example of a digital succession with PDF files as the digital objects (but a PDF is more limited in document metadata than JATS):

https://perm.pub/cNA6UJNQvawejAA45NqofL1FLBs

For those brave enough to try this pre-pre-pre-alpha version :-), I've started documentation for the core tool for making digital successions here:

https://hidos.readthedocs.io

Feedback welcome.

Let me know if anyone wants to create and post digital successions of PDFs or articles archived as JATS XML. Digital objects could be other things, but at the moment I'm not developing for those scenarios. The hidos software itself, however, is agnostic as to the format of the digital object editions; it is a low-level tool.
