Publish a WLC TEI edition that includes macula identifiers #122

Merged
merged 28 commits into from
May 1, 2024

jacobwegner
Collaborator

@jacobwegner jacobwegner commented Apr 12, 2024

This PR:

  • Adds a new tei-transform pipeline that fetches XML from Tanach.us and transforms it into the TEI dataset published at WLC/tei.
  • Reverts changes in sources/tanach.us/xml (all processing now happens in the pipeline, making it easier to "refresh" the upstream XML as needed).

The XML file for each book can be viewed in a browser and will be rendered with the wlc-tei.css stylesheet:

WLC/tei/08-ruth.xml

samekh and pe elements are rendered with additional whitespace / line breaks, mimicking the display of the HTML on Tanach.us:

WLC/tei/09-1samuel.xml

Or view in the Symphony Browser

The XML uses significant whitespace, e.g. o090010090111 and o090010090121 are in separate words, but there is no whitespace between them:

<w ref="1SA 1:9!11"><m xml:id="o090010090111" ref="1SA 1:9!11">עַל־</m></w><w ref="1SA 1:9!12"><m xml:id="o090010090121" ref="1SA 1:9!12">הַ</m><m xml:id="o090010090122" ref="1SA 1:9!12">כִּסֵּ֔א</m></w> <w ref="1SA 1:9!13"><m xml:id="o090010090131" ref="1SA 1:9!13">עַל־</m></w><w ref="1SA 1:9!14"><m xml:id="o090010090141" ref="1SA 1:9!14">מְזוּזַ֖ת</m></w> <w ref="1SA 1:9!15"><m xml:id="o090010090151" ref="1SA 1:9!15">הֵיכַ֥ל</m></w>
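
As a rough illustration of why the whitespace matters, the sketch below groups consecutive w elements by whether any whitespace follows them. It uses only the Python standard library; the `<ab>` wrapper element and the function names `group_words` and `surface_text` are hypothetical and are not part of the pipeline in this PR.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the 1SA 1:9 example above. Note where a space does and
# does not appear between </w> and the next <w>.
SAMPLE = (
    '<w ref="1SA 1:9!11"><m xml:id="o090010090111" ref="1SA 1:9!11">עַל־</m></w>'
    '<w ref="1SA 1:9!12"><m xml:id="o090010090121" ref="1SA 1:9!12">הַ</m>'
    '<m xml:id="o090010090122" ref="1SA 1:9!12">כִּסֵּ֔א</m></w> '
    '<w ref="1SA 1:9!13"><m xml:id="o090010090131" ref="1SA 1:9!13">עַל־</m></w>'
    '<w ref="1SA 1:9!14"><m xml:id="o090010090141" ref="1SA 1:9!14">מְזוּזַ֖ת</m></w> '
    '<w ref="1SA 1:9!15"><m xml:id="o090010090151" ref="1SA 1:9!15">הֵיכַ֥ל</m></w>'
)


def _parse(fragment):
    # The fragment has several top-level elements, so wrap it in a
    # hypothetical <ab> element to make it parseable.
    return ET.fromstring(f"<ab>{fragment}</ab>")


def group_words(fragment):
    """Group refs of consecutive w elements that have no whitespace between them."""
    groups, current = [], []
    for w in _parse(fragment).iter("w"):
        current.append(w.get("ref"))
        # w.tail holds the text between </w> and the next tag (None if absent).
        tail = w.tail or ""
        if any(ch.isspace() for ch in tail):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def surface_text(fragment):
    """Concatenate all text nodes, preserving the whitespace found in the XML."""
    return "".join(_parse(fragment).itertext())


for group in group_words(SAMPLE):
    print(group)
```

Any consumer that pretty-prints or re-indents the XML would insert whitespace between `</w>` and `<w>` and so change these groupings, which is why the files need to be handled as whitespace-sensitive.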

Rendering of the new XML can be viewed in the Symphony Browser by adding a tanachTEI=y querystring parameter, e.g.

https://deploy-preview-370--symphony-preview.netlify.app/?workspace=reading&osisRef=1Sam.1.9&tanachTEI=y&selectedLemma=%D7%A8%D6%B8%D7%90%D6%B8%D7%94
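
A minimal sketch of building such a preview link, assuming only the query parameters visible in the URL above (workspace, osisRef, tanachTEI, selectedLemma). The helper name `preview_url` and its arguments are hypothetical and not part of the Symphony Browser.

```python
from urllib.parse import urlencode

# Base URL taken from the example link above.
PREVIEW_BASE = "https://deploy-preview-370--symphony-preview.netlify.app/"


def preview_url(osis_ref, lemma=None, tei=True):
    """Build a preview link; parameter names mirror the example URL."""
    params = {"workspace": "reading", "osisRef": osis_ref}
    if tei:
        # Opt in to rendering from the new TEI XML.
        params["tanachTEI"] = "y"
    if lemma:
        # urlencode percent-encodes the Hebrew lemma as UTF-8.
        params["selectedLemma"] = lemma
    return f"{PREVIEW_BASE}?{urlencode(params)}"


# The lemma from the example link, written with escapes to keep the
# consonants and vowel points unambiguous.
print(preview_url("1Sam.1.9", "\u05e8\u05b8\u05d0\u05b8\u05d4"))
```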

Gives us a consistent starting place for any additional transforms
* w elements are not at morpheme level
* xml:id indicators are not valid Macula IDs
Look at the k and q elements in RUT 1:8.

Nodes include q but omit k.

RUT 4:17 has a pe element that is rendered on Tanach.us and on marble.bible, but is treated as an "after" attribute in Nodes
Consult Tanach.xsd for other allowed elements.
There end up being some differences, e.g.

* RUT 1:2!9
* Numbering in RUT 2:1
* RUT 4:17!16 (nearly like Marble)

This approach really just uses the Tanach.us XML as a "skeleton" on which to hang the w elements; whitespace sensitivity comes next if we go to the morpheme level.
@jacobwegner jacobwegner changed the title from "WIP: WLC TEI" to "Publish a WLC TEI edition that includes macula identifiers" Apr 15, 2024
Contributor

@jonathanrobie jonathanrobie left a comment

I was able to compare Genesis 1 and Psalm 1 to the equivalent WLC on tanach.us, and they look good. I did a spot check on both paseq and ׃פ in both tanach.us and symphony in Genesis 1 to make sure they show up in the same places.

I can't guarantee no errors, but this is definitely better than the status quo.

@jacobwegner jacobwegner merged commit d95d12a into main May 1, 2024
1 check passed
@jacobwegner jacobwegner deleted the feat/tei branch May 1, 2024 15:09