Inconsistent Preservation of OSHB's Unique Ids for Words #8

Open
rkjtan opened this issue Mar 25, 2022 · 6 comments

@rkjtan
Contributor

rkjtan commented Mar 25, 2022

Because some words had to be split into their constituent parts for analysis, a single OSHB unique id would have to be shared across a word's two or three parts in order to carry over into the trees. For example:

    <verse osisID="Gen.1.1">
      <w lemma="b/7225" n="1.0" morph="HR/Ncfsa" id="01xeN">בְּ/רֵאשִׁ֖ית</w>
      <w lemma="1254 a" morph="HVqp3ms" id="01Nvk">בָּרָ֣א</w>
      <w lemma="430" n="1" morph="HNcmpa" id="01TyA">אֱלֹהִ֑ים</w>
      <w lemma="853" morph="HTo" id="01vuQ">אֵ֥ת</w>
      <w lemma="d/8064" n="0.0" morph="HTd/Ncmpa" id="01TSc">הַ/שָּׁמַ֖יִם</w>
      <w lemma="c/853" morph="HC/To" id="01k5P">וְ/אֵ֥ת</w>
      <w lemma="d/776" n="0" morph="HTd/Ncbsa" id="01nPh">הָ/אָֽרֶץ</w><seg type="x-sof-pasuq">׃</seg>
    </verse>

The split words "in beginning", "the heavens", "and [object marker]", and "the earth" did not keep their OSHB unique ids because each was separated into two parts, while the unsplit words "created", "God", and "[object marker]" still show their OSHB ids in the trees. Perhaps we should strip all the OSHB ids to avoid confusion.
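
The id sharing described at the top of this comment can be sketched in XQuery. This is only an illustration, not the pipeline's code: `local:split-word` is a hypothetical helper, and it assumes that parts are separated by "/" in the word text, as in the verse above. It copies the parent word's id onto every part.

```xquery
(: Hypothetical helper, not a function from the pipeline. :)
(: Assumes "/" separates the parts inside the word text.  :)
declare function local:split-word($w as element(w)) as element(m)* {
  for $part in tokenize(string($w), '/')
  (: every part receives a copy of the parent word's id :)
  return <m>{ $w/@id, $part }</m>
};

local:split-word(
  <w lemma="d/776" n="0" morph="HTd/Ncbsa" id="01nPh">הָ/אָֽרֶץ</w>
)
```

Applied to the last word of Gen.1.1, this should give two `m` elements that both carry id="01nPh".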

@jonathanrobie
Contributor

Yes, stripping these ids is the easiest thing to do, and we have another way to cross-reference, using our @n attributes.

jonathanrobie added a commit that referenced this issue Mar 31, 2022
@jonathanrobie
Contributor

I stripped them by hand for the initial release. We need to add this step to the pipeline (it's just: delete nodes //m/@id).
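
A minimal sketch of that step as a standalone query, assuming a processor that supports the XQuery Update Facility (BaseX does). The `tree` wrapper and the sample `m` elements are made up for illustration and do not reflect the real tree format. The copy/modify form is used so the example does not touch any stored data.

```xquery
(: Made-up fragment: split parts without ids, unsplit words with ids. :)
let $tree :=
  <tree>
    <m>בְּ</m>
    <m>רֵאשִׁ֖ית</m>
    <m id="01Nvk">בָּרָ֣א</m>
    <m id="01TyA">אֱלֹהִ֑ים</m>
  </tree>
return
  (: work on a copy, delete every id attribute on m, return the copy :)
  copy $clean := $tree
  modify delete nodes $clean//m/@id
  return $clean
```

After this, no `m` element in the returned copy has an id, so split and unsplit words look the same.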

@pdurusau

pdurusau commented Apr 2, 2022 via email

@pdurusau

pdurusau commented Apr 2, 2022

To clarify, I'm adding it to prepare-oshb-for-trees.bxs as a separate step.

@jonathanrobie
Contributor

Maybe all we need to do is change this line, in explode-word-parts.xq:

 <m>{ $w/@*, $w/text() }</m>

to this:

 <m>{ $w/@* except $w/@id, $w/text() }</m>

Could you please see if that does the trick? It would save us a separate step in the pipeline.
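
To see what the proposed change does in isolation, here is a small sketch using one unsplit word from the verse in the first comment. The surrounding `let` is only scaffolding for the example; the real code in explode-word-parts.xq may bind `$w` differently. Because `except` binds more tightly than the comma, the expression reads as ($w/@* except $w/@id), followed by $w/text().

```xquery
(: Scaffolding only: $w is bound to a sample word from Gen.1.1. :)
let $w := <w lemma="430" n="1" morph="HNcmpa" id="01TyA">אֱלֹהִ֑ים</w>
(: copy every attribute except id, then the word text :)
return <m>{ $w/@* except $w/@id, $w/text() }</m>
```

The result should be an `m` element that keeps lemma, n, and morph but has no id.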

@pdurusau

pdurusau commented Apr 2, 2022

Setting test for tomorrow.
