Partial mapping SIL - Macula (word and morph level) #62

Open
klosoter opened this issue Jul 5, 2022 · 3 comments

Comments

@klosoter
Contributor

klosoter commented Jul 5, 2022

This is what the extracted SIL data looks like for a single word, the first word of Genesis 1:1 (full file here):

{
    'wd': 'B.:/R")$I73YT',
    'ws': 'בְּרֵאשִׁ֖ית',
    'wt': 'bərēʾšîṯ',
    'wc': 'בְּרֵאשִׁית',
    'bf': '\\p',
    'vdm': '{"Temporal <H>בְּ</H>" 39.6.2}',
    'netB': '',
    'wbc': '{"The definite article is lacking, but \'in the beginning\' is an acceptable translation" GEN.1.1.b}',
    'egs': 'in.beginning',
    'morphs': {
        '010010010011': {
            'm': 'B.:',
            'ms': 'בְּ',
            'mt': 'bə',
            'l': 'B.:',
            'ls': 'בְּ',
            'lt': 'bə',
            'dfA': '\\7b\\d0\\62\\7d',
            'df': '}בּ{',
            't': 'Pp'
        },
        '010010010012': {
            'm': 'R")$I73YT',
            'ms': 'רֵאשִׁ֖ית',
            'mt': 'rēʾšiyṯ',
            'l': 'R")$IYT',
            'ls': 'רֵאשִׁית',
            'lt': 'rēʾšîṯ',
            'dfA': '\\7b\\74\\79\\69\\48\\27\\e3\\72\\7d',
            'df': '}רֵאשִׁית{',
            't': 'ncfsa',
            'str': '{07225}'
        }
    }
}
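A minimal Python sketch of how such a record could be flattened into one row per morph, carrying the word-level gloss (`egs`) down to each morph. It assumes the extracted data is available as a dict shaped like the sample above, and that a word ID is the morph ID with its last digit dropped (as in `010010010011` → `01001001001`). `flatten_record` is a hypothetical helper name, not part of any existing code.

```python
from typing import Any


def flatten_record(record: dict[str, Any]) -> list[dict[str, str]]:
    """Return one row per morph, combining word- and morph-level SIL attributes."""
    rows = []
    for morph_id, morph in record["morphs"].items():
        rows.append({
            "word_id": morph_id[:-1],          # assumption: last digit is the morph index
            "morph_id": morph_id,
            "morph_text": morph["ms"],         # pointed surface form of the morph
            "morph_translit": morph["mt"],
            "morphology": morph["t"],          # SIL morphology code, e.g. 'Pp', 'ncfsa'
            "strongs": morph.get("str", ""),   # not every morph carries a Strong's number
            "word_gloss": record["egs"],       # contextual gloss exists only at word level
        })
    return rows


# Trimmed copy of the record above (only the keys the sketch reads).
sample = {
    "egs": "in.beginning",
    "morphs": {
        "010010010011": {"ms": "בְּ", "mt": "bə", "t": "Pp"},
        "010010010012": {"ms": "רֵאשִׁ֖ית", "mt": "rēʾšiyṯ", "t": "ncfsa", "str": "{07225}"},
    },
}

for row in flatten_record(sample):
    print(row["word_id"], row["morph_id"], row["morphology"], row["word_gloss"])
```

Flattening this way makes it explicit which attributes are word-level (`egs`) and which are morph-level (`t`, `str`), which is the split the mapping below has to preserve.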
klosoter changed the title from "SIL OT which attributes to use?" to "Partial mapping SIL - Macula (word and morph level)" on Jul 6, 2022
@klosoter
Contributor Author

klosoter commented Jul 6, 2022

A mapping file can be found here.

The mapping is done at both the word and the morph level, because some useful SIL attributes are present at only one of the two (e.g., the contextual glosses egs exist only at the word level and the morphology code t only at the morph level).

<maculaSilMapping>
  <word maculaText="בְּרֵאשִׁ֖ית" maculaId="01001001001" SILText="בְּרֵאשִׁ֖ית" SILId="01001001001" SILGlosses="in.beginning" SILTransliteration="bərēʾšîṯ">
    <morph maculaText="בְּ" maculaId="010010010011" SILText="בְּ" SILId="010010010011" SILMorphology="Pp" SILTransliteration=""/>
    <morph maculaText="רֵאשִׁ֖ית" maculaId="010010010012" SILText="רֵאשִׁ֖ית" SILId="010010010012" SILMorphology="ncfsa" SILTransliteration="rēʾšiyṯ"/>
  </word>
  <word maculaText="בָּרָ֣א" maculaId="01001001002" SILText="בָּרָ֣א" SILId="01001001002" SILGlosses="he.created" SILTransliteration="bārāʾ">
    <morph maculaText="בָּרָ֣א" maculaId="010010010021" SILText="בָּרָ֣א" SILId="010010010021" SILMorphology="vqp3ms" SILTransliteration="bārāʾ"/>
  </word>
  <word maculaText="אֱלֹהִ֑ים" maculaId="01001001003" SILText="אֱלֹהִ֑ים" SILId="01001001003" SILGlosses="God" SILTransliteration="ʾĕlōhîm">
    <morph maculaText="אֱלֹהִ֑ים" maculaId="010010010031" SILText="אֱלֹהִ֑ים" SILId="010010010031" SILMorphology="ncmpa" SILTransliteration="ʾĕlōhiym"/>
  </word>
  <word maculaText="אֵ֥ת" maculaId="01001001004" SILText="אֵ֥ת" SILId="01001001004" SILGlosses="(et)" SILTransliteration="ʾēṯ">
    <morph maculaText="אֵ֥ת" maculaId="010010010041" SILText="אֵ֥ת" SILId="010010010041" SILMorphology="Po" SILTransliteration="ʾēṯ"/>
  </word>
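The excerpt above is cut off, so the sketch below uses a small, well-formed copy of two of its entries. It shows one way to read the mapping with Python's standard `xml.etree.ElementTree` and look up SIL attributes by Macula morph ID. `index_morphs` is a hypothetical helper, and splitting `maculaId` on `;` assumes the joined-ID convention described in the next comment.

```python
import xml.etree.ElementTree as ET

MAPPING_XML = """<maculaSilMapping>
  <word maculaText="בָּרָ֣א" maculaId="01001001002" SILText="בָּרָ֣א" SILId="01001001002" SILGlosses="he.created" SILTransliteration="bārāʾ">
    <morph maculaText="בָּרָ֣א" maculaId="010010010021" SILText="בָּרָ֣א" SILId="010010010021" SILMorphology="vqp3ms" SILTransliteration="bārāʾ"/>
  </word>
  <word maculaText="אֱלֹהִ֑ים" maculaId="01001001003" SILText="אֱלֹהִ֑ים" SILId="01001001003" SILGlosses="God" SILTransliteration="ʾĕlōhîm">
    <morph maculaText="אֱלֹהִ֑ים" maculaId="010010010031" SILText="אֱלֹהִ֑ים" SILId="010010010031" SILMorphology="ncmpa" SILTransliteration="ʾĕlōhiym"/>
  </word>
</maculaSilMapping>"""


def index_morphs(root: ET.Element) -> dict[str, dict[str, str]]:
    """Map each Macula morph ID to its SIL morphology code and its word's gloss."""
    index = {}
    for word in root.iter("word"):
        gloss = word.get("SILGlosses", "")  # glosses are stored on <word> only
        for morph in word.iter("morph"):
            # assumption: several Macula IDs may be joined with ';'
            for macula_id in morph.get("maculaId", "").split(";"):
                index[macula_id] = {
                    "morphology": morph.get("SILMorphology", ""),
                    "gloss": gloss,
                }
    return index


root = ET.fromstring(MAPPING_XML)
index = index_morphs(root)
print(index["010010010021"])  # {'morphology': 'vqp3ms', 'gloss': 'he.created'}
```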

@klosoter
Contributor Author

klosoter commented Jul 6, 2022

However, these mappings are not perfectly one-to-one. Generally, there are three cases:

1. More than one SILId for one maculaId (morph level, 2 cases)

Use

for $node in //morph[@SILId => contains(";")]
return $node/..

to find all words that contain a morph mapped to more than one SILId (this does not occur at the word level):

<word maculaText="בָּארוּמָ֑ה" maculaId="07009041003" SILText="בָּארוּמָ֑ה" SILId="07009041003" SILGlosses="at.(the).Arumah" SILTransliteration="bāʾrûmâ">
  <morph maculaText="בָּ" maculaId="070090410031" SILText="בָּ" SILId="070090410031" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="ארוּמָ֑ה" maculaId="070090410032" SILText="|ארוּמָ֑ה" SILId="070090410032;070090410033" SILMorphology="Pa|np" SILTransliteration="–|ʾrûmāh"/>
</word>
<word maculaText="הָרֹאֶ֖ה" maculaId="13002052007" SILText="הָרֹאֶ֖ה" SILId="13002052006" SILGlosses="Haroeh" SILTransliteration="hārōʾeh">
  <morph maculaText="הָרֹאֶ֖ה" maculaId="130020520071" SILText="הָ|רֹאֶ֖ה" SILId="130020520061;130020520062" SILMorphology="Pa|ncmsa" SILTransliteration="hā|rōʾeh"/>
</word>
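For readers without an XQuery processor, the same check can be sketched in Python with `xml.etree.ElementTree`. ElementTree elements have no parent pointer, so instead of navigating with `..` the sketch iterates over `<word>` elements and inspects their `<morph>` children, returning each matching word once. The sample reuses the Arumah entry above plus one non-matching word, with attributes trimmed; `words_with_split_sil_morphs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<maculaSilMapping>
  <word maculaText="בָּארוּמָ֑ה" maculaId="07009041003" SILId="07009041003">
    <morph maculaText="בָּ" maculaId="070090410031" SILId="070090410031" SILMorphology="Pp"/>
    <morph maculaText="ארוּמָ֑ה" maculaId="070090410032" SILId="070090410032;070090410033" SILMorphology="Pa|np"/>
  </word>
  <word maculaText="אֵ֥ת" maculaId="01001001004" SILId="01001001004">
    <morph maculaText="אֵ֥ת" maculaId="010010010041" SILId="010010010041" SILMorphology="Po"/>
  </word>
</maculaSilMapping>"""


def words_with_split_sil_morphs(root: ET.Element) -> list[ET.Element]:
    """Return <word> elements containing a morph mapped to more than one SILId."""
    return [
        word
        for word in root.iter("word")
        # mirrors //morph[@SILId => contains(";")]
        if any(";" in m.get("SILId", "") for m in word.findall("morph"))
    ]


root = ET.fromstring(SAMPLE)
for word in words_with_split_sil_morphs(root):
    print(word.get("maculaId"), word.get("maculaText"))
```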

2. More than one maculaId for one SILId (word level, 1,003 cases)

Use

for $node in //word[@maculaId => contains(";")]
return $node

to find all words where one SILId maps to more than one maculaId:

<word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILText="עַל־כֵּן֙" SILId="01002024001" SILGlosses="therefore" SILTransliteration="ʿal-kēn">
  <morph maculaText="עַל|כֵּן֙" maculaId="010020240011;010020240021" SILText="עַל־כֵּן֙" SILId="010020240011" SILMorphology="Pd" SILTransliteration="ʿal-kēn"/>
</word>
<word maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="01004022006;01004022007" SILText="תּ֣וּבַל קַ֔יִן" SILId="01004022006" SILGlosses="Tubal-Cain" SILTransliteration="tûḇal qayin">
  <morph maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="010040220061;010040220071" SILText="תּ֣וּבַל קַ֔יִן" SILId="010040220061" SILMorphology="np" SILTransliteration="tûḇal qayin"/>
</word>
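When one SIL word covers several Macula words (as with עַל־כֵּן above), a lookup from the SIL word ID to the list of Macula word IDs is a convenient by-product. A minimal sketch, assuming the `;`-joined `maculaId` convention shown above; the `<morph>` children are omitted from the sample, and `sil_to_macula_words` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<maculaSilMapping>
  <word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILId="01002024001" SILGlosses="therefore"/>
  <word maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="01004022006;01004022007" SILId="01004022006" SILGlosses="Tubal-Cain"/>
  <word maculaText="אֵ֥ת" maculaId="01001001004" SILId="01001001004" SILGlosses="(et)"/>
</maculaSilMapping>"""


def sil_to_macula_words(root: ET.Element) -> dict[str, list[str]]:
    """Map each SIL word ID that spans several Macula words to those Macula IDs."""
    merged = {}
    for word in root.iter("word"):
        macula_ids = word.get("maculaId", "").split(";")
        if len(macula_ids) > 1:  # same condition as contains(@maculaId, ";")
            merged[word.get("SILId")] = macula_ids
    return merged


root = ET.fromstring(SAMPLE)
print(sil_to_macula_words(root))
```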

3. More than one maculaId for one SILId (morph level, 48,453 cases)

Use

for $node in //morph[@maculaId => contains(";")]
return $node/..

to find all words containing a morph where one SILId maps to more than one maculaId:

<word maculaText="לְמִינ֔וֹ" maculaId="01001011013" SILText="לְמִינ֔וֹ" SILId="01001011013" SILGlosses="to.its.kind" SILTransliteration="ləmînô">
  <morph maculaText="לְ" maculaId="010010110131" SILText="לְ" SILId="010010110131" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="מִינ֔|וֹ" maculaId="010010110132;010010110133" SILText="מִינ֔וֹ" SILId="010010110132" SILMorphology="ncmscX3ms" SILTransliteration="mînô"/>
</word>
<word maculaText="זַרְעוֹ" maculaId="01001011015" SILText="זַרְעוֹ־" SILId="01001011015" SILGlosses="its.seed" SILTransliteration="zarʿô-">
  <morph maculaText="זַרְע|וֹ" maculaId="010010110151;010010110152" SILText="זַרְעוֹ־" SILId="010010110151" SILMorphology="ncmscX3ms" SILTransliteration="zarʿô-"/>
</word>

These are mostly cases involving a suffix, which SIL does not split off as a separate morph (the suffix is marked with X in the SIL morphology code, e.g. ncmscX3ms).
Use

for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
return $node/..

to filter out these cases.
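The two steps above (select morphs with several Macula IDs, then drop those whose SIL morphology code contains `X`) can be sketched in Python as follows. Treating any `X` in the code as a suffix marker mirrors the XQuery filter and is a heuristic, not a full parse of the SIL tag set. The sample reuses entries from this thread with attributes trimmed; `multi_macula_morphs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

SAMPLE = """<maculaSilMapping>
  <word maculaId="01001011013">
    <morph maculaId="010010110131" SILMorphology="Pp"/>
    <morph maculaId="010010110132;010010110133" SILMorphology="ncmscX3ms"/>
  </word>
  <word maculaId="01002024001;01002024002">
    <morph maculaId="010020240011;010020240021" SILMorphology="Pd"/>
  </word>
</maculaSilMapping>"""


def multi_macula_morphs(
    root: ET.Element, skip_suffixes: bool = False
) -> Iterator[tuple[ET.Element, ET.Element]]:
    """Yield (word, morph) pairs where one SIL morph covers several Macula morphs."""
    for word in root.iter("word"):
        for morph in word.findall("morph"):
            if ";" not in morph.get("maculaId", ""):
                continue
            # heuristic taken from the XQuery filter: 'X' marks a suffix
            if skip_suffixes and "X" in morph.get("SILMorphology", ""):
                continue
            yield word, morph


root = ET.fromstring(SAMPLE)
print(len(list(multi_macula_morphs(root))))                      # all multi-ID morphs
print(len(list(multi_macula_morphs(root, skip_suffixes=True))))  # suffix cases removed
```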

The remaining 1,041 cases are mostly compounds, i.e., words that Macula splits into two words but SIL keeps as one (case 2 above). Use

for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
where not(contains($node/../@maculaId, ";"))
return $node/..

to filter out these cases as well.
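Instead of chaining `where` clauses, the morph-level cases can also be sorted into buckets in a single pass, which makes the count for each group easy to report. This sketch applies the same two tests as the XQuery above, in the same order (`X` in the SIL code marks a suffix; a `;` in the parent word's `maculaId` marks a case already counted at the word level). The bucket names are made up for illustration, and the three-word sample is trimmed from examples in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<maculaSilMapping>
  <word maculaId="01001011013">
    <morph maculaId="010010110132;010010110133" SILMorphology="ncmscX3ms"/>
  </word>
  <word maculaId="01002024001;01002024002">
    <morph maculaId="010020240011;010020240021" SILMorphology="Pd"/>
  </word>
  <word maculaId="01018010005">
    <morph maculaId="010180100051" SILMorphology="Pp"/>
    <morph maculaId="010180100051ה;010180100052" SILMorphology="ncbsa"/>
  </word>
</maculaSilMapping>"""


def classify(word: ET.Element, morph: ET.Element) -> str:
    """Assign a multi-maculaId morph to a (hypothetical) bucket."""
    if "X" in morph.get("SILMorphology", ""):
        return "suffix"
    if ";" in word.get("maculaId", ""):
        return "word-level merge"
    return "remaining"


def bucket_counts(root: ET.Element) -> Counter:
    """Count multi-maculaId morphs per bucket."""
    counts = Counter()
    for word in root.iter("word"):
        for morph in word.findall("morph"):
            if ";" in morph.get("maculaId", ""):
                counts[classify(word, morph)] += 1
    return counts


root = ET.fromstring(SAMPLE)
print(bucket_counts(root))
```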
The remaining 63 cases mostly involve implied (unwritten) articles:

<word maculaText="כָּעֵ֣ת" maculaId="01018010005" SILText="כָּעֵ֣ת" SILId="01018010005" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
  <morph maculaText="כָּ" maculaId="010180100051" SILText="ךָּ" SILId="010180100051" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="|עֵ֣ת" maculaId="010180100051ה;010180100052" SILText="עֵ֣ת" SILId="010180100052" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
<word maculaText="כָּעֵ֥ת" maculaId="01018014007" SILText="כָּעֵ֥ת" SILId="01018014007" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
  <morph maculaText="כָּ" maculaId="010180140071" SILText="ךָּ" SILId="010180140071" SILMorphology="Pp" SILTransliteration=""/>
  <morph maculaText="|עֵ֥ת" maculaId="010180140071ה;010180140072" SILText="עֵ֥ת" SILId="010180140072" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
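In both examples the implied article appears as an extra Macula ID formed from the preceding morph's ID plus a trailing `ה` (e.g. `010180100051ה`). Assuming that convention holds across the whole file, which these two examples suggest but do not prove, such IDs can be picked out with a simple suffix check. `implied_article_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<maculaSilMapping>
  <word maculaText="כָּעֵ֣ת" maculaId="01018010005" SILId="01018010005">
    <morph maculaText="כָּ" maculaId="010180100051" SILId="010180100051"/>
    <morph maculaText="|עֵ֣ת" maculaId="010180100051ה;010180100052" SILId="010180100052"/>
  </word>
</maculaSilMapping>"""


def implied_article_ids(root: ET.Element) -> list[tuple[str, str]]:
    """Return (implied-article Macula ID, SILId of the morph it was merged into)."""
    found = []
    for morph in root.iter("morph"):
        for macula_id in morph.get("maculaId", "").split(";"):
            # assumption: an ID ending in 'ה' marks an unwritten article
            if macula_id.endswith("ה"):
                found.append((macula_id, morph.get("SILId", "")))
    return found


root = ET.fromstring(SAMPLE)
print(implied_article_ids(root))
```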

@klosoter
Contributor Author

klosoter commented Jul 6, 2022

To be done:

  • split the SIL morph-level data (e.g., transliteration) at suffix boundaries so that it aligns with Macula's morphs
  • split the word-level contextual glosses (egs) into smaller glosses corresponding to individual morphemes
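As a possible starting point for the first item, the SIL morphology code of a suffixed form could be split at the `X` marker so that each part lines up with one of the Macula IDs. The sketch assumes exactly one stem part and one suffix part per code, and that the Macula IDs are listed stem first (as in the מִינ֔|וֹ example above). It does not attempt the transliteration split, and `split_suffix_morphology` is a hypothetical name.

```python
def split_suffix_morphology(sil_code: str, macula_ids: str) -> list[tuple[str, str]]:
    """Pair the stem and suffix parts of a SIL code with ';'-joined Macula IDs.

    Assumes a code like 'ncmscX3ms' (stem 'ncmsc', suffix 'X3ms') and exactly
    two Macula IDs in stem-then-suffix order.
    """
    ids = macula_ids.split(";")
    stem, sep, suffix = sil_code.partition("X")
    if not sep or len(ids) != 2:
        raise ValueError(f"cannot split {sil_code!r} over {macula_ids!r}")
    return [(ids[0], stem), (ids[1], "X" + suffix)]


print(split_suffix_morphology("ncmscX3ms", "010010110132;010010110133"))
# [('010010110132', 'ncmsc'), ('010010110133', 'X3ms')]
```

Raising on anything that does not fit the two-part pattern keeps unexpected codes visible instead of silently misaligning them.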

klosoter mentioned this issue Jul 19, 2022