Skip to content

Latest commit

 

History

History
105 lines (92 loc) · 3.74 KB

README.md

File metadata and controls

105 lines (92 loc) · 3.74 KB

TEI Entity Lookup

The script ./tools/enhance-tei.php is a command-line tool that looks for external entity references in the body of a TEI text, attempts to fetch information about those entities from supported web sources (currently: Geonames and the EHRI Portal), and outputs a new TEI with that information embedded in the TEI header.

For example, the minimal TEI document:

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="testing">
    <teiHeader>
        <fileDesc>
            <!-- SNIP -->
            <sourceDesc>
                <bibl>King's College London</bibl>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <p>An example placename: <placeName ref="http://www.geonames.org/2643743/">London</placeName>.</p>
        </body>
    </text>
</TEI>

would become:

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="testing">
    <teiHeader>
        <fileDesc>
            <!-- SNIP -->
            <sourceDesc>
                <bibl>King's College London</bibl>
                <listPlace>
                    <place>
                      <placeName>London</placeName>
                      <location>
                        <geo>51.50853 -0.12574</geo>
                      </location>
                      <linkGrp>
                        <link type="normal" target="http://www.geonames.org/2643743/"/>
                        <link type="desc" target="http://en.wikipedia.org/wiki/London"/>
                      </linkGrp>
                    </place>
                </listPlace>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <p>An example placename: <placeName ref="http://www.geonames.org/2643743/">London</placeName>.</p>
        </body>
    </text>
</TEI>

Supported entities are <placeName, <persName>, orgName, and <term>. If an entity reference does not have a ref= attribute an entity will be added to the header without external information using just the enclosed text.

Preferred language can be selected with the --lang option followed by a three-character langauge code. Where possible, the script will preferably fetch information in the chosen language.

For entities for which an external source either does not exist or is not supported, local "dictionary" file(s) can be supplied with the --dict <file.xml> option. A dictionary file consists of a TEI document containing entities in the <sourceDesc> that can be referred to by other files, using an anchor reference to their xml:id attribute.

For example, a dictionary file might resemble:

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="EHRI-BF-local_dictionary">
    <teiHeader>
        <fileDesc>
            <sourceDesc>
                <listPlace>
                    <place xml:id="test-place">
                        <placeName>Test Place</placeName>
                        <location>
                            <geo>51.848637 -0.55462</geo>
                        </location>
                        <note><p>Testing.</p></note>
                        <linkGrp>
                            <link type="desc" target="https://en.wikipedia.org/wiki/Whipsnade_Zoo"/>
                        </linkGrp>
                    </place>
                </listPlace>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
        </body>
    </text>
</TEI>

and a reference to it could be made in another file using the <placeName ref="#test-place">The Place</placeName>. This will result in the data from the dictionary file being copied to the subject TEI.

Usage

 ./tools/enhance-tei.xml [-d|--dict <dict.xml>] [-l|--lang XXX] <source-tei,xml>