
Importing a preexisting database

Irene Vagionakis edited this page May 16, 2021 · 2 revisions

Work in progress

This topic is not strictly related to EFES but is connected more broadly to the creation of EpiDoc files from preexisting document collections. The following instructions are not intended as an ideal universal workflow; they derive from specific sample cases.

How to create EpiDoc XML files from a preexisting Excel/CSV file

Assuming that you have an Excel file in which each row contains some data related to a specific document/inscription:

  1. Fill in the empty cells, e.g. with '-'

  2. Export the Excel file as XML Spreadsheet, saving it on the Desktop e.g. as 'all.xml' (this will generate one XML file with all the documents)

  3. Restore any missed apostrophes and accents in the xml file (e.g. replace all &apos; with ')

  4. Delete all the occurrences of 'ss:' from the xml file

  5. In the xml file replace all the occurrences of <Row .+?> with <Row> (using Regular Expressions; this can be done with Oxygen XML Editor 'Find/Replace' selecting the 'Regular expression' option)

  6. Delete everything before the <Row> of the first useful row and everything after the </Row> of the last useful row

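  7. Type the following command in the Terminal: cd Desktop && awk '{if ($0 ~ /<Row>/) a++} { print >> ("doc"a".xml") } {close("doc"a".xml")}' all.xml (this will generate one XML file for each document, named doc1.xml, doc2.xml, and so on; because the command appends to existing files, delete any doc*.xml files left by an earlier run before repeating it)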

  8. Create an XSLT file to transform the generated raw XML files into XML files based on the EpiDoc template; you can name it e.g. 'xml-to-epidoc.xsl' and save it on the Desktop (see an example here)

  9. In Oxygen XML Editor create a new Project (from the Project menu or tab) and add all the generated XML files to the Project

  10. Select all the XML files from the Project side tab, right click on them and select 'Transform' > 'Configure transformation scenario' > 'New' > 'XML transformation with XSLT' (with these values: XML URL ${currentFileURL}, XSL URL ${cfd}/xml-to-epidoc.xsl, Save as ${cfd}/epidoc/${cfne}; these values should be changed if your XML and XSLT files are located elsewhere); then select 'Apply associated' (this will generate an EpiDoc XML file for each document)

  11. Add the link to the EpiDoc schema to all files with Oxygen XML Editor 'Find/Replace in Files', selecting as Scope the 'epidoc' folder with the new EpiDoc files, selecting the 'Regular expression' option and replacing <TEI with <?xml-model href="http://epidoc.stoa.org/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>\n<TEI

  12. Move all the EpiDoc files inside the EFES 'epidoc' folder
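
Steps 4, 5 and 7 can also be run together from the Terminal rather than partly in an editor. The following is only a minimal sketch under stated assumptions, not part of the original workflow: the folder name `split-demo`, the intermediate file `clean.xml` and the two sample rows are hypothetical, and each `<Row ...>` tag is assumed to sit on its own line, as the awk command in step 7 already requires. Since `sed` has no lazy `.+?`, the pattern `[^>]*` is used in its place.

```shell
#!/bin/sh
# Sketch of steps 4, 5 and 7 on a two-row sample (hypothetical data).
set -eu

work=split-demo            # hypothetical scratch folder, used instead of the Desktop
mkdir -p "$work"

# Sample input: what 'all.xml' might look like after step 6
# (assumed layout: one tag per line).
cat > "$work/all.xml" <<'EOF'
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Inscription A</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0" ss:Height="30">
<Cell><Data ss:Type="String">Inscription B</Data></Cell>
</Row>
EOF

# Step 4: delete every 'ss:' (like the manual step, this also removes
# any 'ss:' that happens to occur inside cell text).
# Step 5: strip the attributes from <Row ...>.
sed -e 's/ss://g' -e 's/<Row [^>]*>/<Row>/g' "$work/all.xml" > "$work/clean.xml"

# The awk command appends (>>), so clear output from any earlier run first.
rm -f "$work"/doc*.xml

# Step 7: start a new output file at every <Row> line.
awk -v dir="$work" '
  /<Row>/ { a++ }
  { f = dir "/doc" a ".xml"; print >> f; close(f) }
' "$work/clean.xml"

ls "$work"
```

Run on the sample, this leaves `doc1.xml` and `doc2.xml` in `split-demo`, each holding one `<Row>` element. To use it on real data, replace the sample block with your own `all.xml` as prepared in steps 1-3 and 6.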

How to create EpiDoc XML files from a preexisting FileMaker database

  1. Export the database as a single XML file, saving it on the Desktop e.g. as 'all.xml'
  2. Restore any missed apostrophes and accents in the xml file (e.g. replace all &apos; with ')
  3. In the xml file replace all the occurrences of <ROW .+?> with <ROW> (using Regular Expressions; this can be done with Oxygen XML Editor 'Find/Replace' selecting the 'Regular expression' option)
  4. Delete everything before the <ROW> of the first useful row and everything after the </ROW> of the last useful row
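  5. Type the following command in the Terminal: cd Desktop && awk '{if ($0 ~ /<ROW>/) a++} { print >> ("doc"a".xml") } {close("doc"a".xml")}' all.xml (this will generate one XML file for each document, named doc1.xml, doc2.xml, and so on; because the command appends to existing files, delete any doc*.xml files left by an earlier run before repeating it)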
  6. Follow steps 8-12 of the previous section
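
Step 11 of the previous section, which step 6 refers back to, can also be done from the Terminal if Oxygen's 'Find/Replace in Files' is not at hand. What follows is a hedged sketch rather than the page's own method: the folder name `epidoc-demo` and the minimal sample file are hypothetical, while the schema URL is the one given in step 11. It inserts the `xml-model` instruction before the first `<TEI` in each file and skips files that already contain it.

```shell
#!/bin/sh
# Sketch of step 11: add the EpiDoc schema link to every file in a folder.
set -eu

dir=epidoc-demo            # hypothetical folder holding the generated EpiDoc files
mkdir -p "$dir"

# Hypothetical minimal EpiDoc file, standing in for the output of step 10.
cat > "$dir/doc1.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
</TEI>
EOF

# Processing instruction taken from step 11.
pi='<?xml-model href="http://epidoc.stoa.org/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>'

for f in "$dir"/*.xml; do
  # Skip files that already carry the instruction, so re-running is harmless.
  if grep -qF '<?xml-model' "$f"; then
    continue
  fi
  # Replace the first '<TEI' with the instruction, a newline, and '<TEI'.
  awk -v pi="$pi" '
    !done && sub(/<TEI/, pi "\n<TEI") { done = 1 }
    { print }
  ' "$f" > "$f.tmp"
  mv "$f.tmp" "$f"
done

cat "$dir/doc1.xml"
```

Writing to a temporary file and then renaming it avoids `sed -i`, whose syntax differs between macOS and Linux.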