(stoa0022a)#387

Merged
AlisonBabeu merged 7 commits into master from
stoa0040a_to_stoa0022a
Apr 25, 2022

Conversation

@AlisonBabeu
Contributor

Converted stoa0040a (Pseudo-Augustine) to stoa0022a (Ambrosiaster), re: long-ago issue #305.
While I had created catalog_data for this edition at the time, I had no way to change the data.
Renumbered the file and also combined the two; there is only one work.

@lcerrato
Contributor

lcerrato commented Apr 12, 2022

@AlisonBabeu
It looks like you added books to stoa0022a.stoa001.opp-lat1.xml, is that correct? There are some oddly named divs in this file.
This isn't going to work as you've added another level (book) without editing the refsDecl or making the old file = book 1.
It's not just the naming of the sections that is causing the problems.

The comments are also unclear. I'm not sure what files were added or when.
This should probably be a detailed note in the header or in the change log rather than comments in the body.

<!-- End of Work 1-previous TEI-XML files -->
  <!-- Beginnin of SEcond XML file -->

@AlisonBabeu
Contributor Author

Hi @lcerrato, those comments were just for myself: I was combining two XML files and had planned to do further updating.
I hadn't asked for your review yet, as I always do, because I wanted to see if it worked and if I could figure out fixing it myself before bothering you! :)

This is all one work, but there were two XML files: one had the main body and the other had what, for lack of a better term, I guess are appendices. It's very complicated. When it first failed I tried changing the names of the sections; the two new sections were originally named "neu" and "old" or something like that. I spent the better part of an hour trying to think about how to name the next two divs or how best to combine this file.

@lcerrato
Contributor

@AlisonBabeu
Let me know if you want to revisit the editing.
For instance, you can name the divs appendix1 or appendix_A or app_1 etc.

Something that reflects that these are appendices is better than making them books 2 and 3, if they are not books 2 and 3. And would be much better than neu or old. (They could be appendix_new etc.)

You cannot have a . in an n attribute. That was part of the initial issue.

@AlisonBabeu
Contributor Author

Hi @lcerrato, I changed the divs back to appendix1 and appendix2 and added a change log, but at this point I'm not sure how to fix it. In the Travis message for the broken build I see: Forbidden characters found: 'appendix2.29 '. Forgive my ignorance, but I can't find what is meant by this anywhere in the file. I know this exact string is not in the TEI-XML file, so what am I missing?

@lcerrato
Contributor

@AlisonBabeu
It won't pass until the refsDecl reflects the correct file structure.

So, first, you need to enclose the original file in a book-level div. You've added two new books, but the initial work now needs to sit at the same level. So you need an n="main" or n="0" or something similar to designate the text you had there before.

Then the refsDecl needs to add this level.

@lcerrato
Contributor

@AlisonBabeu
You also have checked in a /.directory file that should be deleted.

@lcerrato
Contributor

lcerrato commented Apr 13, 2022

@AlisonBabeu
Forbidden characters found: [215]() stoa0022a.stoa001.opp-lat1.xml 'appendix2.29 '

This means there is something in the markup that is not allowed. In this case, there is a space after 29, as in <div n="29 "

This may be because the sections have a b and the b was deleted, leaving a space?
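
A quick way to hunt for this kind of problem before pushing is a small script that lists suspicious n values. This is only a sketch: it assumes the citable divs are marked type="textpart" and that whitespace and "." are the characters to flag (the real validator may forbid more), and `find_bad_n_values` is a made-up helper name, not part of any tool used here.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Assumption: only whitespace and "." are flagged; the real check may be stricter.
FORBIDDEN = re.compile(r"[\s.]")


def find_bad_n_values(xml_text):
    """Return the @n values of textpart divs that contain a flagged character."""
    root = ET.fromstring(xml_text)
    bad = []
    for div in root.iter(TEI_NS + "div"):
        # Skip the edition-level div, whose @n is a URN and legitimately has dots.
        if div.get("type") != "textpart":
            continue
        n = div.get("n")
        if n is not None and FORBIDDEN.search(n):
            bad.append(n)
    return bad


# Hypothetical miniature file reproducing the trailing-space problem.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition">
  <div type="textpart" subtype="book" n="appendix2">
    <div type="textpart" subtype="chapter" n="29 "/>
    <div type="textpart" subtype="chapter" n="30"/>
    <div type="textpart" subtype="chapter" n="1.5"/>
  </div>
</div>
</body></text></TEI>"""

for value in find_bad_n_values(SAMPLE):
    # repr() makes a trailing space visible in the output.
    print(repr(value))
```

Printing with repr() is the point: a trailing space is invisible in most editors but shows up inside the quotes.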

@lcerrato
Contributor

This is an example of a three level refsDecl. Some of these older texts use (.+) instead of (\w+). For our purposes, they are interchangeable.

It always has to be written from smallest chunk to largest.

 <refsDecl n="CTS">
            <cRefPattern n="part" matchPattern="(\w+).(\w+).(\w+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2']/tei:div[@n='$3'])">
                <p>This pointer pattern extracts poem, line and part.</p></cRefPattern>
            <cRefPattern n="line" matchPattern="(\w+).(\w+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2'])">
                <p>This pointer pattern extracts poem and line.</p></cRefPattern>
            <cRefPattern n="poem" matchPattern="(\w+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])">
                <p>This pointer pattern extracts poem.</p></cRefPattern>
        </refsDecl>

A simple version for your file would be

<refsDecl n="CTS">
        <cRefPattern n="section" matchPattern="(.+).(.+).(.+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div[@type='edition']/tei:div[@n='$1']/tei:div[@n='$2']/tei:div[@n='$3'])"/>
        <cRefPattern n="chapter" matchPattern="(.+).(.+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div[@type='edition']/tei:div[@n='$1']/tei:div[@n='$2'])"/>
        <cRefPattern n="book" matchPattern="(.+)" replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div[@type='edition']/tei:div[@n='$1'])"/>
      </refsDecl>
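
To see why the order and the number of capture groups matter, here is a rough illustration of how a reference could be turned into an XPath from patterns like these. It is a sketch of the general cRefPattern idea only, not how the validation tooling actually resolves references. `resolve` and `CREF_PATTERNS` are hypothetical names, and the dots are escaped here (an assumption made for the sketch) so the regex splits only on literal "." separators.

```python
import re

BASE = "/tei:TEI/tei:text/tei:body/tei:div[@type='edition']"

# Deepest level first, mirroring the order of the cRefPattern elements.
# Assumption: separators are escaped dots, unlike the unescaped "." in the thread.
CREF_PATTERNS = [
    (r"(\w+)\.(\w+)\.(\w+)",
     BASE + "/tei:div[@n='$1']/tei:div[@n='$2']/tei:div[@n='$3']"),
    (r"(\w+)\.(\w+)",
     BASE + "/tei:div[@n='$1']/tei:div[@n='$2']"),
    (r"(\w+)",
     BASE + "/tei:div[@n='$1']"),
]


def resolve(reference):
    """Return the XPath for a reference such as 'appendix1.1a.2', or None."""
    for match_pattern, replacement in CREF_PATTERNS:
        match = re.fullmatch(match_pattern, reference)
        if match is None:
            continue
        xpath = replacement
        # Substitute $1, $2, $3 with the captured parts of the reference.
        for index, value in enumerate(match.groups(), start=1):
            xpath = xpath.replace(f"${index}", value)
        return xpath
    return None


print(resolve("appendix1.1a.2"))
print(resolve("main"))
```

A reference with three parts only resolves if a three-group pattern exists, which is why adding a book level to the file means adding a matching level to the refsDecl.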

Fixed forbidden character.
Fire in the hole.
@AlisonBabeu
Contributor Author

So @lcerrato I updated the refsDecl and enclosed the first text in a book level div. This has graduated me to a new set of errors:

Unique nodes found by XPath
Word Counting
Passage level parsing
Empty References

At this point I must admit I'm stumped.

@lcerrato
Contributor

lcerrato commented Apr 20, 2022

@AlisonBabeu
Unfortunately, this is an unhelpful set of error messages.

Because the count is only picking up book and chapter (note the Nodes are 3;246 when you want three numbers there), the problem is in the section level. Although I no longer use this refsDecl format, I think that is ok (that's always a place to look, and it's sometimes impossible to spot issues there). It might not hurt to copy/paste a refsDecl from a working three-level text in case mine was wrong.

There may be chapters that do not have sections, but this would mean that the appendices should have been passing previously. (I would have to look back at that structure to see how that was handled).

This can be spotted by using the outline view in Oxygen.

At first glance, it looks like there are a lot of things in the appendices that aren't chapters at all but rather chapter headers? Again, I would need to see what the previous version looked like to know if this is legit markup or just an oversight.

@lcerrato
Contributor

@AlisonBabeu

For instance,

<div n="appendix1" subtype="book" type="textpart">
 <head>
  <title type="main">QVAESTIONES [SANCTI AUGUSTINI] DE UETERI ET NOUO TESTAMENTO. </title>
 </head> 
<div n="1a" subtype="chapter" type="textpart">
<ab> <title>I. </title> </ab>
          <p>I huius recensionis = I recensionis quaestionum numero CXXVII. 
</p>
</div>

here https://archive.org/details/corpusscriptoru16wiengoog/page/419/mode/2up?view=theater

That is not a chapter. There is a "book/appendix" title, then that's just an explanation of the title, so something like a subtitle. But because that has no section, it's not going to pass.

On the top of this page https://archive.org/details/corpusscriptoru16wiengoog/page/421/mode/2up?view=theater

there is
III—XII = II-XI
XIII = XXXV
XIIII-XXXVI = XII-XXXIIII

which has also been encoded as chapters. That's probably more of a notation, just telling the users that there are no differences here from the main text in these sections. So it's debatable whether those should be made "chapters" for markup purposes. It's a can of worms, really.

I would guess that's what's breaking this.
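
One way to test this guess without clicking through the outline view is to list every chapter that has no section underneath it. The sketch below assumes the three levels are marked subtype="book", subtype="chapter" and subtype="section" (the section subtype is my assumption; only book and chapter appear in the excerpt above), and `chapters_without_sections` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
DIV = TEI_NS + "div"


def chapters_without_sections(xml_text):
    """Return 'book.chapter' labels for chapters that contain no section div."""
    root = ET.fromstring(xml_text)
    missing = []
    for book in root.iter(DIV):
        if book.get("subtype") != "book":
            continue
        # Only direct children count as chapters of this book.
        for chapter in book.findall(DIV):
            if chapter.get("subtype") != "chapter":
                continue
            has_section = any(
                child.get("subtype") == "section"  # assumed subtype value
                for child in chapter.findall(DIV)
            )
            if not has_section:
                missing.append(f"{book.get('n')}.{chapter.get('n')}")
    return missing


# Hypothetical miniature file: chapter 1a is only a note, chapter 2a has a section.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition">
  <div n="appendix1" subtype="book" type="textpart">
    <div n="1a" subtype="chapter" type="textpart">
      <ab><title>I. </title></ab>
      <p>I huius recensionis = I recensionis quaestionum numero CXXVII.</p>
    </div>
    <div n="2a" subtype="chapter" type="textpart">
      <div n="1" subtype="section" type="textpart"><p>...</p></div>
    </div>
  </div>
</div>
</body></text></TEI>"""

print(chapters_without_sections(SAMPLE))
```

If the list comes back empty on the real file, the structure is probably fine and the refsDecl is the next place to look.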

@AlisonBabeu
Contributor Author

@lcerrato I give up at this point, in all honesty. I think I'm going to split the files back up and just make them two works rather than try to fight this out. I didn't change any of the encoding that you are referencing; I just added letters to the divs to avoid the duplicate node problem. Thanks for all the time you've spent.

@lcerrato
Contributor

@AlisonBabeu
Give me a chance to take a closer look — I can't test anything until I pull it all offline.

I didn't think you changed that encoding (it was there). I was just looking at the original files to see if they passed like this.

@lcerrato
Contributor

@AlisonBabeu
I think it's just the refsDecl. The old one from the deleted file is working.

@lcerrato lcerrato self-requested a review April 20, 2022 18:09
Contributor Author

@AlisonBabeu AlisonBabeu left a comment

Thank you thank you for finding and fixing this @lcerrato

@AlisonBabeu AlisonBabeu merged commit 6ff80a5 into master Apr 25, 2022