Displaying modern and ancient punctuation in text and apparatus, respectively #132

Open
samosafuz opened this issue Mar 25, 2022 · 11 comments

@samosafuz
Member

PR #97 added a handful of new <g> @type characters to the list of items displayed in the PN apparatus criticus. It appears there are still a few kinks to work out.

The first is that they are not currently displayed consistently in the apparatus. In 60527, for example, there are three instances of <g type="high-punctus"/> (which uses the Unicode character ˙) but only two of them appear in the apparatus. In 60757, the issue is even more widespread: there are 65 occurrences of <g type="high-punctus"/> but only 74 occurrences of ˙. If all 65 items were displayed in the text and repeated in the apparatus, we would expect 130 hits. The @types middot and hypodiastole display fine in the text after being added to the EpiDoc stylesheets, but do not seem to appear in the apparatus at all.
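The count comparison above can be reproduced with a short script. This is only a sketch: it assumes the file uses the standard TEI namespace, and the sample fragment and the helper names (`count_g_types`, `display_shortfall`) are made up for illustration, not taken from the PN codebase.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI namespace (assumed for the input)

def count_g_types(xml_text):
    """Count <g> elements in an EpiDoc fragment, grouped by their @type."""
    root = ET.fromstring(xml_text)
    return Counter(g.get("type") for g in root.iter(f"{{{TEI_NS}}}g"))

def display_shortfall(encoded, rendered_hits):
    """Hits missing if every <g> showed once in the text and once in the apparatus."""
    return 2 * encoded - rendered_hits

# Hypothetical fragment, not taken from 60527 or 60757.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    'λόγος<g type="high-punctus"/> καὶ<g type="middot"/> '
    'ἔργον<g type="high-punctus"/></ab>'
)

print(count_g_types(SAMPLE))
print(display_shortfall(65, 74))  # the 60757 figures quoted above
```

With the 60757 figures (65 encoded, 74 rendered), the shortfall against the expected 130 is 56.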

Once that issue is solved, a second, more serious, issue looms. There is a tendency for modern and ancient punctuation to appear together and clutter the display. Ideally, the @types that display in the apparatus should be suppressed in the PN text, and (conversely) any modern/editorial punctuation that appears in the PN text should be suppressed in the apparatus. Punctuation or glyphs, in other words, should appear in the apparatus OR the text but not in both. It frequently happens, for example, that an editorial middot or apostrophe is followed by a scribal one, or that a high-punctus is preceded by modern punctuation of one sort or another. When that happens, both are currently displayed in both places, which suggests to the user that something is wrong.

If it is at all helpful, the typical practice for encoding such combinations has been to put modern/editorial punctuation first and ancient/scribal punctuation second ("ours, then theirs", in other words).

@hcayless
Member

So this is actually a new feature request. We've never had ancient punctuation trigger an apparatus entry, nor is the use of <g type=...> particularly distinguished from, e.g., a modern apostrophe. The convention is simply to replace the <g> with a substitute character, or with "(type)" if there's no substitution available. The ones that are appearing in the apparatus are there because they happen to be adjacent to something else that got picked up as an app entry. This is distinct from the treatment of ancient diacritics, and I will stipulate that this is inconsistent.

So the question is: what should actually happen here, and what additional infrastructure do we need to support it? I will say that if ’<g type="apostrophe"/> doesn't result in some distinction between the two, there's no point in doing it.
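For readers unfamiliar with the convention described in this comment (substitute character if one exists, otherwise "(type)"), here is a minimal sketch. It is not the EpiDoc XSLT itself. The table is hypothetical: only the high-punctus character comes from this thread, and the middot entry is an assumption.

```python
# Hypothetical substitution table, not the stylesheets' actual list.
SUBSTITUTES = {
    "high-punctus": "˙",  # character cited earlier in this thread
    "middot": "·",        # assumed for illustration
}

def render_g(g_type):
    """Return the substitute character for a <g> @type, or "(type)" if none exists."""
    return SUBSTITUTES.get(g_type, f"({g_type})")

print(render_g("high-punctus"))
print(render_g("rho-cross"))
```

Nothing in this lookup distinguishes a scribal mark from a modern one, which is why a modern apostrophe followed by <g type="apostrophe"/> ends up showing two identical-looking characters.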

@samosafuz
Member Author

It is important that PN includes modern / editorial punctuation and also indicates any ancient / scribal marks clearly. Papyrologists want to know exactly what is on a papyrus, but all users also expect to be provided with clean, articulated Greek that adheres to modern editorial conventions.

The requests outlined in this issue are not entirely new. We have for some time displayed <g type="apostrophe"/> in the PN apparatus, to specify that the scribe wrote it and that it is not editorial. A good example is P.Hamb. 3.228.12:

[Screenshot, 2022-03-27: P.Hamb. 3.228 as displayed in PN]

This practice is similar to, but distinct from, how we use <hi> to indicate scribal accents, breathings, diaireseis, etc.: those also appear in the PN apparatus (see, for example ­ϊσατιν papyrus in the apparatus for recto, line 3, in the Hamburg example above: that is <hi rend="diairesis">ἰ</hi> in xml). With <hi>, however, the specified @rend appears only in the apparatus and not in the text itself.

Pull #97 added support for a handful of scribal punctuation marks that PN encodes as gtypes (e.g., high-punctus, low-punctus, middot, hypodiastole, diastole), with the goal of displaying them much like <g type="apostrophe"/> . My first issue is that this updated .xsl is evidently not in production yet: do I need also to propose the same changes to EpiDoc's htm-tpl-apparatus.xsl (https://github.com/EpiDoc/Stylesheets/blob/master/htm-tpl-apparatus.xsl)?

What we are additionally requesting is a) for these points of scribal punctuation to display in the apparatus but not in the text (much as ϊσατιν papyrus appears in the apparatus for recto, line 3, while the diairesis does not appear in the text). I had hoped that the changes I made in pull #97 would achieve this with the indicated gtypes instead of <hi>. (From a TEI perspective, we should persist in using <g> for scribal punctuation; I do not believe <hi> is the appropriate element.)

At the same time, b), it would be helpful if editorial punctuation did not appear in the apparatus: if you look at the apparatus entry for line 14 from the Hamburg example above, the implication is that the period (and not just the diairesis on ϋμασ. papyrus) is also on the papyrus. That suppression of modern punctuation may well be a new feature request, but I would point out how we similarly suppress the modern/editorial accents and breathings on items whose <hi> triggers an apparatus entry.

@jcowey, have I described the issue accurately? Perhaps updating the EpiDoc stylesheets with the changes proposed in #97 will get us a good way down the road.

@hcayless
Member

hcayless commented May 9, 2022

There are so many things wrong here that my brain kind of bounces off it. I'll try to enumerate them:

  1. Certain <g>s should be displayed in the apparatus and not the text.
  2. The rendering with '(*)' links to the apparatus is chaos when so many features occur.
  3. I really wonder whether <g> is the appropriate element for this, as opposed to https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-pc.html
  4. The apparatus rendering is busted in a variety of ways. See https://papyri.info/dclp/60527 (l. 24) ιδ’ for example. l. 9 (Ὕρίη[ν]) tells me nested <hi> doesn't work.

@jcowey
Member

jcowey commented May 10, 2022

Took some time to create the following HTML presentation of two texts. I hope that it helps to make one or two things clearer.

@jcowey
Member

jcowey commented May 10, 2022

A few initial thoughts on the list above:

  1. "Certain <g>s should be displayed in the apparatus and not the text." Most certainly: stauros, rho-cross, chirho for starters. When considering these changes, we were aware that the <g>s would have to be split into two categories: those for the apparatus and those to appear in the text itself.
  2. "The rendering with '(*)' links to the apparatus is chaos when so many features occur." Probably remains a very tricky problem.
  3. "I really wonder whether <g> is the appropriate element for this, as opposed to https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-pc.html" <pc> might indeed be good for a few cases (but only a very few, I reckon). @type="middot" serves for at least two different things: one a Greek colon, the other a Latin interpunct. The second is not really a <pc> but rather a word divider. By introducing this we would probably be creating more problems than benefits.
  4. "The apparatus rendering is busted in a variety of ways. see https://papyri.info/dclp/60527 (l. 24) ιδ’ for example. l. 9 (Ὕρίη[ν]) tells me nested <hi> doesn't work." Indeed, multiple <hi>s on one letter do not really work properly and never have in papyri.info, I think.

@hcayless
Member

We could probably have a long and interesting debate over beers about whether the Latin interpunct counts as punctuation. My vote is 'yes' :-).

@samosafuz
Member Author

I'm open to the conversation (and the beers) about adopting <pc>; supposing that we can craft the XSLT and XSugar for it, <pc> does seem a convenient way of separating out one type of <g> from the other (since it is the punctuation, in particular, that we want to note in the apparatus). <pc>, presumably, would be like <hi> and would always trigger an apparatus entry on this logic. And if I'm understanding the TEI entry correctly, we'd be tagging the modern punctuation, which would eliminate the problem of the duplication of, for example, apostrophes.

@hcayless
Member

Not a hill I'd be particularly willing to die upon, but my argument would be that the Romans sometimes used punctuation to divide words, where we use whitespace. We use whitespace to indicate section divisions too, where the Greeks used paragraphoi. Merely different conventions.

I'd argue for tagging ancient punctuation with <pc>, since modern is ubiquitous. But yes, it might help us to eliminate the duplication problem, though there may be other, better strategies. I wonder about: <g type="apostrophe">‘</g> with a convention that if <g> contains text, you just print that text. We already have Leiden+ for that: *apostrophe,‘* should work. We could invent Leiden+ for <pc> analogous to what we do with <g>. Maybe **middot* instead of *middot*?
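The proposed convention ("if <g> contains text, you just print that text") could look roughly like this. It is a sketch under stated assumptions, not the stylesheet implementation: the function name and the fallback table are hypothetical, and namespaces are omitted to keep it short.

```python
import xml.etree.ElementTree as ET

def render_g_element(elem, substitutes):
    """Print a <g>'s own text if it has any; otherwise fall back to the
    substitute-or-"(type)" convention. `substitutes` is a hypothetical table."""
    if elem.text and elem.text.strip():
        return elem.text
    g_type = elem.get("type", "")
    return substitutes.get(g_type, f"({g_type})")

with_text = ET.fromstring('<g type="apostrophe">‘</g>')
empty = ET.fromstring('<g type="apostrophe"/>')

print(render_g_element(with_text, {}))  # the encoder's own character wins
print(render_g_element(empty, {}))      # no text, no substitute: "(apostrophe)"
```

Under this convention the encoder chooses the displayed character, so a scribal apostrophe can be made to look different from the modern one that precedes it.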

Just to fully pry open the can of worms, I guess paragraphos is punctuation too... (runs away)

And then to dance around, throwing the contents everywhere: I still want to complete the move from g/@type to g/@ref, which I made a start on in the winter, but then got distracted from. And I'd also like to clean up our approach to apparatus, which is too complex (that's orthogonal to this issue, but may be connected if I'm going to spend time in its guts anyway).

@hcayless hcayless self-assigned this May 17, 2022
@hcayless
Member

Ok, I've done some work on app construction. Compare https://papyri.info/dclp/60527 to https://papyri-dev.lib.duke.edu/dclp/60527. The latter uses the new XSLT, changes <g type="apostrophe"/> to <g type="apostrophe">‘</g>, and also changes the nesting order of some of the <hi>s. We can now handle nested <hi>s (see https://sourceforge.net/p/epidoc/bugs/187/), but the order matters; otherwise combining diacritics can be stacked incorrectly. The inner <hi>, closest to the character, is evaluated first, the outer <hi> second.

Note that the new app. crit. picks up some stuff the old one didn't. It relies on tokenizing words before processing, rather than trying to figure out what goes alongside an ancient diacritic or punctuation character. If you want, e.g., a high punctus to appear with a word, just put it next to the preceding word rather than separating it with whitespace. Another difference is that app. entries do not automatically close, e.g., runs of supplied text. Compare the old PMichInv.xi/xii.2. γλῶσσα[ι,] papyrus to the new PMichInv.xi/xii.2. γλῶσσα[ι papyrus. I'd argue the new way is better and the old one misleading, but you might feel differently.
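To illustrate why the nesting order of <hi> matters, here is a small sketch of inner-first evaluation with combining diacritics. It is not the actual XSLT. The @rend-to-mark table is an assumption, namespaces are omitted, and each <hi> is assumed to wrap a single letter.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumed mapping from @rend values to combining marks (illustrative only).
COMBINING = {"diairesis": "\u0308", "acute": "\u0301"}

def render_hi(elem):
    """Render inner <hi>s first, then append this element's combining mark."""
    text = elem.text or ""
    for child in elem:
        text += render_hi(child) + (child.tail or "")
    if elem.tag == "hi":
        text += COMBINING.get(elem.get("rend"), "")
    return text

# Diairesis innermost (closest to the iota), acute outermost.
inner_diairesis = ET.fromstring('<hi rend="acute"><hi rend="diairesis">\u03b9</hi></hi>')
# Same marks, opposite nesting.
inner_acute = ET.fromstring('<hi rend="diairesis"><hi rend="acute">\u03b9</hi></hi>')

a = unicodedata.normalize("NFC", render_hi(inner_diairesis))
b = unicodedata.normalize("NFC", render_hi(inner_acute))
print(len(a), len(b))  # 1 2
```

With the diairesis innermost, the sequence composes to the single precomposed character U+0390. With the nesting reversed, NFC leaves a stray combining diairesis after U+03AF, which is the kind of mis-stacking described above.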

If you all could take a look at this, and also poke around https://papyri-dev.lib.duke.edu to look for errors, I'd be grateful. Look for empty app. entries or anything else that seems goofy.

@samosafuz
Member Author

This all appears to be working nicely, though I shudder at the idea of a fixed nesting order for combinations of <hi>s. A reference document would be helpful on this front! I personally prefer leaving runs of supplied text open, too.

The things I noticed are either of the complicated variety or are more properly desiderata. I'll post the latter in separate tickets.

Complicated things: where there are |subst| tags, any apparatus-triggering <hi> or <g> is absent from the apparatus. There is something missing in the corr. ex readings in xi/xii.5, 6, 16, 44. I wonder whether this will be an issue for |reg|, |ed|, or |corr| tags, too.

I didn't see any empty tags, but will keep poking around in search of bugs.

@jcowey
Member

jcowey commented Jan 10, 2023

δ*apostrophe,’*: what you enter after "apostrophe" is a comma followed by the modern apostrophe.
