Font issues, particularly Windows "core" extensions #213

Open
PhilterPaper opened this issue Mar 25, 2024 · 126 comments
Labels
general discussion (roadmaps, etc., discuss direction); help wanted (we could use some help from you guys)

Comments

@PhilterPaper
Owner

Hi folks,

I was looking through some notes, sorting out upcoming work on PDF::Builder, and found that I need to gather some information about what's available on non-Windows systems. I'm hoping that @sciurius and @terefang in particular can help out here. Feel free to tag others who might be able to contribute.

My Font Manager subsystem (part of Builder, not Windows) includes an entry for someone to define their generic "script" font face (cursive, handwritten look). I do not default any font, as I don't know if Linux and Mac systems can be counted on to have any script faces pre-installed. Windows, on the other hand, appears to ship with FrenchScriptMT (FRSCRIPT.TTF) and ScriptMTBold (SCRIPTBL.TTF), and either would make a nice "as shipped" default. Can anyone tell me what the story is on non-Windows systems? I suppose that I could default them to either Times-Italic or perhaps Helvetica-Oblique. I have an example for the Font Manager that shows script, using FrenchScriptMT, but I suspect it will fail to install that font on non-Windows systems, and not display that example correctly.

PDF::Builder (and its predecessor, PDF::API2) defined 15 or so extensions to "core" fonts, for Windows. These are Verdana (a sans-serif face), Trebuchet (a very similar sans-serif face), and Georgia (a serif face), plus two Dingbats-like fonts (Wingdings and Webdings). A Roman sans-serif face, Bank Gothic, has metrics provided, but I don't pre-load it into the Font Manager as I'm not sure if my Windows system is even finding it when a Reader requests it (I think it's substituting something else). Anyway, all these fonts are supposedly "core", so a Reader knows where to find them, but the metrics are provided (even though they are all TTF). Any suggestions on what I might replace them with on Linux and Mac systems? A user can always define their own entries, but I'd like to have some reasonable defaults for users of non-Windows systems.

I realize that "Linux" covers a multitude of systems, each likely to load their own beloved set of fonts into their own favorite places. Macs might be a little more consistent, but I don't know for sure. I'm just hoping to find some consistent, widely installed, set of substitutes for the Windows "core" extensions. Then there's the issue of producing PDFs on non-Windows systems that might in turn be displayed on a PDF Reader on a Windows system. Do Readers on Windows systems actually know where to go for these "core" fonts, or are they just substituting something reasonable behind the scenes? I figure that I should get this straightened out and document it well before too long.

Thanks much to anyone who can shed some light on these issues.

@PhilterPaper added the "help wanted" and "general discussion" labels Mar 25, 2024
@terefang

terefang commented Mar 27, 2024

i know something for windows:

  • there is a global font directory – dropping a font file there makes it globally available.
  • there is a system-registry path – you can register fonts there under arbitrary paths, which makes them globally available.
  • there is a user-registry path – works like the system one but makes fonts available only to that user.

for linux, directories scanned are:

  • system – /usr/share/fonts, /usr/local/share/fonts
  • user – ~/.local/share/fonts, ~/.fonts
  • wherever the XDG_* environment variables point to.
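A minimal Perl sketch of how the Linux directories above could be turned into a search list. The function name `candidate_font_dirs` is hypothetical, and treating `XDG_DATA_HOME` and `XDG_DATA_DIRS` as roots that contain a `fonts` subdirectory is an assumption about what "the XDG_* variables" means here; it is not part of PDF::Builder.

```perl
use strict;
use warnings;

# Build the list of directories to search for fonts on a Linux/Unix system.
# The environment is passed in (rather than read from %ENV) so the function
# is easy to test.
sub candidate_font_dirs {
    my (%env) = @_;
    my $home = $env{HOME} // '';
    my @dirs = ('/usr/share/fonts', '/usr/local/share/fonts');

    # Assumption: XDG_DATA_DIRS is a colon-separated list of data roots,
    # each of which may hold a "fonts" subdirectory.
    push @dirs, map { "$_/fonts" } grep { length } split /:/, ($env{XDG_DATA_DIRS} // '');

    # User-level directories (XDG_DATA_HOME defaults to ~/.local/share).
    my $data_home = $env{XDG_DATA_HOME} || ($home ne '' ? "$home/.local/share" : '');
    push @dirs, "$data_home/fonts" if $data_home ne '';
    push @dirs, "$home/.fonts"     if $home ne '';

    my %seen;
    return grep { !$seen{$_}++ } @dirs;    # drop duplicates, keep order
}

# Usage: keep only the directories that exist on this machine.
my @existing = grep { -d $_ } candidate_font_dirs(%ENV);
print "$_\n" for @existing;
```

Which of these directories actually exist, and what they contain, still depends on the distribution and the user.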

if you generalize from Linux to Unix you might be on the safe side.

for this i have seen http://xahlee.info/linux/linux_fonts.html, which is mostly applicable to multi-language debian/ubuntu installs, and basically also to Linux Mint.

the most correct answer for linux is "it depends": it comes down to the preferences of the particular user.

i would classify them into the following:

  • the default multi-language installed fonts from above.
  • a ghostscript user should have the "URW base 35" fonts installed
    • C059,
    • D050000L,
    • Nimbus Mono PS,
    • Nimbus Roman,
    • Nimbus Sans,
    • Nimbus Sans Narrow,
    • P052,
    • Standard Symbols PS,
    • URW Bookman,
    • URW Gothic,
    • Z003 (Zapf Chancery)
  • a LaTeX/PdfTex user would have the "TexGyre" fonts installed
    • TeX Gyre Adventor (URW Gothic L),
    • TeX Gyre Bonum (URW Bookman L),
    • TeX Gyre Chorus (URW Chancery L Medium Italic),
    • TeX Gyre Cursor (URW Nimbus Mono L family),
    • TeX Gyre Heros (URW Nimbus Sans L),
    • TeX Gyre Pagella (URW Palladio L),
    • TeX Gyre Schola (URW Century Schoolbook L),
    • TeX Gyre Termes (URW Nimbus Roman No9 L)
  • a LaTeX/PdfTex user might have the "ADF" fonts installed
    • Accanthis
    • Aurelis
    • Baskervald (Baskerville)
    • Berenis
    • Electrum
    • Gillius (Gill Sans)
    • Ikarius
    • Switzera
    • Irianis
    • Libris
    • Verana (Vera)
    • Mekanus
    • VeranaSans
    • Romande
    • Solothurn
    • Tribun (Times)
    • Universalis
    • Ornements
    • MintSpirit
    • Oldania
    • NeoGothis
  • a Wine user would have the "msttcore" fonts installed
    • Arial
    • Tahoma
    • Webdings
    • Verdana
    • Trebuchet
    • Times
    • Impact
    • Georgia
    • Courier New
    • Comic Sans
    • Andale Mono
  • "dejavu" fonts
    • DejaVuSans,
    • DejaVuSansMono,
    • DejaVuSerif
  • "freefont" fonts
    • FreeMono (Courier),
    • FreeSans (Helvetica),
    • FreeSerif (Times)
  • "liberation" fonts
    • LiberationMono (Courier),
    • LiberationSans (Arial),
    • LiberationSerif (Times)
  • "CrosCore" fonts
    • Arimo (Arial),
    • Cousine (Courier New),
    • Tinos (Times New Roman),
    • Carlito (Calibri),
    • Caladea (Cambria)
  • LibreOffice Fonts
    • OpenSans
    • OpenSymbol
    • LiberationMono (Courier),
    • LiberationSans (Arial),
    • LiberationSerif (Times)

@PhilterPaper
Owner Author

Thanks for the information, Alfred. It sounds like, unfortunately, for non-Windows systems there are no close equivalents to the Windows "core" extensions that I can count on being there. I was hoping to find "use X instead of Verdana, use Y instead of Georgia," etc. I can think about doing an extensive search for the fonts you listed, but I'm not sure it's worth the effort. And if nothing shows up, I still have to substitute something. Maybe I would be better off just substituting (on non-Windows systems) Helvetica for Verdana and Trebuchet, Times for Georgia, and who knows what (Zapf-Dingbats?) for Webdings and Wingdings. At least something reasonable would be produced. Any thoughts on that approach? Users would still have the freedom to add whatever fonts they want to use; I'm just looking at the "core" fonts. And producing documents using Windows core extension fonts could produce interesting results on non-Windows systems -- assuming that Readers under Windows are actually using those as locally-installed (by default) fonts and not some substitute.
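The substitution idea in the paragraph above could be sketched roughly as follows. The table keys, the function name `resolve_core_extension`, and the choice of `Times-Roman` and `ZapfDingbats` as the concrete core-font names are assumptions for illustration, not existing PDF::Builder behavior.

```perl
use strict;
use warnings;

# Hypothetical fallback table for the Windows "core" extensions, following
# the substitutions floated above.
my %NON_WINDOWS_FALLBACK = (
    Verdana   => 'Helvetica',
    Trebuchet => 'Helvetica',
    Georgia   => 'Times-Roman',
    Webdings  => 'ZapfDingbats',
    Wingdings => 'ZapfDingbats',
);

# Return the face to use for $name on operating system $os.
# $os defaults to Perl's $^O, which is 'MSWin32' on Windows.
sub resolve_core_extension {
    my ($name, $os) = @_;
    $os //= $^O;
    return $name if $os eq 'MSWin32';
    return $NON_WINDOWS_FALLBACK{$name} // $name;    # unknown names pass through
}

print resolve_core_extension('Georgia', 'linux'), "\n";    # prints "Times-Roman"
```

Note that this only decides what the producing system asks for; what a Reader on another system displays for a non-embedded font is a separate question.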

Regarding a default script font, maybe I'll just use Times-Italic on non-Windows systems. A user can always find and install a real script font if they want to run the examples/FontManager, but it still uses a bunch of other fonts (T1, bitmap, CJK, etc.) that the typical user won't have installed. So long as the example degrades gracefully for non-installed fonts, at least it will be runnable. Let's say you're a Linux user with a bog standard set of installed fonts -- what would you expect as reasonable results from examples/FontManager, which uses all the "core" fonts (including the Windows extensions) as well as a script and a bunch of odd other fonts? Would you expect some "safe" substitutions, and if so, what?

Could you do me a favor and on a non-Windows machine or two, download and open https://www.catskilltech.com/Examples/PDF/Builder/FontManager.pdf and see what it looks like? The "Georgia" expected at the end of the second paragraph may or may not show up, and much of the third paragraph may ??? There's a bunch of fonts in different formats that a user would need to install, and I'm not sure how it looks when they're not there. At the very least, the example should not blow up (which I would need to fix). Thanks!

> there is a global font directory – dropping a font file there makes it globally available.

If you're talking about \Windows\Fonts, my experience has been that often a font file "dropped" into there gets moved to something under my Users path. It's documented somewhere in the Font handling section where to find a font file and its path and name.

@sciurius
Contributor

> Could you do me a favor and on a non-Windows machine or two, download and open https://www.catskilltech.com/Examples/PDF/Builder/FontManager.pdf and see what it looks like?

Is there any reason why this document would not look precisely as intended? That's the whole idea behind the P in PDF...
According to the specs, all fonts except for the 14 corefonts must be embedded in a PDF document. Viewers are responsible for providing the corefonts (or decent alternatives).

In this case, Georgia, which is not one of the 14 corefonts, is not embedded so it completely depends on the viewer what will be shown. According to the screendumps they seem to do a good job.

Attached screendumps of Debian Bookworm, Fedora 39, Linux Mint, Fedora Rawhide, MacOS Catalina and Windows 10. Note that Fedora39 is my personal workstation which has lots of non-standard things installed. That is probably the reason the greek characters (except for π) do not show.

fontmanager.zip

@terefang

did you intend that some fonts were embedded?

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
AdobeGothicStdL                      CID Type 0C       Identity-H       yes no  yes    231  0
[none]                               Type 3            Custom           yes no  no      39  0
Courier                              Type 1            Custom           no  no  no      12  0
DejaVuSans                           CID TrueType      Identity-H       yes no  yes     16  0
DejaVuSansOblique                    CID TrueType      Identity-H       yes no  yes     21  0
DejaVuSansBold                       CID TrueType      Identity-H       yes no  yes     26  0
DejaVuSansBoldOblique                CID TrueType      Identity-H       yes no  yes     31  0
FrenchScriptMT                       CID TrueType      Identity-H       yes no  yes    236  0
Georgia                              TrueType          Custom           no  no  no      14  0
Helvetica                            Type 1            Custom           no  no  no      11  0
Symbol                               Type 1            Custom           no  no  no      13  0
Times-Bold                           Type 1            Custom           no  no  no       9  0
Times-BoldItalic                     Type 1            Custom           no  no  no      10  0
Times-Italic                         Type 1            Custom           no  no  no       8  0
Times-Roman                          Type 1            Custom           no  no  no       7  0
CBN+URWPalladioL-Roma                Type 1            Custom           yes no  no      36  0
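A listing in this layout can be scanned for fonts that are neither embedded nor among the 14 standard fonts, which are the ones a viewer has to guess at. The sketch below assumes the column layout shown above (name, type, encoding, emb, sub, uni, object ID); the function name `fonts_at_risk` is made up for illustration.

```perl
use strict;
use warnings;

# The 14 standard PDF fonts that viewers are expected to supply.
my %CORE14 = map { $_ => 1 } qw(
    Courier Courier-Bold Courier-Oblique Courier-BoldOblique
    Helvetica Helvetica-Bold Helvetica-Oblique Helvetica-BoldOblique
    Times-Roman Times-Bold Times-Italic Times-BoldItalic
    Symbol ZapfDingbats
);

# Return the names of fonts that are neither embedded nor one of the core 14.
sub fonts_at_risk {
    my (@lines) = @_;
    my @risky;
    for my $line (@lines) {
        # name, type (may contain spaces), encoding, emb, sub, uni, object ID
        next unless $line =~ /^(\S+)\s+(.+?)\s+(\S+)\s+(yes|no)\s+(yes|no)\s+(yes|no)\s+\d+\s+\d+\s*$/;
        my ($name, $embedded) = ($1, $4);
        push @risky, $name if $embedded eq 'no' && !$CORE14{$name};
    }
    return @risky;
}

my @listing = (
    'DejaVuSans                           CID TrueType      Identity-H       yes no  yes     16  0',
    'Georgia                              TrueType          Custom           no  no  no      14  0',
    'Helvetica                            Type 1            Custom           no  no  no      11  0',
);
print join(', ', fonts_at_risk(@listing)), "\n";    # prints "Georgia"
```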

@terefang

terefang commented Mar 27, 2024

Adobe Font          MS (Core) Font  URW Font                   TeXGyre Font
------------------  --------------  -------------------------  -----------------
Times               Times           Nimbus Roman               TeX Gyre Termes
Helvetica           Arial           Nimbus Sans                TeX Gyre Heros
Courier             Courier New     Nimbus Mono PS             TeX Gyre Cursor
Symbol              MS Symbol       Standard Symbols PS        (OpenSymbol)
Zapf Chancery       Comic Sans      Z003 (Chancery)            TeX Gyre Chorus
Zapf Dingbats       WingDings       D050000L (Dingbats)        (OpenSymbol)
Century Schoolbook  -               C059 (Century Schoolbook)  TeX Gyre Schola
Palatino            -               P052 (Palladio)            TeX Gyre Pagella
Minion              Georgia         URW Bookman                TeX Gyre Bonum
Avant Garde         Verdana         URW Gothic                 TeX Gyre Adventor
@terefang

i would assume that Linux users know how to install the TexGyre and OpenSymbol fonts, since support for the "free" URW fonts is slowly fading.

@terefang

terefang commented Mar 27, 2024

btw, the symbols from Wingdings and Dingbats match; Webdings is a different can of worms.

that said, OpenSymbol has support for the Symbol, Dingbat and some of Webdings

@sciurius
Contributor

I'm surprised to see Verdana as a replacement for Avant Garde. It has a totally different appearance.
Same for Comic Sans as a replacement for Zapf Chancery.

@terefang

terefang commented Mar 27, 2024

i looked in my fonts.conf substitution file, but that is subject to (overridden by) the availability of the fonts, as i have msttcore fonts installed.

@terefang

FontManager.zip

Linux Mint screenshots with Acroread9, xpdf, mupdf, okular, evince

@sciurius
Contributor

> PDF::Builder (and its predecessor, PDF::API2) defined 15 or so extensions to "core" fonts, for Windows. [...] Anyway, all these fonts are supposedly "core", so a Reader knows where to find them,

While several (many? most?) viewers do, as stated earlier, PDF viewers are only guaranteed to know the 14 core fonts.

@sciurius
Contributor

> Linux Mint screenshots with Acroread9, xpdf, mupdf, okular, evince

This clearly shows how the readers try to cope with a missing font. Acroread9 and mupdf do a better job on replacing Georgia than the others.

@terefang

terefang commented Mar 27, 2024

i put them into PDF::API2 because they were also used in the PDF1.3/1.4 samples, and i was still primarily a windows user at that time, like most other people viewing the generated pdfs.

today i would decide differently, as i have with my java library, which actually ships the texgyre and opensymbol fonts and has aliases that fall back if known core fonts are not installed.

@terefang

terefang commented Mar 27, 2024

btw, i have also noticed that Adobe Reader nowadays ships with Minion and Myriad, which might be currently unmatched in the linux world, unless someone owns an adobe font dvd.

(Calibri, Cambria on MS?)

@sciurius
Contributor

IIRC these are multi-master fonts intended to provide drop-in replacements for all serif and sans corefonts.

@terefang

terefang commented Mar 27, 2024

so if i use modern Adobe Reader with a pdf that has un-embedded Times or Georgia, i "might" get to see Minion instead – wow that's progress.

and i thought only tuxers had such problems.

@terefang

@PhilterPaper while i understand your point, that you want to let the user get away with non-embedding for size and/or license reasons, i would not recommend it.

i know from personal experience that documents produced this way do not age well, don't conform at all to PDF/A and PDF/X archival standards, and can become totally unreadable with the next technology change.

if it is not for size but license, there is still the way of rendering the font into Type3 glyph xobjects.

@sciurius
Contributor

fonts.adobe.com reads:

> Licensing Information
> The full Adobe Fonts library is cleared for both personal and commercial use.

I don't know if there are many proprietary licensed fonts anymore.

@PhilterPaper
Owner Author

PhilterPaper commented Mar 27, 2024

I would expect that all the explicitly-given TTF or OTF fonts would end up being embedded, though I'm a bit surprised to see the CJK and T1 fonts apparently embedded, too. My experience with CJK (e.g., AdobeGothicStdL) has been that sometimes they'll embed, and other times not.

I can see that Georgia is not exactly the same from sample to sample (some of the metrics seem a little "off"). Apparently different Readers are substituting different things. IIRC, the bitmapped font is stored as Type 3 (graphics draws per character), so it is "embedded" (more or less).

Thank you much for the efforts the two of you have put in to gather this information. I appreciate that only the 14 "true" core fonts are guaranteed to be there, and anything else that's not embedded (not TTF/OTF) is a roll of the dice as to what (if anything) is substituted.

Where I'm going with all this is two-fold:

  1. There are a couple of decent choices in Windows for a default (TTF, and therefore embedded) "script" face in Font Manager, but who knows what is usable in non-Windows systems. I think I'll just have to let non-Windows installations default "script" to "Times-Italic" as the closest thing to a script font guaranteed to be there. Of course, a user can install and choose to use anything they want for a cursive script font, but I'm just looking here for a decent default if someone runs examples/FontManager.pl without installing or selecting any fonts explicitly.
  2. If I'm going to define the 14 additional Windows "core" extensions in Font Manager, I feel that I owe it to non-Windows users to have something "decent" substituted for them. As we saw with Georgia, the present substitutions are not perfect, but at least something is happening. I'm open to suggestions as to whether I should just let non-Windows Readers substitute as they choose (and apparently doing a fairly good job at it), with a reminder in the documentation that this will be done, or if I should actively substitute as proposed earlier. In either case, I would prepopulate the Windows core extensions for everyone, not just on Windows systems. "Core" font extensions for Windows seem to more or less work on non-Windows systems, so perhaps I should just offer them (with a reminder that it's Reader-dependent).

At this point, the only embedding I'm familiar with is TTF/OTF (as well as the Type 3 bitmapped font), although it appears that some CJK and possibly T1 may embed. Regarding embedding in general, yes, I appreciate that it's considered good form to embed fonts into a PDF (or else use the 14 core fonts) for portability purposes. I really should address embeddability of all fonts at some point (#80, #81), but not right now.

@terefang

> fonts.adobe.com

but only if you have a valid subscription, you cannot simply download and redistribute adobe fonts at will.

@terefang

> At this point, the only embedding I'm familiar with is TTF/OTF (as well as the Type 3 bitmapped font), although it appears that some CJK and possibly T1 may embed. Regarding embedding in general, yes, I appreciate that it's considered good form to embed fonts into a PDF (or else use the 14 core fonts) for portability purposes. I really should address embeddability of all fonts at some point, but not right now.

Type3 is not limited to bitmapped.

test-fonts-all-as-t3.pdf

this is an example extracting the glyph curves from the fonts and making T3 analogs.

@sciurius
Contributor

Interesting. The Fedora viewers Evince and Atril show empty pages (except for the heading). MuPDF shows the glyphs ok.

@terefang

terefang commented Mar 27, 2024

  • Mint Xreader shows no text.
  • Xpdf renders correctly
  • Mupdf renders correctly
  • Qpdfview renders correctly
  • Okular renders correctly
  • Vivaldi/Chrome renders correctly
  • LibreOffice 7.3.7.2 opens correctly

iirc they incorrectly don't bleed the graphics context for rendering the xobject via the t3 font, which has been the default behavior of adobe.

else you wouldn't be able to either style or color the text.

@PhilterPaper
Owner Author

Hmm. When you say, "empty page" or "no text", is the entire page literally empty? Even the text from core fonts? That sounds like something is going belly up right at the beginning of the PDF, rather than something unwanted happening with a core extension or the Type 3 bitmapped font.

@terefang

only the t3 text is missing
[image]

@terefang

vs mupdf
[image]

@PhilterPaper
Owner Author

Oh, so that is apparently an issue with your PDF Reader, rather than FontManager.pdf (a general lack of Type 3 support)? Just out of curiosity, does the bitmapped (Type 3) text in FontManager.pdf fail to show up in those readers?

@sciurius
Contributor

Just thinking out loud...
No one is going to use PDF::Builder if it can also be done with PDF::API2.
If you are going to use PDF::Builder for its numerous enhancements, you're dealing with a new library anyway and you can drop the legacy, either in a hard way (drop) or soft (emulate in a better way).

@sciurius
Contributor

sciurius commented Jul 9, 2024

FYI: https://behdad.org/text2024/

@terefang

Interestingly i encountered some "historic" test files for interoperability with API2 produced by Acrobat Printer.

It looks like "historic versions" of Acrobat not only refused to embed TrueType fonts with the "fsType" flag set but also well-known fonts from the Postscript Level 3 Set, unless they were either managed by Adobe Type Manager or available as native Type1 (pfb) format.

There also seems to be some "feature-mismatch" across Windows and Macintosh (pre-OSX) versions of the software.

I will do some further research, because i was asked to help fix such old PDFs for conversion to PDF/A-ish long-term archival.

@PhilterPaper
Owner Author

I'd appreciate hearing about anything interesting with regards to old Acrobat Readers and "feature mismatch"es. Of course, if really old Adobe products are messed up, but reasonably recent ones (say, 10 years old or newer) are OK, we probably shouldn't spend time worrying about it. Certainly, free products such as Acrobat Reader should be kept up to date, but even paid products can't be expected to run forever, and need to be updated.

I have several users interested in PDF/A support (see #52), so I need to pay more attention to that area. Among other things, this would help differentiate PDF::Builder from PDF::API2.

@terefang

terefang commented Jul 21, 2024

> I'd appreciate hearing about anything interesting with regards to old Acrobat Readers and "feature mismatch"es. Of course, if really old Adobe products are messed up, but reasonably recent ones (say, 10 years old or newer) are OK, we probably shouldn't spend time worrying about it. Certainly, free products such as Acrobat Reader should be kept up to date, but even paid products can't be expected to run forever, and need to be updated.

Unfortunately it seems that many people still have old software out there just because it works on their system and they don't need anything more for their established workflows.

Especially "Adobe Creative Suite 2" (from 2005), which was the last version working on 32-bit PowerMacs, is still somewhat alive and kicking.

Also "Adobe Creative Suite 6" (EOLed 2013) has a large following, as it is the last non-Cloud version.

@PhilterPaper
Owner Author

I require a Perl from within the last 6 years or so (currently 5.26) so I'm not trying to fix glitches which are the fault of old Perls. Yet, I try not to use really "new" features of Perl, so that users don't have to constantly update their systems. Perl is free, so it's not a great burden to keep reasonably updated (just the time to do the update and check that it's OK and, in some cases, to have it validated for security lapses and have the Legal Department check the license!). At least this isn't PHP, where existing features are constantly being withdrawn, resulting in old code that no longer works!

PDF::Builder supports most of PDF 1.4, with a few features from 1.5. I can't see ever supporting anything beyond 1.7, as that has been around for so long that any reasonable tool should support it. Nevertheless, I would expect very few features beyond 1.4 to ever make it in. One near-term possibility is Object Streams, as they seem to be showing up in "wild" PDFs. Of course, users do try to read in PDFs which use post-1.4 features (produced by other tools), so thought needs to be given on how to at least tolerate such features, even if new PDF output (from PDF::Builder) doesn't make use of them.

This leaves tools such as PDF Readers. Like any other software, they can have bugs in them, and should be kept somewhat up to date. I can sympathize with users who have paid out good money for a suite of tools, and don't want to have to keep upgrading them to work with PDFs from PDF::Builder. Simple PDF Readers are free, so there's no real excuse to fall far behind on one (see Perl update issues above), but tool suites which do editing and such could be a real problem if they are backlevel and buggy. I'm just not sure what to do about that. Insisting on using 10 or 20 year old software is your business, but if it breaks, don't expect me to "fix" PDF::Builder to work around it (assuming the PDF output is actually legit).

Thoughts, suggestions, recommendations?

@terefang

PDF versions 1.5 thru 1.7 are somewhat the norm, but the new Adobe Suite seems to create 2.0 (ISO Standard) by default.

If we apply the standard engineering 7 year obsolescence rule, you might be good.

If you want to support updating PDF files, Object Streams support is a must ... even though i would be against creating new PDFs with it by default for compatibility reasons ... just give the programmer an option to change that behavior from the default.

Hmmm ... for interoperability my go-to is always xpdf; if it works there it should be expected to work anywhere else.

@sciurius
Contributor

Another question is how to deal with vertical font metrics. There are (at least?) three different sets of vertical metrics: hhea, win and typo. For example, PDF::Builder $font->ascender will give me which? hheaAscend? WinAscend? TypoAscender? For baseline-to-baseline distances, there is also hheaLinegap and TypoLineGap. And, not to forget, fsSelection flag USE_TYPO_METRICS.

@terefang

terefang commented Jul 23, 2024

    # hhea values are in font units; scale them to 1/1000 em using units-per-em (upem)
    $data->{'ascender'} = int($font->{'hhea'}->read()->{'Ascender'} * 1000 / $data->{'upem'});
    $data->{'descender'} = int($font->{'hhea'}{'Descender'} * 1000 / $data->{'upem'});

at the time i implemented this i encountered a lot of broken fonts so the hhea table was the best bet.

if you want to cope with all cases you need a sensible fallback mechanism like:

hheaAscend!=0 ? hheaAscend : WinAscend!=0 ? WinAscend : TypoAscender!=0 ? TypoAscender : 800
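Written out in Perl, the fallback chain above might look like the sketch below. The hash keys simply reuse the names from the expression and are not actual Font::TTF or PDF::Builder fields; scaling by upem is borrowed from the snippet further up and is an addition to the bare expression.

```perl
use strict;
use warnings;

# Pick the first nonzero ascender among the hhea, Win and Typo values (all
# in font units), then scale it to 1/1000 em. Key names are illustrative.
sub pick_ascender {
    my ($m) = @_;
    my $upem = $m->{upem} || 1000;
    for my $key (qw(hheaAscend WinAscend TypoAscender)) {
        my $v = $m->{$key};
        return int($v * 1000 / $upem) if defined $v && $v != 0;
    }
    return 800;    # last-resort default from the expression above
}

print pick_ascender({ upem => 2048, hheaAscend => 0, WinAscend => 1901 }), "\n";    # prints 928
```

The same pattern would apply to the descender, with its own default.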

@PhilterPaper
Owner Author

So a given font could use one of (?) hhea, Win, or Typo style ascenders and descenders; and try for first defined/nonzero value? Are they the same value (after processing with upem), or different ones? If the same, why would anyone have bothered defining three different methods? If different, which one to believe? This sounds very messy.

@sciurius
Contributor

For typesetting, I need to know where to start (i.e. the distance of the baseline from the top), and how much to advance vertically to the next line. The distance of the baseline from the top is the ascender. The bare baseline-to-baseline distance is, by definition, the font size. Piece of cake. Well, take a look at the attached program and its results.

First thing to remark is that font properties $descender, $ascender and @fontbbox are in 1/1000 units, while $capheight, $underlinethickness, and $underlineposition are in 1/UPEM units.
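To compare the two groups of properties they have to be brought to the same unit. A minimal conversion sketch, with a made-up function name and made-up example numbers:

```perl
use strict;
use warnings;

# Convert a value in font design units (1/UPEM em) to 1/1000 em units.
sub upem_to_milliem {
    my ($value, $upem) = @_;
    die "upem must be positive\n" unless $upem && $upem > 0;
    return $value * 1000 / $upem;
}

# Example: a capheight of 1434 units in a 2048-UPEM font.
printf "%.1f\n", upem_to_milliem(1434, 2048);    # prints 700.2
```

For fonts with a UPEM of 1000 the two units coincide, which may be why the mismatch is easy to overlook.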

We can see that all TrueType fonts have a decent ascender, descender and capheight. The underline position and thickness seem reasonable too. The capheight and descender of FreeSerif are a bit tight and may result in clipping.
The (built-in) metrics for Times-Roman have an ascender that is just too small, while capheight would be correct for ascender. These metrics completely ignore the accents.

The red line indicates the font size offset to the descender. If this is used to advance, glyphs will overlap.

When considering using the font bounding box for advancing, you'll quickly notice that this would be fine for ITCGaramond and Times-Roman only. FreeSerif has a reasonable top value, but the bottom value seems to include aesthetic space. IBMPlexMonoRegular also includes aesthetic space, but evenly divides it over the top and bottom.
Roboto includes the space at the top, which matches the classical use of leading to space lines.

So the initial questions, where to start and how much to advance, cannot be answered using the font properties currently provided by PDF::Builder and PDF::API2.

fontmetrics2.zip

@PhilterPaper
Owner Author

Well, it sounds like font packagers are not using consistent definitions of ascender, descender, capheight, fontsize, etc. As long as they are internally consistent within a font, I don't think we can do much programmatically in a user such as PDF::Builder. All we can do is, by trial-and-error, adjust our use of fontsize and leading to give aesthetically pleasing results for a given font file. Mixing different fonts on one line could be something of a problem. Perhaps we should think about additional parameters when opening a font file, to adjust or override built-in font definitions? For example, "subtract 150 units to remove built-in leading", or "add 100 units to the ascender height to leave space for accents"?

@sciurius
Contributor

In our applications we do not work with 'bare' fonts, but via an intermediate layer (fontconfig, fontmanager, ...). That gives the option to overrule the properties on a per-font basis.

Properties we would need:

  • leading (baseline of first line = leading * ascender from top)
  • ascender
  • descender
  • capheight?
  • xheight?
  • baselinestretch (btbd = fontsize, aesthetic distance = fontsize * baselinestretch)
  • underlineposition
  • underlinethickness
  • strikethroughposition
  • strikethroughthickness (defaults to underlinethickness)
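As a rough illustration of how `leading`, `ascender` and `baselinestretch` from this list would combine, the sketch below computes baseline positions measured downward from the top of a text box. The function name is hypothetical, the ascender is assumed to be in 1/1000 em, and both factors are assumed to default to 1.

```perl
use strict;
use warnings;

# Compute the offsets of successive baselines, measured downward from the
# top of the text box, for $lines lines set at $fontsize.
sub baseline_offsets {
    my ($p, $fontsize, $lines) = @_;
    my $leading = $p->{leading}         // 1;
    my $stretch = $p->{baselinestretch} // 1;
    # first baseline = leading * ascender (1/1000 em), scaled to the font size
    my $first   = $leading * $p->{ascender} * $fontsize / 1000;
    # aesthetic baseline-to-baseline distance = fontsize * baselinestretch
    my $advance = $fontsize * $stretch;
    return map { $first + $_ * $advance } 0 .. $lines - 1;
}

my @y = baseline_offsets({ ascender => 800, baselinestretch => 1.5 }, 10, 3);
print "@y\n";
```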

@terefang

terefang commented Jul 25, 2024

this is my go-to picture:

[image: ascender-descender-baseline-info]

Most personal fonts use auto-calculated values for these, while most commercial fonts have sensible values set by the type designers -- so the answer is "it depends".

Modern Word Processors (WP) and Layout Applications (LA) have the same issue here.

While most WPs use default values and mostly don't care (blaming a bad font/designer), some like LibreOffice let you adjust/scale them.

Most LAs use default values, but offer the option to set those values explicitly for each font by the designer/editor.

While there are ways to extract info from the font file and auto-calculate the rest, i would suggest that you give your users the option to set/override the values themselves during font loading, if they don't like them.

@terefang

please also note the CSS concepts are slightly different:

[image: ascender-descender-baseline-info-css]

@PhilterPaper
Owner Author

> In our applications we do not work with 'bare' fonts, but via an intermediate layer (fontconfig, fontmanager, ...). That gives the option to overrule the properties on a per-font basis.

That's along the lines of what I was thinking. They would be new options for ttfont()/font(), etc., and font managers (such as FontManager in Builder) could store them (the overrides) for each font. Maybe get_font() could also override the overrides on-the-fly. I'm not sure what the best way would be to reduce an author's workload in using the same overrides over and over (between similar applications) -- maybe it wouldn't be too bad to do the work once, and copy-paste the option parameter code from a reference file?

Properties we would need:

I think it only uses a few of these properties, but it wouldn't hurt to be able to define the whole set. Maybe you're more familiar with the innards of font handling than I am. One question would be whether the overrides overwrite the actual font global data structure at font reading, or do we carry these overrides separately, and apply them whenever we work with a font? The latter might be needed if we need to go back to the defaults for some reason, or otherwise change the overrides.

Just out of curiosity, do non-TTF/OTF fonts (core, T1, etc.) define these too, and in the same manner?

this is my go-to picture:

Do all of these show up in TTF fonts, or does it vary? It looks like top and bottom "shoulders" might interact with "leading". Speaking of leading, this one appears to use the newer definition (baseline to baseline spacing), rather than the traditional "extra" amount to be added between lines.

please also note the CSS concepts are [slightly] different:

Any stack of accents/diacritics above a capital letter should fall within the ascender? From what Johan was describing earlier, it sounds like accents often are above the cap height and even above the ascender! I wonder if Vietnamese, which uses the Latin alphabet with lots of accents, has its own fonts which allow for all the possible accent combinations?

@terefang

terefang commented Jul 25, 2024

Just out of curiosity, do non-TTF/OTF fonts (core, T1, etc.) define these too, and in the same manner?

for Core/Type1 the source is either the AFM or PFM file which commonly has:

UnderlinePosition -100
UnderlineThickness 50
FontBBox -526 -281 1306 1055
CapHeight 662
XHeight 450
Ascender 662
Descender -217
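
Lines like these are easy to pick up with a few lines of code. The following is only a minimal sketch in Python (not PDF::Builder's own Perl code); the function name `parse_afm_metrics` is hypothetical, and it assumes each metric sits on its own `Key value ...` line as in the excerpt above.

```python
def parse_afm_metrics(text):
    """Collect a handful of global metrics from AFM-style 'Key value...' lines."""
    wanted = {"UnderlinePosition", "UnderlineThickness", "FontBBox",
              "CapHeight", "XHeight", "Ascender", "Descender"}
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, keys we do not care about, and keys without values.
        if len(parts) < 2 or parts[0] not in wanted:
            continue
        values = [float(v) for v in parts[1:]]
        # FontBBox carries four numbers (llx lly urx ury); the others carry one.
        metrics[parts[0]] = values if parts[0] == "FontBBox" else values[0]
    return metrics


sample = """\
UnderlinePosition -100
UnderlineThickness 50
FontBBox -526 -281 1306 1055
CapHeight 662
XHeight 450
Ascender 662
Descender -217
"""
afm = parse_afm_metrics(sample)
```

Any key missing from the resulting dictionary is a candidate for a fallback value.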

there are some heuristics for missing values:

CapHeight

  • if CapHeight is not given, the first fallback is the height (or yMax) of the "H" glyph.
  • if the "H" glyph is not defined, 80% of the font's BBox.yMax is used.

XHeight

  • if XHeight is not given, the first fallback is the height (or yMax) of the "x" glyph.
  • if the "x" glyph is not defined, 50% of the font's BBox.yMax is used.

Descender

  • if Descender is not given, the difference between the heights of the "y" and "a" glyphs is used.
  • if either "y" or "a" is not defined, a standard value of 200 is used.

Ascender

  • if Ascender is not given, 120% of CapHeight is used.
  • if CapHeight is not defined, 90% of the font's BBox.yMax is used.

UnderlinePosition

  • if UnderlinePosition is not given, 50% of Descender is used.
  • if Descender is not defined, a standard value of 100 is used.

UnderlineThickness

  • if UnderlineThickness is not given, 50% of UnderlinePosition is used.
  • if UnderlinePosition is not defined, 25% of Descender is used.
  • if Descender is not defined, a standard value of 50 is used.
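
The fallback rules so far could be sketched roughly as follows. This is Python for illustration only, and several things are assumptions: the function and parameter names are hypothetical, "not defined" is read as "not given in the metrics file", and descender and underline position are stored as negative numbers to match the AFM sign convention.

```python
def fill_missing_metrics(given, glyph_height=None, bbox_ymax=1000.0):
    """Return a copy of `given` with missing metrics filled by the heuristics.

    given        -- metrics found in the file, in font units
    glyph_height -- optional dict: glyph name -> glyph height in font units
    bbox_ymax    -- yMax of the font bounding box
    """
    g = glyph_height or {}
    out = dict(given)
    if "CapHeight" not in given:
        out["CapHeight"] = g["H"] if "H" in g else 0.80 * bbox_ymax
    if "XHeight" not in given:
        out["XHeight"] = g["x"] if "x" in g else 0.50 * bbox_ymax
    if "Descender" not in given:
        # Depth below the baseline: how much taller "y" is than "a".
        depth = g["y"] - g["a"] if "y" in g and "a" in g else 200.0
        out["Descender"] = -depth
    if "Ascender" not in given:
        out["Ascender"] = (1.20 * given["CapHeight"] if "CapHeight" in given
                           else 0.90 * bbox_ymax)
    if "UnderlinePosition" not in given:
        out["UnderlinePosition"] = (0.50 * given["Descender"]
                                    if "Descender" in given else -100.0)
    if "UnderlineThickness" not in given:
        if "UnderlinePosition" in given:
            out["UnderlineThickness"] = abs(0.50 * given["UnderlinePosition"])
        elif "Descender" in given:
            out["UnderlineThickness"] = abs(0.25 * given["Descender"])
        else:
            out["UnderlineThickness"] = 50.0
    return out


# With nothing known at all, the standard values fall out:
bare = fill_missing_metrics({})
# With only CapHeight and Descender known (values from the AFM excerpt above):
partial = fill_missing_metrics({"CapHeight": 662.0, "Descender": -217.0})
```

Note that the standard values are mutually consistent: 50% of a 200-unit descender gives the 100-unit underline position, and 50% of that gives the 50-unit thickness.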

strikethroughposition

quote from https://stackoverflow.com/questions/8215754/css-strike-through-not-centered-through-the-text

A strike-through is traditionally some percentage (70% to 90%) of the x-height (or the height of a lower case 'x'). If the line were at the 50% of cap-height, it may be possible the strike-through would be above or at the top of any lowercase letter in the set. A strike-through is supposed to put a line through all letters in the text, which is why is gauged from the x-height.

a quick and dirty default is (capHeight+xHeight)/4

strikethroughthickness

(defaults to underlinethickness)
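
As a small illustration of the two strikethrough defaults (a Python sketch; the function name is hypothetical and all values are in font units):

```python
def default_strikethrough(cap_height, x_height, underline_thickness):
    """Quick-and-dirty strikethrough defaults.

    Position is (capHeight + xHeight) / 4 above the baseline; thickness
    simply reuses the underline thickness.
    """
    position = (cap_height + x_height) / 4.0
    return position, underline_thickness


# Using the AFM example values above: CapHeight 662, XHeight 450.
position, thickness = default_strikethrough(662, 450, 50)
```

For the AFM example values this puts the line 278 units above the baseline, a little over 60% of the x-height.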

leading

Generations have discussed this, but good defaults seem to be among:

  • Golden Ratio (1.618...) divided by the Square Root of 2 = 1.14...
  • the Square Root of 2 = 1.41...
  • the average of the two above = 1.279... or simply 1.2
  • let the user set it explicitly
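
For reference, the candidate factors can be checked with a few lines of Python. This is just the arithmetic; the helper `line_advance` is a hypothetical name and treats leading as a multiplier on the font size.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2       # 1.618...

tight = GOLDEN_RATIO / math.sqrt(2)         # golden ratio / sqrt(2)
loose = math.sqrt(2)                        # sqrt(2)
middle = (tight + loose) / 2                # average of the two


def line_advance(font_size, factor=1.2):
    """Baseline-to-baseline distance for a given font size and leading factor."""
    return font_size * factor
```

Rounded to three decimals, `tight` is 1.144, `loose` is 1.414 and `middle` is 1.279.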

baselinestretch

whatever the current aesthetic is.

@sciurius
Contributor

The definition of 'leading' hurts my typesetting background (the lead strips are inserted to increase the space between the lines). Never mind…
In contrast to all other properties (which are part of the font design, hence fixed) the leading is intended to be varied according to the application.
The modern (CSS) term linespacing feels better, including the suggestion to spread the added space over the top and bottom. The name clearly indicates that it is the spacing between lines.

To get back to my original questions:

  • Where to start?
    the baseline is at a distance of ascender from the top. The value is taken from the font properties, unless overruled by a user value. I18n-ignoring applications can use capheight instead of ascender.
  • Advance to the next line?
    linespacing. The default value is the font size ('solid' typesetting), usually increased by the user/application to 1.15-1.3 times the font size.
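
The two answers translate into very little code. Here is a minimal Python sketch, assuming PDF-style coordinates (y grows upward, so moving down a line subtracts); the function and parameter names are hypothetical, and the default ascender of 800 units is only a placeholder.

```python
def baselines(top_y, font_size, n_lines,
              ascender=800.0, units_per_em=1000.0, spacing_factor=1.2):
    """Return the y coordinate of each baseline in a block of n_lines lines.

    The first baseline sits one (scaled) ascender below the top edge;
    each following baseline is one linespacing further down.
    """
    first = top_y - ascender / units_per_em * font_size
    linespacing = font_size * spacing_factor
    return [first - i * linespacing for i in range(n_lines)]


# Three lines of 10pt text starting at y = 700:
ys = baselines(700.0, 10.0, 3)
```

Passing `spacing_factor=1.0` gives 'solid' typesetting, and passing the cap height as `ascender` gives the tighter start for applications that ignore accented capitals.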

Boxes around texts should be based on the individual glyph bounding boxes.
[screenshot: scrot20240725095750]

Font properties that may (or need to) be user-specified:

  • linespacing (baseline of next line is current baseline + linespacing)
  • ascender (baseline of first line = ascender from top)
  • descender (do we need it?)
  • capheight (do we need it?)
  • xheight (may be needed to provide default for strikethroughposition)
  • underlineposition
  • underlinethickness
  • strikethroughposition
  • strikethroughthickness (defaults to underlinethickness)

@terefang

@sciurius
Contributor

Good point. For the practical purposes of <sup> and <sub> I suggest using 60% size, raise 33% / lower 8% (similar to LibreOffice).

@terefang

terefang commented Jul 25, 2024

looking at the picture, I would suggest:

  • script size = xHeight or 50%, whichever is greater
  • super-script = centered on capHeight (e.g. capHeight - (xHeight/2))
  • sub-script = centered on baseline (e.g. baseline - (xHeight/2))

i have found here https://www.openoffice.org/api/docs/common/ref/com/sun/star/report/XReportControlFormat.html#CharEscapement

and here https://github.com/LibreOffice/core/blob/04184aa7e3aada8f4d938d20dfdb54b3a7dd3896/include/editeng/escapementitem.hxx#L28

  • CharEscapementHeight makes sense with 58%
  • CharEscapement :
#define DFLT_ESC_SUPER   33     // 42% (100 - DFLT_ESC_PROP) of ascent (~80% of font height) = 33% of total font height
#define DFLT_ESC_SUB     -8     // 42% of descent (~20% of font height) = -8%. previously -33% (pre-2020), previously 8/100 (pre-2000?)
#define DFLT_ESC_PROP    58
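
To make the two proposals concrete, here is a rough Python sketch. The constants are the LibreOffice defaults quoted above; everything else (function names, and the reading of "xHeight" in the metric-based rule as the x-height of the already shrunken script text) is an assumption for illustration.

```python
DFLT_ESC_SUPER = 33   # raise, in % of font size
DFLT_ESC_SUB = -8     # lower, in % of font size
DFLT_ESC_PROP = 58    # script size, in % of font size


def script_fixed(font_size, superscript=True):
    """Size and baseline shift (points) using the fixed percentages."""
    shift_pct = DFLT_ESC_SUPER if superscript else DFLT_ESC_SUB
    return font_size * DFLT_ESC_PROP / 100.0, font_size * shift_pct / 100.0


def script_from_metrics(font_size, cap_height, x_height, units_per_em=1000.0):
    """Size, superscript rise and subscript drop (points) from font metrics.

    Size is xHeight/em or 50%, whichever is greater; the script's x-height
    band is centered on the cap height (super) or on the baseline (sub).
    """
    scale = max(x_height / units_per_em, 0.5)
    size = font_size * scale
    script_xheight = x_height / units_per_em * size
    rise = cap_height / units_per_em * font_size - script_xheight / 2.0
    drop = -script_xheight / 2.0
    return size, rise, drop


sup_size, sup_shift = script_fixed(12.0, superscript=True)
sub_size, sub_shift = script_fixed(12.0, superscript=False)
```

At 12pt the fixed rule gives a 6.96pt script raised by 3.96pt or lowered by 0.96pt. The metric-based variant adapts to the font, but depends on trustworthy capHeight and xHeight values.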

@sciurius
Contributor

I've got the same from the dialogs ☺.

[screenshot: scrot20240725162716]
[screenshot: scrot20240725162745]

@PhilterPaper
Owner Author

Wow, that's a lot of stuff to digest! Anyway, it's something to consider to fine-tune various sizes and positions, especially for super- and sub-scripts (not to forget, superscripts on superscripts, subscripts on subscripts, etc.). I have to say that I'm a bit concerned about superscripts clashing with the line above (unless this baseline is temporarily lowered) or subscripts clashing with the line below (lower the next line's baseline). We might want to think about changing the ascender or descender value on lines with super- or sub-scripts.

Any further thoughts on whether it would be better to change the font table values (when requested, to fine-tune settings) as soon as the font is loaded (i.e., "permanently"), or keep overrides to the side to apply as-needed, allowing a return to original default values? The code is a bit more complicated in the latter case, but it does permit an easy change in the overrides at any time.
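
To illustrate the "keep overrides to the side" option, here is a minimal sketch in Python (not Perl, and not how PDF::Builder currently stores font data). The class and method names are hypothetical; the idea is simply a lookup that consults the overrides first and falls back to the untouched values read from the font.

```python
from collections import ChainMap


class FontMetrics:
    """Font metrics with user overrides kept separate from the font's own values."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)    # as read from the font, never modified
        self._overrides = {}               # user-supplied values
        # Lookups try the overrides first, then the defaults.
        self._view = ChainMap(self._overrides, self._defaults)

    def get(self, name):
        return self._view[name]

    def override(self, **values):
        self._overrides.update(values)

    def reset(self, *names):
        """Drop the named overrides, or all of them if no names are given."""
        if not names:
            self._overrides.clear()
        for name in names:
            self._overrides.pop(name, None)


metrics = FontMetrics({"ascender": 662, "descender": -217})
metrics.override(ascender=800)
overridden = metrics.get("ascender")       # override wins
metrics.reset("ascender")
restored = metrics.get("ascender")         # back to the font's own value
```

The extra cost is one level of indirection on every lookup. In exchange, returning to the font's own values is just a matter of dropping the override.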

@sciurius
Contributor

sciurius commented Jul 25, 2024

I think the example in #213 (comment) is slightly misleading. In the superscript case, the superscript exceeds the ascender, but so would accented capitals like À. It's a font with bad metrics.

@PhilterPaper
Owner Author

Would a font's built-in metrics cover super- and sub-scripts? Wouldn't the above case be a violation by whatever application decides the size and placement of *scripts, if it places them above the ascender or below the descender? Then you still have the issue of superscripts-on-superscripts (x^2^2), which would probably involve lowering the baseline.

On the other hand, you would think that accented letters built into a font (canned) would obey the ascender and descender limits, while composite accented letters (non-spacing diacritic(s) + a letter) probably could not. Here you would need to lower the baseline(s) to get everything to fit. Also, figuring where to place a given accent mark (if stacking multiple diacritics) would be an interesting problem to solve. I wonder if regular non-spacing diacritics know enough to stack and not collide with each other? Presumably someone (HarfBuzz?) has done it for languages such as Vietnamese.

@terefang

terefang commented Jul 27, 2024

here are the values from a font designed by a named designer:

'head' Table - Font Header
--------------------------
         unitsPerEm:             1000
         xMin:                   -600
         yMin:                   -190
         xMax:                   1407
         yMax:                   1026

'hhea' Table - Horizontal Header
--------------------------
         yAscender:              1000
         yDescender:             -200
         yLineGap:               0
         advanceWidthMax:        1467
         minLeftSideBearing:     -600
         minRightSideBearing:    -100
         xMaxExtent:             1407
         caretSlopeRise:         1
         caretSlopeRun:          0

'post' Table - PostScript
-------------------------
         italicAngle:            0.0
         underlinePosition:      -75
         underlineThichness:     50

'OS/2' Table - OS/2 and Windows Metrics
---------------------------------------
         xAvgCharWidth:          547
         usWeightClass:          400     'Normal'
         usWidthClass:           5       'Medium'
         ySubscriptXSize:        650
         ySubscriptYSize:        600
         ySubscriptXOffset:      0
         ySubscriptYOffset:      75
         ySuperscriptXSize:      650
         ySuperscriptYSize:      600
         ySuperscriptXOffset:    0
         ySuperscriptYOffset:    350
         yStrikeoutSize:         50
         yStrikeoutPosition      324
         sTypoAscender:          800
         sTypoDescender:         -200
         sTypoLineGap:           200
         usWinAscent:            1000
         usWinDescent:           200
         sxHeight:               540
         sCapHeight:             710

Note: the OS/2 table has seen 6 versions since its inception, so the sxHeight and sCapHeight fields are absent (truncated) in many old fonts for table versions 0 and 1.
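
As a rough illustration of how such values might be consumed, the Python sketch below scales the OS/2 script fields from font units to points and guards the version-dependent fields. The function names and the plain-dict representation of the table are hypothetical. It also assumes the OpenType convention that ySubscriptYOffset is a positive distance below the baseline.

```python
def os2_script_metrics(os2, units_per_em, font_size):
    """Scale OS/2 sub/superscript fields from font units to points."""
    k = font_size / units_per_em
    return {
        "super_size": os2["ySuperscriptYSize"] * k,
        "super_shift": os2["ySuperscriptYOffset"] * k,    # above the baseline
        "sub_size": os2["ySubscriptYSize"] * k,
        "sub_shift": -os2["ySubscriptYOffset"] * k,       # below the baseline
    }


def x_and_cap_height(os2, version):
    """Return (sxHeight, sCapHeight), or None for table versions 0 and 1."""
    if version < 2:
        return None    # fields absent; fall back to glyph-based heuristics
    return os2["sxHeight"], os2["sCapHeight"]


# Values from the dump above, at 12pt:
os2 = {"ySubscriptYSize": 600, "ySubscriptYOffset": 75,
       "ySuperscriptYSize": 600, "ySuperscriptYOffset": 350,
       "sxHeight": 540, "sCapHeight": 710}
scripts = os2_script_metrics(os2, units_per_em=1000, font_size=12.0)
```

For this font at 12pt that works out to a 7.2pt script, raised 4.2pt for superscripts and lowered 0.9pt for subscripts.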

@terefang

and here for comparison a common open substitute (TeXGyreHeros)

'head' Table - Font Header
--------------------------
         unitsPerEm:             1000
         xMin:                   -529
         yMin:                   -284
         xMax:                   1353
         yMax:                   1148


'hhea' Table - Horizontal Header
--------------------------
         yAscender:              1148
         yDescender:             -284
         yLineGap:               0
         advanceWidthMax:        1398
         minLeftSideBearing:     -529
         minRightSideBearing:    -255
         xMaxExtent:             1353
         caretSlopeRise:         1
         caretSlopeRun:          0

'post' Table - PostScript
-------------------------
         italicAngle:            0.0
         underlinePosition:      -102
         underlineThichness:     50

'OS/2' Table - OS/2 and Windows Metrics
---------------------------------------
         xAvgCharWidth:          534
         usWeightClass:          400     'Normal'
         usWidthClass:           5       'Medium'
         ySubscriptXSize:        650
         ySubscriptYSize:        600
         ySubscriptXOffset:      0
         ySubscriptYOffset:      75
         ySuperscriptXSize:      650
         ySuperscriptYSize:      600
         ySuperscriptXOffset:    0
         ySuperscriptYOffset:    350
         yStrikeoutSize:         50
         yStrikeoutPosition      314
         sTypoAscender:          784
         sTypoDescender:         -216
         sTypoLineGap:           200
         usWinAscent:            1148
         usWinDescent:           284
         sxHeight:               524
         sCapHeight:             729

@terefang

terefang commented Jul 27, 2024

now the same font as the first but from 1991:

'head' Table - Font Header
--------------------------
         unitsPerEm:             1000
         xMin:                   -157
         yMin:                   -20
         xMax:                   736
         yMax:                   780

'hhea' Table - Horizontal Header
--------------------------
         yAscender:              780
         yDescender:             -220
         yLineGap:               0
         advanceWidthMax:        780
         minLeftSideBearing:     -157
         minRightSideBearing:    -167
         xMaxExtent:             736

'post' Table - PostScript
-------------------------
         italicAngle:            0.0
         underlinePosition:      -100
         underlineThichness:     50

'OS/2' Table - OS/2 and Windows Metrics
---------------------------------------
         xAvgCharWidth:          0
         usWeightClass:          400     'Normal'
         usWidthClass:           5       'Medium'
         ySubscriptXSize:        700
         ySubscriptYSize:        650
         ySubscriptXOffset:      0
         ySubscriptYOffset:      143
         ySuperscriptXSize:      700
         ySuperscriptYSize:      650
         ySuperscriptXOffset:    0
         ySuperscriptYOffset:    453
         yStrikeoutSize:         50
         yStrikeoutPosition      259
         sTypoAscender:          780
         sTypoDescender:         220
         sTypoLineGap:           0
         usWinAscent:            780
         usWinDescent:           20

@PhilterPaper
Owner Author

PhilterPaper commented Jul 27, 2024

Interesting. So some fonts contain the desired super- and sub-script size and positioning (applicable to any OS and rendering engine). If they do, presumably they should fit within the ascender and descender limits. If they don't, or you wish to override them for some reason, use best practices (staying within the limits).

For odd situations such as superscripts on superscripts, something would have to be done to either fit within the limits, or move the baseline down. Of course, if you're doing such serious mathematical typesetting, you'll want to be using MathJax (creating SVG images) anyway (support expected in release 3.028).
