Font issues, particularly Windows "core" extensions #213
i know something for windows:
for linux, directories scanned are:
if you generalize from Linux to Unix you might be on the safe side. for this i have seen http://xahlee.info/linux/linux_fonts.html, which is mostly applicable to multi-language debian/ubuntu installs and, basically, also Linux Mint. the most correct answer for linux is "it depends": it simply depends on the preferences of the particular user. i would classify them into the following:
for mac you might take a look here: https://en.wikipedia.org/wiki/List_of_typefaces_included_with_macOS
Thanks for the information, Alfred. It sounds like, unfortunately, for non-Windows systems there are no close equivalents to the Windows "core" extensions that I can count on being there. I was hoping to find "use X instead of Verdana, use Y instead of Georgia," etc. I can think about doing an extensive search for the fonts you listed, but I'm not sure it's worth the effort. And if nothing shows up, I still have to substitute something.
Maybe I would be better off just substituting (on non-Windows systems) Helvetica for Verdana and Trebuchet, Times for Georgia, and who knows what (Zapf-Dingbats?) for Webdings and Wingdings. At least something reasonable would be produced. Any thoughts on that approach? Users would still have the freedom to add whatever fonts they want to use; I'm just looking at the "core" fonts. And producing documents using Windows core extension fonts could produce interesting results on non-Windows systems -- assuming that Readers under Windows are actually using those as locally-installed (by default) fonts and not some substitute.
Regarding a default script font, maybe I'll just use Times-Italic on non-Windows systems. A user can always find and install a real script font if they want to run the examples/FontManager, but it still uses a bunch of other fonts (T1, bitmap, CJK, etc.) that the typical user won't have installed. So long as the example degrades gracefully for non-installed fonts, at least it will be runnable.
Let's say you're a Linux user with a bog standard set of installed fonts -- what would you expect as reasonable results from examples/FontManager, which uses all the "core" fonts (including the Windows extensions) as well as a script and a bunch of odd other fonts? Would you expect some "safe" substitutions, and if so, what?
Could you do me a favor and on a non-Windows machine or two, download and open https://www.catskilltech.com/Examples/PDF/Builder/FontManager.pdf and see what it looks like?
The "Georgia" expected at the end of the second paragraph may or may not show up, and much of the third paragraph may ??? There's a bunch of fonts in different formats that a user would need to install, and I'm not sure how it looks when they're not there. At the very least, the example should not blow up (which I would need to fix). Thanks!
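For illustration only, the substitution idea floated above could be sketched as a small lookup table. This is Python rather than Perl, and the names `NON_WINDOWS_FALLBACKS` and `resolve_face` are hypothetical; nothing here is PDF::Builder's actual API. The table entries are just the pairings suggested in this comment, with "Times" assumed to mean the core font Times-Roman and "Script" standing in for the generic script face.

```python
import sys

# Hypothetical fallback table built from the pairings suggested in this thread.
# "Times" is assumed to mean the core font Times-Roman.
NON_WINDOWS_FALLBACKS = {
    "Verdana": "Helvetica",
    "Trebuchet": "Helvetica",
    "Georgia": "Times-Roman",
    "Webdings": "ZapfDingbats",
    "Wingdings": "ZapfDingbats",
    "Script": "Times-Italic",  # generic script face, no script font assumed installed
}


def resolve_face(face, platform=sys.platform):
    """Return the face to request: unchanged on Windows, a core-font fallback elsewhere."""
    if platform.startswith("win"):
        return face
    # Faces without an entry (e.g. the 14 core fonts) pass through untouched.
    return NON_WINDOWS_FALLBACKS.get(face, face)
```

A user-defined entry would simply take precedence over this table, so the defaults only matter for users who have not configured anything.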
If you're talking about \Windows\Fonts, my experience has been that often a font file "dropped" into there gets moved to something under my |
Is there any reason why this document would not look precisely as intended? That's the whole idea behind the P in PDF... In this case, Georgia, which is not one of the 14 corefonts, is not embedded so it completely depends on the viewer what will be shown. According to the screendumps they seem to do a good job. Attached screendumps of Debian Bookworm, Fedora 39, Linux Mint, Fedora Rawhide, MacOS Catalina and Windows 10. Note that Fedora39 is my personal workstation which has lots of non-standard things installed. That is probably the reason the greek characters (except for π) do not show.
did you intend that some fonts were embedded?
i would assume that Linux users know how to install the TexGyre and OpenSymbol fonts, since support for the "free" URW fonts is slowly fading. |
btw, the symbols from Wingdings and Dingbats match; Webdings is a different can of worms. that said, OpenSymbol has support for Symbol, Dingbats and some of Webdings
I'm surprised to see Verdana as a replacement for Avant Garde. It has a totally different appearance. |
i looked in my fonts.conf substitution file, but that is subject to (overridden by) the availability of the fonts, as i have the msttcore fonts installed.
Linux Mint screenshots with Acroread9, xpdf, mupdf, okular, evince |
While several (many? most?) viewers do, as stated earlier, PDF viewers are only guaranteed to know the 14 core fonts. |
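As a rough sketch of that rule of thumb (Python, with the hypothetical helper name `needs_embedding`), a producer could flag every face that is not one of the 14 standard PDF fonts as "embed it, or expect viewer-dependent substitution":

```python
# The 14 standard ("core") PDF fonts, by their PostScript base font names.
STANDARD_14 = frozenset({
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})


def needs_embedding(base_font):
    """True if a viewer is not guaranteed to know this font by name alone."""
    return base_font not in STANDARD_14
```

Under this check Georgia, Verdana and the other Windows "core" extensions all come out as needing embedding, which matches the differing Georgia renderings in the screenshots.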
This clearly shows how the readers try to cope with a missing font. Acroread9 and mupdf do a better job on replacing Georgia than the others. |
i put them into PDF::API2 because they were also used in the PDF1.3/1.4 samples, and i was still primarily a windows user at that time like most other people viewing the generated pdfs. today i would decide differently, as i have with my java library, which actually ships the texgyre and opensymbol fonts and has aliases that fall back if known core fonts are not installed.
btw, i have also noticed that Adobe Reader nowadays ships with Minion and Myriad, which might be currently unmatched in the linux world, unless someone owns an adobe font dvd. (Calibri, Cambria on MS?) |
IIRC these are multi-master fonts intended to provide drop-in replacements for all serif and sans corefonts.
so if i use modern Adobe Reader with a pdf that has un-embedded Times or Georgia, i "might" get to see Minion instead – wow thats progress. and i thought only tuxers had such problems. |
@PhilterPaper while i understand your point, that you want to let the user get away with non-embedding for size and/or license reasons, i would not recommend it. i know from personal experience that documents produced this way do not age well, don't conform at all to the PDF/A and PDF/X archival standards, and can become totally unreadable with the next technology change. if it is not for size but license, there is still the way of rendering the font into Type3 glyph xobjects.
fonts.adobe.com reads: Licensing Information I don't know if there are many proprietary licensed fonts anymore. |
I would expect that all the explicitly-given TTF or OTF fonts would end up being embedded, though I'm a bit surprised to see the CJK and T1 fonts apparently embedded, too. My experience with CJK (e.g., AdobeGothicStdL) has been that sometimes they'll embed, and other times not. I can see that Georgia is not exactly the same from sample to sample (some of the metrics seem a little "off"). Apparently different Readers are substituting different things. IIRC, the bitmapped font is stored as Type 3 (graphics draws per character), so it is "embedded" (more or less). Thank you much for the efforts the two of you have put in to gather this information. I appreciate that only the 14 "true" core fonts are guaranteed to be there, and anything else that's not embedded (not TTF/OTF) is a roll of the dice as to what (if anything) is substituted. Where I'm going with all this is two-fold:
At this point, the only embedding I'm familiar with is TTF/OTF (as well as the Type 3 bitmapped font), although it appears that some CJK and possibly T1 may embed. Regarding embedding in general, yes, I appreciate that it's considered good form to embed fonts into a PDF (or else use the 14 core fonts) for portability purposes. I really should address embeddability of all fonts at some point (#80, #81), but not right now. |
but only if you have a valid subscription, you cannot simply download and redistribute adobe fonts at will. |
Type3 is not limited to bitmapped. this is an example extracting the glyph curves from the fonts and making T3 analogs. |
Interesting. The Fedora viewers Evince and Atril show empty pages (except for the heading). MuPDF shows the glyphs ok. |
iirc they incorrectly don't bleed the graphics context through when rendering the xobject via the t3 font, which has been the default behavior of adobe. otherwise you wouldn't be able to either style or color the text.
Hmm. When you say, "empty page" or "no text", is the entire page literally empty? Even the text from core fonts? That sounds like something is going belly up right at the beginning of the PDF, rather than something unwanted happening with a core extension or the Type 3 bitmapped font. |
Oh, so that is apparently an issue with your PDF Reader, rather than FontManager.pdf (a general lack of Type 3 support)? Just out of curiosity, does the bitmapped (Type 3) text in FontManager.pdf fail to show up in those readers? |
Just thinking out loud...
Interestingly i encountered some "historic" test files for interoperability with API2 produced by Acrobat Printer. It looks like "historic versions" of Acrobat not only refused to embed TrueType fonts with the "fsType" flag set but also well-known fonts from the Postscript Level 3 Set, unless they were either managed by Adobe Type Manager or available in native Type1 (pfb) format. There also seems to be some "feature-mismatch" across Windows and Macintosh (pre-OSX) versions of the software. I will do some further research, because i was asked to help fix such old PDFs for conversion to PDF/A-ish long-term archival.
I'd appreciate hearing about anything interesting with regards to old Acrobat Readers and "feature mismatch"es. Of course, if really old Adobe products are messed up, but reasonably recent ones (say, 10 years old or newer) are OK, we probably shouldn't spend time worrying about it. Certainly, free products such as Acrobat Reader should be kept up to date, but even paid products can't be expected to run forever, and need to be updated. I have several users interested in PDF/A support (see #52), so I need to pay more attention to that area. Among other things, this would help differentiate PDF::Builder from PDF::API2. |
Unfortunately it seems that many people still have old software out there just because it works on their system and they don't need anything more for their established workflows. Especially "Adobe Creative Suite 2" (from 2005), which was the last version working on 32-bit PowerMacs, is still somewhat alive and kicking. Also "Adobe Creative Suite 6" (EOLed 2013) has a large following, as it is the last non-Cloud version.
I require a Perl from within the last 6 years or so (currently 5.26) so I'm not trying to fix glitches which are the fault of old Perls. Yet, I try not to use really "new" features of Perl, so that users don't have to constantly update their systems. Perl is free, so it's not a great burden to keep reasonably updated (just time to do the update, check that it's OK, and in some cases, have it validated for security lapses and the Legal Department wants to check the license!). At least this isn't PHP, where existing features are constantly being withdrawn, resulting in old code that no longer works!
PDF::Builder supports most of PDF 1.4, with a few features from 1.5. I can't see ever supporting anything beyond 1.7, as that has been around for so long that any reasonable tool should support it. Nevertheless, I would expect very few features beyond 1.4 to ever make it in. One near-term possibility is Object Streams, as they seem to be showing up in "wild" PDFs. Of course, users do try to read in PDFs which use post-1.4 features (produced by other tools), so thought needs to be given on how to at least tolerate such features, even if new PDF output (from PDF::Builder) doesn't make use of them.
This leaves tools such as PDF Readers. Like any other software, they can have bugs in them, and should be kept somewhat up to date. I can sympathize with users who have paid out good money for a suite of tools, and don't want to have to keep upgrading them to work with PDFs from PDF::Builder. Simple PDF Readers are free, so there's no real excuse to fall far behind on one (see Perl update issues above), but tool suites which do editing and such could be a real problem if they are backlevel and buggy. I'm just not sure what to do about that. Insisting on using 10 or 20 year old software is your business, but if it breaks, don't expect me to "fix" PDF::Builder to work around it (assuming the PDF output is actually legit). Thoughts, suggestions, recommendations?
PDF versions 1.5 thru 1.7 are somewhat the norm, but the new Adobe Suite seems to create 2.0 (ISO Standard) by default. If we apply the standard engineering 7 year obsolescence rule, you might be good. If you want to support updating PDF files, having Object Streams support is a must ... even though i would be against creating new PDFs with it by default for compatibility reasons ... just give the programmer an option to change that behavior from the default. Hmmm ... for interoperability my go-to is always xpdf; if it works there it should be expected to work anywhere else.
Another question is how to deal with vertical font metrics. There are (at least?) three different sets of vertical metrics: hhea, win and typo. For example, PDF::Builder |
at the time i implemented this i encountered a lot of broken fonts so the hhea table was the best bet. if you want to cope with all cases you need a sensible fallback mechanism like:
So a given font could use one of (?) hhea, Win, or Typo style ascenders and descenders; and try for first defined/nonzero value? Are they the same value (after processing with |
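The "first defined/nonzero value" reading of that fallback could look roughly like the sketch below. It is Python, not PDF::Builder code, and the function name, the input layout (a dict of table name to an ascender/descender pair in font units) and the preference order are all assumptions for illustration. One detail that is not an assumption: the OS/2 `usWinDescent` field is stored as a positive magnitude, while the hhea and typo descenders are negative, so the sketch normalizes the sign.

```python
def pick_vertical_metrics(tables, order=("hhea", "typo", "win")):
    """Return (source, ascender, descender) from the first usable table.

    `tables` maps a table name to an (ascender, descender) pair in font units.
    A pair is considered usable only if both values are defined and nonzero.
    The descender is always returned as a negative number.
    """
    for name in order:
        ascender, descender = tables.get(name) or (0, 0)
        if ascender and descender:
            return name, ascender, -abs(descender)
    # Nothing usable: the caller has to apply its own heuristic or default.
    return None, 0, 0
```

Whether the three sets agree after scaling to 1/1000 em is a property of the individual font, so a caller that cares could run this with different `order` values and compare.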
For typesetting, I need to know where to start (i.e. the distance of the baseline from the top), and how much to advance vertically to the next line. The distance of the baseline from the top is the ascender. The bare baseline-to-baseline distance is, by definition, the font size. Piece of cake.
Well, take a look at the attached program and its results. First thing to remark is that font properties
We can see that all TrueType fonts have a decent ascender, descender and capheight. The underline position and thickness seem reasonable too. The capheight and descender of FreeSerif are a bit tight and may result in clipping. The red line indicates the font size offset to the descender. If this is used to advance, font glyphs will overlap.
When considering using the font bounding box for advancing, you'll quickly notice that this would be fine for ITCGaramond and Times-Roman only. FreeSerif has a reasonable top value, but the bottom value seems to include aesthetic space. IBMPlexMonoRegular also includes aesthetic space, but evenly divides it over the top and bottom.
So the initial questions, where to start and how much to advance, cannot be answered using the font properties currently provided by PDF::Builder and PDF::API2.
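For reference, the "piece of cake" model stated at the start of this comment (first baseline one ascender below the top, then advance by the font size) can be written down as a short Python sketch. The function name and parameters are hypothetical; the ascender is assumed to be in 1/1000 em and the y axis grows upward, as in PDF user space. The comment's point is precisely that real fonts do not reliably supply an ascender that makes this model work.

```python
def naive_baselines(top_y, font_size, ascender, n_lines, leading=1.0):
    """Baseline y positions under the naive model.

    top_y     -- y coordinate of the top of the text block (PDF y grows upward)
    font_size -- font size in points
    ascender  -- font ascender in 1/1000 em (assumed trustworthy, which it often is not)
    leading   -- multiplier on font_size for the baseline-to-baseline advance
    """
    first = top_y - ascender * font_size / 1000.0
    step = font_size * leading
    return [first - i * step for i in range(n_lines)]
```

With a 12 pt font whose ascender is 750, the first baseline sits 9 pt below the top and each following baseline 12 pt further down.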
Well, it sounds like font packagers are not using consistent definitions of ascender, descender, capheight, fontsize, etc. As long as they are internally consistent within a font, I don't think we can do much programmatically in a user such as PDF::Builder. All we can do is, by trial-and-error, adjust our use of fontsize and leading to give aesthetically pleasing results for a given font file. Mixing different fonts on one line could be something of a problem. Perhaps we should think about additional parameters when opening a font file, to adjust or override built-in font definitions? For example, "subtract 150 units to remove built-in leading", or "add 100 units to the ascender height to leave space for accents"? |
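One possible shape for such "adjust when opening the font" parameters is a set of signed deltas applied to a copy of the font's own metrics. The sketch below is Python and purely illustrative; `adjust_metrics` and the metric key names are hypothetical, not existing PDF::Builder options.

```python
def adjust_metrics(metrics, deltas):
    """Return a copy of `metrics` with each delta (in font units) added in.

    Unknown keys raise KeyError so a typo in an option name is not silently ignored.
    The original dict is left untouched.
    """
    adjusted = dict(metrics)
    for key, delta in deltas.items():
        if key not in adjusted:
            raise KeyError(f"unknown font metric: {key}")
        adjusted[key] += delta
    return adjusted
```

The two examples in the comment would then read `adjust_metrics(m, {"linegap": -150})` to remove built-in leading and `adjust_metrics(m, {"ascender": 100})` to leave room for accents.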
In our applications we do not work with 'bare' fonts, but via an intermediate layer (fontconfig, fontmanager, ...). That gives the option to overrule the properties on a per-font basis. Properties we would need:
this is my go-to picture: Most personal fonts use auto-calculated values for these, while most commercial fonts have sensible values set by the type designers -- so the answer is "it depends". Modern Word Processors(WP) and Layout Applications(LA) have the same issue here. While most WPs use default values and mostly don't care (blaming bad font/designer), some like LibreOffice let you adjust/scale them. Most LAs use default values, but offer the option to set those values explicitly for each font by the designer/editor. While there are ways to extract info from the fontfile and auto-calculate others; i would suggest that you give your users the option to set/override the values themselves during font loading, if they don't like them.
That's what I was thinking along the lines of. They would be new options for
I think it only uses a few of these properties, but it wouldn't hurt to be able to define the whole set. Maybe you're more familiar with the innards of font handling than I am. One question would be whether the overrides overwrite the actual font global data structure at font reading, or do we carry these overrides separately, and apply them whenever we work with a font? The latter might be needed if we need to go back to the defaults for some reason, or otherwise change the overrides. Just out of curiosity, do non-TTF/OTF fonts (core, T1, etc.) define these too, and in the same manner?
Do all of these show up in TTF fonts, or does it vary? It looks like top and bottom "shoulders" might interact with "leading". Speaking of leading, this one appears to use the newer definition (baseline to baseline spacing), rather than the traditional "extra" amount to be added between lines.
Any stack of accents/diacritics above a capital letter should fall within the ascender? From what Johan was describing earlier, it sounds like accents often are above the cap height and even above the ascender! I wonder if Vietnamese, which uses the Latin alphabet with lots of accents, has its own fonts which allow for all the possible accent combinations? |
for Core/Type1 the source is either the AFM or PFM file which commonly has:
there are some heuristics for missing values: CapHeight
XHeight
Descender
Ascender
UnderlinePosition
UnderlineThickness
strikethroughposition
quote from https://stackoverflow.com/questions/8215754/css-strike-through-not-centered-through-the-text
a quick and dirty default is
strikethroughthickness
(defaults to underlinethickness)
leading
Generations have discussed this, but good defaults seem to be among:
baselinestretch
whatever the current aesthetic is.
also consider https://en.wikipedia.org/wiki/Subscript_and_superscript |
Good point. For the practical purposes of |
looking at the picture, i would suggest:
i have found here https://www.openoffice.org/api/docs/common/ref/com/sun/star/report/XReportControlFormat.html#CharEscapement
Wow, that's a lot of stuff to digest! Anyway, it's something to consider to fine-tune various sizes and positions, especially for super- and sub-scripts (not to forget, superscripts on superscripts, subscripts on subscripts, etc.). I have to say that I'm a bit concerned about superscripts clashing with the line above (unless this baseline is temporarily lowered) or subscripts clashing with the line below (lower the next line's baseline). We might want to think about changing the ascender or descender value on lines with super- or sub-scripts. Any further thoughts on whether it would be better to change the font table values (when requested, to fine-tune settings) as soon as the font is loaded (i.e., "permanently"), or keep overrides to the side to apply as-needed, allowing a return to original default values? The code is a bit more complicated in the latter case, but it does permit an easy change in the overrides at any time. |
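To make the second option concrete (overrides kept to the side, defaults recoverable), here is a minimal Python sketch using a layered lookup. The class and method names are hypothetical and this is not how PDF::Builder stores font data; it only illustrates the trade-off being asked about.

```python
from collections import ChainMap


class FontMetrics:
    """Font metrics with a separate override layer on top of the font's own values."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)   # values read from the font file, never modified
        self._overrides = {}              # user-supplied fine-tuning
        # Lookups consult the overrides first, then fall through to the defaults.
        self._view = ChainMap(self._overrides, self._defaults)

    def get(self, name):
        return self._view[name]

    def override(self, **values):
        self._overrides.update(values)

    def reset(self, *names):
        """Drop the named overrides, or all of them if no names are given."""
        if not names:
            self._overrides.clear()
        for name in names:
            self._overrides.pop(name, None)
```

The extra cost over overwriting the font table at load time is one level of indirection on every lookup; in exchange, `reset()` restores the font's built-in values at any time.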
I think the example in #213 (comment) is slightly misleading. In the superscript case, the superscript exceeds the ascender, but so would accented capitals like |
Would a font's built-in metrics cover super- and sub-scripts? Wouldn't the above case be a violation by whatever application is deciding the size and placement of *scripts, placing them above the ascender or below the descender? Then you still have the issue of superscripts-on-superscripts. On the other hand, diacritics (accents), if built into a font (canned), you would think would obey ascender and descender limits, while composite accented letters (non-spacing diacritic(s) + a letter) probably could not obey the limits. Here you would need to lower baseline(s) to get everything to fit. Also, figuring out where to place a given accent mark (if stacking multiple diacritics) would be an interesting problem to solve. I wonder if regular non-spacing diacritics know enough to stack and not collide with each other? Presumably someone (HarfBuzz?) has done it for languages such as Vietnamese.
here the values from a Font designed by a named designer:
Note: the |
and here for comparison a common open substitute (TeXGyreHeros)
now the same font as the first but from 1991:
Interesting. So some fonts contain the desired super- and sub-script size and positioning (applicable to any OS and rendering engine). If they do, presumably they should fit within the ascender and descender limits. If they don't, or you wish to override them for some reason, use best practices (staying within the limits). For odd situations such as superscripts on superscripts, something would have to be done to either fit within the limits, or move the baseline down. Of course, if you're doing such serious mathematical typesetting, you'll want to be using MathJax (creating SVG images) anyway (support expected in release 3.028). |
Hi folks,
I was looking through some notes, sorting out upcoming work on PDF::Builder, and found that I need to gather some information about what's available on non-Windows systems. I'm hoping that @sciurius and @terefang in particular can help out here. Feel free to tag others who might be able to contribute.
My Font Manager subsystem (part of Builder, not Windows) includes an entry for someone to define their generic "script" font face (cursive, handwritten look). I do not default any font, as I don't know if Linux and Mac systems can be counted on to have any script faces pre-installed. Windows, on the other hand, appears to ship with FrenchScriptMT (FRSCRIPT.TTF) and ScriptMTBold (SCRIPTBL.TTF), and either would make a nice "as shipped" default. Can anyone tell me what the story is on non-Windows systems? I suppose that I could default them to either Times-Italic or perhaps Helvetica-Oblique. I have an example for the Font Manager that shows script, using FrenchScriptMT, but I suspect it will fail to install that font on non-Windows systems, and not display that example correctly.
PDF::Builder (and its predecessor, PDF::API2) defined 15 or so extensions to "core" fonts, for Windows. These are Verdana (a sans-serif face), Trebuchet (a very similar sans-serif face), and Georgia (a serif face), plus two Dingbats-like fonts (Wingdings and Webdings). A Roman sans-serif face, Bank Gothic, has metrics provided, but I don't pre-load it into the Font Manager as I'm not sure if my Windows system is even finding it when a Reader requests it (I think it's substituting something else). Anyway, all these fonts are supposedly "core", so a Reader knows where to find them, but the metrics are provided (even though they are all TTF). Any suggestions on what I might replace them with on Linux and Mac systems? A user can always define their own entries, but I'd like to have some reasonable defaults for users of non-Windows systems.
I realize that "Linux" covers a multitude of systems, each likely to load their own beloved set of fonts into their own favorite places. Macs might be a little more consistent, but I don't know for sure. I'm just hoping to find some consistent, widely installed, set of substitutes for the Windows "core" extensions. Then there's the issue of producing PDFs on non-Windows systems that might in turn be displayed on a PDF Reader on a Windows system. Do Readers on Windows systems actually know where to go for these "core" fonts, or are they just substituting something reasonable behind the scenes? I figure that I should get this straightened out and document it well before too long.
Thanks much to anyone who can shed some light on these issues.