

@PJBrs
Contributor

@PJBrs PJBrs commented Nov 19, 2025

This PR tries to standardise on using the Adobe standard font information in CORE_FONT_METRICS wherever possible and applicable.

Currently, pypdf collects character widths in three different places: first, build_font_width_map() in _cmap.py; second, the __post_init__() method of the Font class in _text_extraction/_layout_mode/_font.py; and third, the widths of the 14 Adobe standard fonts in _font, in the FontDescriptor class.

This PR makes use of the data in CORE_FONT_METRICS in both build_font_width_map() and Font.__post_init__(), and updates the tests whose results change as a consequence. Until now, this information was unavailable in _cmap.py and only partially used (and incomplete) in Font.__post_init__(), so the PR improves the coverage of available font widths in both places. It also removes the incomplete width data that was used previously.

It is possibly a first step towards harmonising the ways in which we collect character widths, which still differ in important ways between _cmap.py and the Font class. Ultimately, I hope to arrive at one solution that is either part of the FontDescriptor dataclass or can be used there.
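As a rough illustration of what a single source of character widths could look like, here is a minimal sketch that assumes a metrics table keyed by font name and then by character. CORE_WIDTHS and char_widths_for are hypothetical names, not pypdf's API, and the numbers are a small excerpt of the Adobe AFM widths (in thousandths of the font size).

```python
from __future__ import annotations

# Hypothetical excerpt of per-font widths from the Adobe AFM files,
# in thousandths of the font size. Keyed by character for brevity;
# real AFM data is keyed by glyph name (e.g. "space", "A").
CORE_WIDTHS: dict[str, dict[str, int]] = {
    "Helvetica": {" ": 278, "A": 667, "a": 556},
    "Times-Roman": {" ": 250, "A": 722, "a": 444},
    "Courier": {" ": 600, "A": 600, "a": 600},
}


def char_widths_for(font_name: str, text: str, default: int = 0) -> dict[str, int]:
    """Map each distinct character in ``text`` to its width in ``font_name``.

    Characters (or fonts) missing from the table fall back to ``default``.
    """
    metrics = CORE_WIDTHS.get(font_name, {})
    return {ch: metrics.get(ch, default) for ch in set(text)}


print(char_widths_for("Times-Roman", "A"))  # {'A': 722}
```

If every caller went through one lookup like this, the cmap code and the layout-mode Font class could not silently diverge. A real implementation would also have to map character codes through the font's encoding to glyph names, which this sketch skips.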

This patch reads the character widths for the core 14
Type 1 PDF fonts from the _codecs/core_fontmetrics.py
file.
The _font_widths.py file contains a set of standard
widths for the core 14 Type 1 PDF fonts. However,
it is incomplete; for instance, it sets all Helvetica
variants to the same set of widths. Since we have now
imported FONT_METRICS from the original Adobe core
font AFM files, which are complete, port the Font
class over to use that data.
The _font_widths.py file contained STANDARD_WIDTHS, which
held font widths for the core 14 Type 1 PDF fonts. It is
no longer used since we imported FONT_METRICS from the AFM
files.
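Why one shared table per family is not enough can be shown with a small excerpt of the Adobe AFM data: "A" is 667 units wide in Helvetica but 722 in Helvetica-Bold, while the space is 278 in both. The sketch below is illustrative only; the table and function names are hypothetical and only mimic the kind of family-level lookup the old table implied.

```python
# Hypothetical excerpts of per-variant AFM widths (thousandths of the font size).
AFM_WIDTHS = {
    "Helvetica": {"A": 667, " ": 278},
    "Helvetica-Bold": {"A": 722, " ": 278},
}

# Old-style approach (illustrative): every Helvetica variant shares one table.
SHARED_FAMILY_WIDTHS = {"Helvetica": AFM_WIDTHS["Helvetica"]}


def family_width(font_name: str, ch: str) -> int:
    """Look up a width by family prefix only, ignoring the variant."""
    family = font_name.split("-")[0]
    return SHARED_FAMILY_WIDTHS[family][ch]


def variant_width(font_name: str, ch: str) -> int:
    """Look up a width in the table for the exact font variant."""
    return AFM_WIDTHS[font_name][ch]


for name in ("Helvetica", "Helvetica-Bold"):
    print(name, family_width(name, "A"), variant_width(name, "A"))
# Helvetica 667 667
# Helvetica-Bold 667 722
```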

Also, it appears that _font_widths.py did not
include the associated licensing information;
removing the file remedies this as well.
It appears that, with the new font metrics, some space widths
in two tests are recognised as slightly smaller than before.
Change the associated tests accordingly.
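To see how a width from the metrics table turns into a distance on the page, recall the PDF text-space displacement tx = (w0 * Tfs + Tc + Tw) * Th, where w0 is the glyph width in thousandths of a unit, Tfs the font size, Tc the character spacing, Tw the word spacing (applied only to the single-byte space, code 32) and Th the horizontal scaling. The sketch below implements just that formula; the function name is hypothetical and the TJ kerning adjustment is ignored.

```python
def glyph_advance(
    width: int,
    font_size: float,
    char_spacing: float = 0.0,
    word_spacing: float = 0.0,
    h_scale: float = 1.0,
    is_space: bool = False,
) -> float:
    """Horizontal displacement of one glyph in text space units.

    tx = (w0 * Tfs + Tc + Tw) * Th with w0 = width / 1000; Tw is only
    added for the single-byte space character (code 32).
    """
    tw = word_spacing if is_space else 0.0
    return (width * font_size / 1000 + char_spacing + tw) * h_scale


print(glyph_advance(278, 12, is_space=True))  # Helvetica space at 12 pt: 3.336
```

Because the advance scales linearly with w0, a slightly smaller space width in the metrics yields a proportionally smaller space advance at any font size.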
@PJBrs PJBrs marked this pull request as draft November 19, 2025 10:47
The default width, if not given or defined in a font resource, is
calculated either by averaging all character widths or, in the case
of unembedded fonts, as the width of two spaces.
Previously, the space widths were part of a separate table in
_cmap.py. This patch changes the associated logic and instead
reads the space width from CORE_FONT_METRICS if a default width
is not given or found elsewhere.
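The fallback order described above can be sketched as follows. This is a simplified illustration rather than pypdf's code: the function name and parameters are hypothetical, and the final 0.0 for a font with neither a width list nor core metrics is an assumption.

```python
from __future__ import annotations

from statistics import mean


def default_width(
    explicit_default: float | None,
    widths: list[float],
    embedded: bool,
    core_space_width: float | None,
) -> float:
    """Pick a default glyph width following the fallback order above."""
    # 1. A default width given in (or found via) the font resource wins.
    if explicit_default is not None:
        return explicit_default
    # 2. Unembedded standard fonts: two spaces, from the core font metrics.
    if not embedded and core_space_width is not None:
        return 2 * core_space_width
    # 3. Otherwise average the widths the font itself provides.
    if widths:
        return mean(widths)
    # 4. Hypothetical last resort when nothing else is known.
    return 0.0
```

For example, an unembedded Helvetica (space width 278) gets a default of 556.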

Note that the patch removes the space widths of four fonts from
pypdf, i.e., the four Helvetica-Narrow fonts. However, these
are not among the fourteen Adobe standard fonts, so they
should be embedded (and either define a default width or allow
one to be computed by averaging all character widths).
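Whether a font can rely on built-in metrics depends on whether its base font name is one of the 14 standard Type 1 fonts listed in the PDF specification. A minimal check could look like this, assuming the name may arrive with a leading slash, as PDF name objects read from a font dictionary often do; the set and function names are hypothetical.

```python
# The 14 standard Type 1 font names from the PDF specification.
STANDARD_14 = frozenset({
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})


def has_builtin_metrics(base_font: str) -> bool:
    """True if ``base_font`` (with or without a leading '/') is a standard font."""
    return base_font.lstrip("/") in STANDARD_14


print(has_builtin_metrics("/Helvetica"))         # True
print(has_builtin_metrics("/Helvetica-Narrow"))  # False
```

Helvetica-Narrow fails this check, so its widths have to come from the font itself.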

Note that there is now a slight inconsistency: CID fonts that
don't define a default width must get it either from the value
set by the user or, if they happen to share a name with one of
the fourteen Adobe standard fonts, from that font's metrics, whereas
other embedded fonts get a default width equal to the average
character width, and the fourteen standard fonts get a default
of two space widths. However, this inconsistency already existed
in _cmap.py before.
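For contrast with the averaging used for other embedded fonts, the CID font path described above might be sketched like this. The precedence between the font's own default width, an accidental name match with a standard font, and the user-supplied value is an assumption for illustration, as are the table and function names.

```python
from __future__ import annotations

# Hypothetical excerpt of space widths from the core font metrics.
CORE_SPACE_WIDTHS = {"Helvetica": 278, "Times-Roman": 250, "Courier": 600}


def cid_default_width(
    base_font: str,
    dw: float | None,
    user_default: float | None,
) -> float | None:
    """Resolve a CID font's default width (illustrative precedence only)."""
    if dw is not None:
        # The font defines its own default width.
        return dw
    space = CORE_SPACE_WIDTHS.get(base_font)
    if space is not None:
        # Name happens to match a standard font: use two space widths.
        return 2 * space
    # No averaging here, unlike other embedded fonts.
    return user_default
```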
@codecov

codecov bot commented Nov 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.16%. Comparing base (e9e3735) to head (662e977).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3526      +/-   ##
==========================================
- Coverage   97.16%   97.16%   -0.01%     
==========================================
  Files          57       56       -1     
  Lines        9807     9803       -4     
  Branches     1780     1782       +2     
==========================================
- Hits         9529     9525       -4     
  Misses        167      167              
  Partials      111      111              


@PJBrs PJBrs marked this pull request as ready for review November 19, 2025 12:06