MAINT: Use CORE_FONT_METRICS for widths where possible #3526
+37
−259
This PR tries to harmonise on the Adobe standard font information in CORE_FONT_METRICS, using it wherever it is available and applicable.
Currently, pypdf collects character widths in three different places. First, there is build_font_width_map.
pypdf/pypdf/_cmap.py, line 402 in e9e3735
pypdf/pypdf/_text_extraction/_layout_mode/_font.py, line 39 in e9e3735
This PR makes use of the data in CORE_FONT_METRICS in both build_font_width_map and the Font class __post_init__ method, and addresses the resulting changes in test results. Until now, this information was unavailable in cmap and only partially used, in incomplete form, in the Font class __post_init__. The PR therefore improves the coverage of available font widths in both places, and it removes the incomplete information where that was previously used.
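To illustrate the idea of falling back on standard font metrics, here is a minimal sketch. It does not reproduce pypdf's code: HYPOTHETICAL_CORE_METRICS, build_width_map and text_width are made-up names for illustration, the table holds only a few glyph widths (in 1/1000 text-space units, as in the Adobe AFM files), and the rule that explicitly declared widths override the standard metrics is an assumption of this sketch.

```python
# Hypothetical stand-in for a core font metrics table; NOT pypdf's data structure.
# Widths are in 1/1000 text-space units and cover only a few glyphs.
HYPOTHETICAL_CORE_METRICS = {
    "Helvetica": {" ": 278, "A": 667, "a": 556},
    "Courier": {" ": 600, "A": 600, "a": 600},
}


def build_width_map(base_font, explicit_widths):
    """Merge standard-font widths with widths declared in the PDF font dictionary.

    Assumption of this sketch: explicit widths take precedence over the table.
    """
    name = base_font.lstrip("/")  # PDF names are written with a leading slash
    widths = dict(HYPOTHETICAL_CORE_METRICS.get(name, {}))
    widths.update(explicit_widths)
    return widths


def text_width(text, widths, default=500):
    """Sum glyph widths, using a default for characters without an entry."""
    return sum(widths.get(ch, default) for ch in text)


if __name__ == "__main__":
    helvetica = build_width_map("/Helvetica", {})
    print(text_width("Aa ", helvetica))
```

With a single lookup table behind both callers, a font without its own width array still gets sensible widths, and a subset font that is not in the table simply keeps whatever widths it declares.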
This is possibly a first step towards harmonising the ways in which we collect character widths, which currently still differ in important ways between cmap and the Font class. Ultimately, I hope to arrive at one solution that either is part of the FontDescriptor dataclass or can be used there.