ROB: Prevent excessive layout mode text output from Type3 fonts #3082

Merged 5 commits on Jan 27, 2025


shartzog
Contributor

Partially addresses #3081 by checking for a '/ToUnicode' map in Type3 font dictionaries. If no such map is present, check whether the font uses standard Adobe glyph names. If it does not, mark the font as 'uninterpretable' and skip collecting text content from any text operations associated with that font.
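
The check described above can be sketched roughly as follows. This is a minimal illustration on plain dictionaries, not the code from this PR: the function name `is_interpretable`, the tiny `KNOWN_GLYPH_NAMES` set (a stand-in for the full Adobe Glyph List), and the choice to read glyph names from the keys of the font's `/CharProcs` entry are all assumptions made for the example.

```python
from typing import Any, Mapping

# Hypothetical stand-in for the Adobe Glyph List, which has thousands of
# entries. Only a handful are listed here to keep the sketch short.
KNOWN_GLYPH_NAMES = {"/space", "/A", "/B", "/a", "/b", "/one", "/two", "/period"}


def is_interpretable(font: Mapping[str, Any]) -> bool:
    """Return False for Type3 fonts whose glyphs cannot be mapped to text.

    `font` is assumed to be a dict-like PDF font dictionary.
    """
    # Only Type3 fonts are affected by this check.
    if font.get("/Subtype") != "/Type3":
        return True
    # An explicit /ToUnicode map tells us how to decode character codes.
    if "/ToUnicode" in font:
        return True
    # Assumption: glyph names are the keys of /CharProcs. Without a
    # /ToUnicode map, every name must be a standard glyph name.
    glyph_names = font.get("/CharProcs") or {}
    return all(name in KNOWN_GLYPH_NAMES for name in glyph_names)
```

A caller would then skip text operations for any font where this returns `False`, instead of emitting meaningless characters in layout mode.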


codecov bot commented Jan 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.50%. Comparing base (049f71e) to head (eb2f5a4).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3082      +/-   ##
==========================================
+ Coverage   96.48%   96.50%   +0.01%     
==========================================
  Files          52       52              
  Lines        8795     8807      +12     
  Branches     1608     1612       +4     
==========================================
+ Hits         8486     8499      +13     
+ Misses        184      183       -1     
  Partials      125      125              


@stefan6419846
Collaborator

Thanks for the report and looking into it.

I assume that you own the necessary rights for us to be able to distribute the test file as part of the source code?

@shartzog
Contributor Author

shartzog commented Jan 27, 2025

Good point. I'm not certain about the copyright details for the engine that created that document. The associated test currently accesses it via its link in #3081 anyway, so there's no real reason to include it in the resources folder. Is the link in the issue an acceptable long-term access option, or should I make other arrangements (e.g. in samples)?

@stefan6419846
Collaborator

Accessing using a link is perfectly fine for now. The alternative for specific files which fulfill CC-BY-SA-4.0 would be the https://github.com/py-pdf/sample-files repository, but we are currently not enforcing anything like this.

@stefan6419846 stefan6419846 merged commit 633d188 into py-pdf:main Jan 27, 2025
16 checks passed
stefan6419846 added a commit that referenced this pull request Feb 9, 2025
## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)

2 participants