fix to iconv() illegal character error (issue #549) (#580)
* fix to iconv() illegal character error (issue #549)

* display warnings in PHPUnit; added incomplete test to demonstrate fix

* revert --display-notices in Makefile

because it fails in versions < 10

* Fixed coding style problem

* FontTest.php: finalized test which triggers a notice when the fix is not used

---------

Co-authored-by: Konrad Abicht <[email protected]>
Stasky745 and k00ni authored Apr 24, 2023
1 parent ced49b8 commit 9094d77
Showing 2 changed files with 32 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/Smalot/PdfParser/Font.php
@@ -603,7 +603,7 @@ private function decodeContentByEncodingElement(string $text, Element $encoding)
// so we use iconv() here
$iconvEncodingName = $this->getIconvEncodingNameOrNullByPdfEncodingName($pdfEncodingName);

- return $iconvEncodingName ? iconv($iconvEncodingName, 'UTF-8', $text) : null;
+ return $iconvEncodingName ? iconv($iconvEncodingName, 'UTF-8//TRANSLIT//IGNORE', $text) : null;
}

/**
31 changes: 31 additions & 0 deletions tests/PHPUnit/Integration/FontTest.php
@@ -439,4 +439,35 @@ public function testCalculateTextWidth(): void
$this->assertEquals(2573, $width);
$this->assertEquals([], $missing);
}

/**
* Check behavior if iconv function gets input which contains illegal characters.
*
* In this test we create a CP1252-encoded string that contains the byte 0x8D, which is undefined in CP1252 and therefore cannot be converted to UTF-8.
* This way we check if the old code triggers the expected warning:
*
* iconv(): Detected an illegal character in input string
*
* Note: Don't use PHPUnit 10+, because it will hide the warning.
*
* A list of invalid characters can be found here:
* https://www.ibm.com/docs/en/rational-synergy/7.2.1?topic=uc-text-encoding-illegal-character-detection-tool
*
* @see https://github.com/smalot/pdfparser/pull/549
* @see https://github.com/smalot/pdfparser/pull/580
*/
public function testDecodeContentIssue549(): void
{
/*
* we do this to get into the branch with private method "decodeContentByEncodingElement" in Font.php
*/
$encoding = $this->createMock(Element::class);
$encoding->method('getContent')->willReturn('WinAnsiEncoding');
$header = new Header(['Encoding' => $encoding]);

$font = new Font($this->createMock(Document::class), $header);

// check result
$this->assertEquals('foobar-', $font->decodeContent("foobar-\x8D"));
}
}
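
For readers who want to try the behavior outside of PHPUnit, here is a minimal standalone sketch of the difference between the old and the new target encoding string. The helper `convertToUtf8()` and its `$lenient` flag are hypothetical and not part of pdfparser; only the two target strings (`'UTF-8'` and `'UTF-8//TRANSLIT//IGNORE'`) and the input `"foobar-\x8D"` are taken from this commit. How `//TRANSLIT` and `//IGNORE` are handled depends on the iconv implementation PHP was built against, so the sketch prints the results instead of assuming them.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of pdfparser): convert $text to UTF-8.
 *
 * With $lenient = false the plain 'UTF-8' target from the old code is used,
 * with $lenient = true the 'UTF-8//TRANSLIT//IGNORE' target from this commit.
 * Returns null instead of false if iconv() reports a failure.
 */
function convertToUtf8(string $text, string $fromEncoding, bool $lenient): ?string
{
    $target = $lenient ? 'UTF-8//TRANSLIT//IGNORE' : 'UTF-8';

    // Suppress the notice/warning here; the return value is checked instead.
    $result = @iconv($fromEncoding, $target, $text);

    return false === $result ? null : $result;
}

// Same input as in testDecodeContentIssue549: byte 0x8D is undefined in CP1252.
$input = "foobar-\x8D";

// Results are printed, not asserted, because they vary between iconv implementations.
var_dump(convertToUtf8($input, 'CP1252', false));
var_dump(convertToUtf8($input, 'CP1252', true));
```

Returning `null` on failure mirrors the nullable return already used in `decodeContentByEncodingElement`, which lets a caller tell "conversion failed" apart from an empty string.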
