PDF Lang flag raises error #1502
Comments
Isn't the NUL after the DE more likely to be the culprit?
You're right. Here's the same file with only the NUL removed, which VeraPDF recognizes as valid.
As an aside, in both p0001-fixed.pdf and p0001-no-null.pdf there is actually still no space between /Lang and the string that follows it. Thus, you effectively only tested removing the NUL ;). Furthermore, you have changed the encoding of that string to UTF-16, whereas in the original file it was encoded in PDFDocEncoding. But that likely was not a relevant change.
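For anyone who wants to check what a file actually contains, here is a minimal Python sketch that pulls /Lang literal strings out of raw PDF bytes and undoes the common backslash escapes. The names `find_lang_values`, `_LANG_LITERAL` and `_ESCAPE` are my own and not part of veraPDF. It assumes the catalog is not inside a compressed object stream, that the string has no nested parentheses, and that it is not a hex string, so treat it as a quick diagnostic rather than a parser. Note that `(` is a delimiter in PDF syntax, so the pattern accepts /Lang with or without whitespace before the string.

```python
import re

# /Lang, optional PDF whitespace, then a simple literal string "( ... )".
# Assumption: no nested parentheses; hex strings (<...>) are ignored.
_LANG_LITERAL = re.compile(
    rb"/Lang[\x00\t\n\x0c\r ]*\(((?:[^()\\]|\\.)*)\)", re.DOTALL
)

# A backslash followed by 1-3 octal digits, or by any single byte.
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)

_NAMED = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape(match):
    body = match.group(1)
    if body[:1] in b"01234567":
        # Octal escape such as \000; high-order overflow is dropped.
        return bytes([int(body.decode("ascii"), 8) & 0xFF])
    # Named escape, or an escaped "(", ")" or "\" that stands for itself.
    # Simplification: backslash-newline line continuation is not handled.
    return _NAMED.get(body, body)


def find_lang_values(pdf_bytes):
    """Return the decoded bytes of every simple /Lang literal string found."""
    return [
        _ESCAPE.sub(_unescape, m.group(1))
        for m in _LANG_LITERAL.finditer(pdf_bytes)
    ]


if __name__ == "__main__":
    sample = b"<< /Type /Catalog /Lang(DE\\000) >>"
    for value in find_lang_values(sample):
        print(repr(value), "trailing NUL" if value.endswith(b"\x00") else "ok")
```

Running this over the bytes of a file such as p0001.pdf should show whether the string after /Lang carries a trailing NUL, whether that NUL is a raw byte or written as `\000`.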
Interestingly, PDF and PDF/A are both vague about exactly how the Lang value must be formed. RFC 5646 states, "Whitespace is not permitted in a language tag" in section 2.1 (Syntax), second-to-last paragraph (assuming NUL counts as whitespace). This is very buried, so maybe it needs to be noted somewhere for PDF/A (and PDF/UA) devs? Leaving for @bdoubrov to decide in case he knows of some wording somewhere I have missed, or of past advice/discussions...
"shall be" to me sounds like equality, i.e. no whitespace allowed. "shall contain" could have been argued to allow for whitespace. My 2c ;)
This is more explicit in the RFC 4647, which is general syntax for Language Tags:
So I think not permitting a trailing NUL is very logical. In the end, this is most likely an unintentional implementation issue.
Where can we note this for posterity?
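To make the rule concrete, here is a rough Python sketch of such a check. `lang_problems` is a hypothetical helper, not veraPDF's actual rule, and the regex only approximates the shape of a language tag (hyphen-separated ASCII subtags of at most 8 characters) rather than implementing the full RFC 5646 grammar or consulting the IANA registry. An empty string is accepted here on the assumption that it denotes an unknown language.

```python
import re

# Approximate tag shape: an alphabetic primary subtag, then optional
# alphanumeric subtags, each 1-8 characters, joined by hyphens.
_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def lang_problems(value):
    """Return a list of reasons why `value` does not look like a language tag."""
    problems = []
    if "\x00" in value:
        problems.append("contains NUL")
    if any(ch.isspace() for ch in value):
        problems.append("contains whitespace")
    # The empty string is allowed (assumed to mean "language unknown").
    if value and not _TAG_SHAPE.fullmatch(value):
        problems.append("not shaped like a language tag")
    return problems


if __name__ == "__main__":
    for candidate in ["DE", "de-DE", "DE\x00", "de DE"]:
        print(repr(candidate), lang_problems(candidate))
```

With this check, "DE" passes while "DE" followed by a NUL is rejected, which matches the behaviour reported in this thread.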
When validating the PDF/A files in our archive with VeraPDF, we have been seeing a lot of error messages for files that other validators deem valid.
The error is raised for the /Lang entry when there is no space between /Lang and the language tag.
![Image](https://private-user-images.githubusercontent.com/89898442/409165312-2f1b95f7-d4ac-4c91-80c0-e92764797b0c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTA3NzcsIm5iZiI6MTczOTAxMDQ3NywicGF0aCI6Ii84OTg5ODQ0Mi80MDkxNjUzMTItMmYxYjk1ZjctZDRhYy00YzkxLTgwYzAtZTkyNzY0Nzk3YjBjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDEwMjc1N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRkMDgyMTdiMjZiNDg0ZjNiMWFjMmJkMDhhMDU1YTBkOWZiYjA0OWQ1NWNkOTcyMDYyOWM1NGU4ZDk3YjdmMDgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.f0-kugK7fp7uzA7wsWXgx3swsaVXb1tnYpCC7UVGQqo)
I was able to get the file to validate correctly by adding a space:
![Image](https://private-user-images.githubusercontent.com/89898442/409165390-02543f7b-7987-48eb-ab7f-1c0547cd68af.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMTA3NzcsIm5iZiI6MTczOTAxMDQ3NywicGF0aCI6Ii84OTg5ODQ0Mi80MDkxNjUzOTAtMDI1NDNmN2ItNzk4Ny00OGViLWFiN2YtMWMwNTQ3Y2Q2OGFmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDEwMjc1N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE4ZGRhMTcyZTRjMTljOWZiNmFmOTBlMWRhM2ZjNGFlZWFjNDNkNjEzYzg4ZGExMjk2MTgxZGE1YTM4M2VlYzYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.hpXzI3hgxGYF0lHlsrR1Y7OpM_keIXKWgdrylGrAgvE)
The PDF/A was created by Acrobat Distiller 9.3.0 (Windows) and you can find the original and the fixed documents attached.
Is this a bug which can get fixed in VeraPDF or do we have to fix the PDFs themselves?
p0001.pdf
p0001-fixed.pdf