This repository has been archived by the owner on Nov 28, 2019. It is now read-only.

Some UTF-8 characters are not accepted by XML #59

Open
dnaber-de opened this issue Sep 21, 2016 · 0 comments

dnaber-de commented Sep 21, 2016

I discovered validation errors in exported XML files that contain Cyrillic characters, for example:

FATAL(63) at line 51888:435: CData section not finished
Белоснежные пляжи, очарова
FATAL(9) at line 51888:435: PCDATA invalid Char value 31
FATAL(26) at line 51896:7: Entity 'nbsp' not defined
FATAL(26) at line 51910:7: Entity 'nbsp' not defined
FATAL(26) at line 51916:7: Entity 'nbsp' not defined
FATAL(26) at line 51922:7: Entity 'nbsp' not defined
FATAL(26) at line 51928:7: Entity 'nbsp' not defined
FATAL(26) at line 51934:7: Entity 'nbsp' not defined
FATAL(26) at line 51940:7: Entity 'nbsp' not defined
FATAL(62) at line 51942:57: Sequence ']]>' not allowed in content
FATAL(1) at line 51942:57: internal error: detected an error in element content

Some valid UTF-8 characters seem not to be accepted by XML, for example the control character U+001F, reported above as "Char value 31": http://stackoverflow.com/a/12265956/2169046
These should be stripped.
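
A minimal sketch of such a stripping step, written in Python purely for illustration (the exporter's own language and code are not shown in this issue). The function name strip_invalid_xml_chars is hypothetical. The character ranges are those of the "Char" production in the XML 1.0 specification; anything outside them is removed.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated character class below matches everything outside these ranges.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with all characters removed that XML 1.0 does not allow.

    Hypothetical helper for illustration, not part of the exporter.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # U+001F (decimal 31) is the character reported in the log above.
    dirty = "Белоснежные пляжи, очарова\x1fтельные"
    print(strip_invalid_xml_chars(dirty))
```

Silently dropping characters changes the exported content, so logging each removal may be preferable in practice.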

However, the exporter should be refactored to use XMLWriter instead of concatenating strings. Maybe the writer is aware of these invalid characters and either handles them properly on its own or returns an error on export.
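
To illustrate the general idea of building the document through an XML API rather than by string concatenation, here is a rough sketch in Python using the standard xml.etree.ElementTree module. It is not XMLWriter and makes no claim about how XMLWriter treats invalid characters. The function name export_item and the element names item, title, and content are hypothetical. ElementTree escapes markup such as "]]>" on its own but does not reject control characters when serializing, so the sketch re-parses the result in order to return an error on export.

```python
import xml.etree.ElementTree as ET


def export_item(title: str, content: str) -> bytes:
    """Serialize one item to XML and fail loudly if the result is not well-formed.

    Hypothetical example; element names are assumptions, not the exporter's schema.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "content").text = content

    # The library escapes <, > and & in text nodes during serialization.
    data = ET.tostring(item, encoding="utf-8")

    # Re-parse the output: the parser rejects characters that XML 1.0
    # forbids, such as U+001F, which serialization lets through.
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        raise ValueError(f"export is not well-formed XML: {exc}") from exc
    return data


if __name__ == "__main__":
    print(export_item("Пляжи", "Белоснежные\u00a0пляжи ]]> text").decode("utf-8"))
    try:
        export_item("Пляжи", "очарова\x1fтельные")
    except ValueError as err:
        print("rejected:", err)
```

Whether to reject the export or strip the offending characters first is a design choice; the check above only detects the problem.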

dnaber-de added the bug label Sep 21, 2016
dnaber-de changed the title from "Some UTF-8 characters corrupt the XML" to "Some UTF-8 characters are not accepted by XML" Sep 21, 2016