Slightly Better Support for Escaped Characters in Xlsx Reader/Writer #4726
+271
−40
See Discussion 4724
PhpSpreadsheet converts all control characters (x00-x1f) in strings to and from a form which Excel recognizes (e.g. `x1c` becomes `_x001C_` when writing, and vice versa when reading). There have historically been 3 exceptions which go unconverted - tab (x09), line feed (new line) (x0a), and carriage return (x0d). PR #4536 removed those exceptions, but that caused some problems; these were fixed by PR #4619, but the exceptions were restored.

The referenced discussion deals with a spreadsheet with a cell containing `_x000D_`, carriage return. Although the writer no longer converts to that string on output, the reader should be able to handle it on input. In fact, the reader ought to handle any string of the form "underscore x 4-hex-digits underscore", whether or not it represents a control character.

And there's an interesting edge case. If a user enters into a cell the string `A_x0030_B`, it needs to be handled as-is. Excel handles this by writing it out as `A_x005F_x0030_B`, i.e. substituting `_x005F_` for the first underscore, so that the reader sees `_x005F_` (converting it to underscore) followed by `x0030_B` (no leading underscore, so no conversion). PhpSpreadsheet could probably handle this by converting all underscores on write, but I am trying to emulate Excel and do it only when needed.
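
To make the round trip concrete, here is a rough, language-neutral sketch of the escaping rules described above. It is written in Python for brevity and is not the code in this PR. The names `escape_for_write` and `unescape_on_read` are hypothetical, and accepting lowercase hex digits is an assumption on my part.

```python
import re

# One escape sequence: underscore, "x", four hex digits, underscore.
# Assumption: hex digits are matched case-insensitively.
ESCAPE_RE = re.compile(r"_x([0-9A-Fa-f]{4})_")

# An underscore that begins literal text shaped like an escape sequence.
LITERAL_LEAD_RE = re.compile(r"_(?=x[0-9A-Fa-f]{4}_)")

# Control characters that are written unconverted: tab, line feed, carriage return.
UNESCAPED_CONTROLS = {"\t", "\n", "\r"}


def escape_for_write(text: str) -> str:
    """Hypothetical writer step: escape only where it is needed."""
    # Protect literal "_xHHHH_" text by replacing its leading underscore
    # with _x005F_, as the description says Excel does.
    protected = LITERAL_LEAD_RE.sub("_x005F_", text)
    # Convert remaining control characters (x00-x1f) except the three exceptions.
    return "".join(
        f"_x{ord(ch):04X}_" if ord(ch) < 0x20 and ch not in UNESCAPED_CONTROLS else ch
        for ch in protected
    )


def unescape_on_read(text: str) -> str:
    """Hypothetical reader step: decode every "_xHHHH_", control character or not."""
    # Matches are non-overlapping and the replacement is not rescanned, so the
    # underscore produced from _x005F_ cannot start a second conversion.
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(escape_for_write("A_x0030_B"))        # A_x005F_x0030_B
print(unescape_on_read("A_x005F_x0030_B"))  # A_x0030_B
print(unescape_on_read("A_x0030_B"))        # A0B
```

The last line shows why the protection step matters: without `_x005F_`, the reader would turn the user's literal `A_x0030_B` into `A0B`.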