@oleibman
Collaborator

See Discussion 4724

PhpSpreadsheet converts all control characters (x00-x1f) in strings to and from a form which Excel recognizes (e.g. `x1c` becomes `_x001C_` when writing, and vice versa when reading). Historically, three characters have been exempt from conversion: tab (x09), line feed (x0a), and carriage return (x0d). PR #4536 removed those exceptions, but that caused some problems; PR #4619 fixed the problems, but restored the exceptions.
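The write-side rule can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name `encodeControlCharacters` is hypothetical, and the uppercase hex digits are assumed from the `_x001C_` example above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not PhpSpreadsheet's API): encode control characters
 * as _xHHHH_, leaving tab (x09), line feed (x0a), and carriage return (x0d)
 * unconverted, as described above.
 */
function encodeControlCharacters(string $text): string
{
    return preg_replace_callback(
        // x00-x1f, minus the three exceptions x09, x0a, x0d
        '/[\x00-\x08\x0B\x0C\x0E-\x1F]/',
        static fn (array $m): string => sprintf('_x%04X_', ord($m[0])),
        $text
    ) ?? $text;
}

// encodeControlCharacters("A\x1CB") gives 'A_x001C_B'
// encodeControlCharacters("A\tB")   gives "A\tB" (tab is one of the exceptions)
```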

The referenced discussion deals with a spreadsheet in which a cell contains `_x000D_`, the encoded form of carriage return. Although the writer no longer produces that string on output, the reader should be able to handle it on input. In fact, the reader ought to handle any string of the form "underscore, x, 4 hex digits, underscore", whether or not it represents a control character.
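A rough sketch of that read-side behavior, under stated assumptions: the names `decodeEscapes` and `codePointToUtf8` are hypothetical, hex digits are matched case-insensitively, and surrogate code points are left undecoded because they cannot stand alone in UTF-8. None of this is taken from the PR's actual code.

```php
<?php

declare(strict_types=1);

/** Encode a code point in the range 0x0000-0xFFFF as UTF-8 (hypothetical helper). */
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }

    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

/** Decode every _xHHHH_ sequence, control character or not (hypothetical helper). */
function decodeEscapes(string $text): string
{
    return preg_replace_callback(
        '/_x([0-9A-Fa-f]{4})_/',
        static function (array $m): string {
            $cp = (int) hexdec($m[1]);
            // Assumption: surrogate code points are left as written.
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                return $m[0];
            }

            return codePointToUtf8($cp);
        },
        $text
    ) ?? $text;
}

// decodeEscapes('A_x000D_B') gives "A\rB"
// decodeEscapes('_x0041_')   gives 'A' (not a control character, still decoded)
```

Matches are found left to right and do not overlap, so the trailing underscore of one sequence is never reused as the leading underscore of the next.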

And there's an interesting edge case. If a user enters the literal string `A_x0030_B` into a cell, it needs to be preserved as-is. Excel handles this by writing it out as `A_x005F_x0030_B`, i.e. substituting `_x005F_` for the first underscore, so that the reader sees `_x005F_` (which it converts to an underscore) followed by `x0030_B` (no leading underscore, so no conversion). PhpSpreadsheet could probably handle this by converting all underscores on write, but I am trying to emulate Excel and do it only when needed.
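One way to express "only when needed" is a lookahead that escapes an underscore only if it begins something the reader would otherwise decode. This is a sketch, not the PR's implementation: `escapeLiteralUnderscores` is a hypothetical name, and treating lowercase hex digits as significant is an assumption about Excel's behavior.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: replace an underscore with _x005F_ only when it is
 * immediately followed by "x", four hex digits, and another underscore.
 */
function escapeLiteralUnderscores(string $text): string
{
    return preg_replace('/_(?=x[0-9A-Fa-f]{4}_)/', '_x005F_', $text) ?? $text;
}

// escapeLiteralUnderscores('A_x0030_B') gives 'A_x005F_x0030_B'
// escapeLiteralUnderscores('snake_case') gives 'snake_case' (nothing to escape)
```

Because the lookahead consumes nothing, back-to-back sequences such as `_x0030_x0031_` get both leading underscores escaped, which keeps the round trip through the reader lossless.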

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

It is probably very anal of me to do this. Excel does it. I can't see it happening in the wild.
PHP 8.5 problem with iconv `//IGNORE`.
Make some properties protected rather than private.
@oleibman oleibman enabled auto-merge December 3, 2025 03:41
@oleibman oleibman added this pull request to the merge queue Dec 3, 2025
Merged via the queue into PHPOffice:master with commit 5106fac Dec 3, 2025
14 checks passed
@oleibman oleibman deleted the issue4724 branch December 3, 2025 03:57