Google Code Issue 157: Add "escape invisible characters" option #38

Open

gsnedders
Member

Reported by @fantasai, Jul 27, 2010
Having invisible characters in the source code can be confusing to someone who's trying to figure out what's going on. Adding an escape_invisible option would make those characters visible in the source code.

The attached patch implements an escape_invisible option. The list of invisible characters is probably incomplete in this iteration, but you get the idea. It depends on the patch in issue 156.

@ghost ghost assigned gsnedders May 4, 2013
@gsnedders gsnedders modified the milestones: 1.1, 0.99999999 May 8, 2016
@gsnedders gsnedders removed this from the 0.99999999 milestone May 20, 2016
@gsnedders gsnedders removed their assignment Sep 1, 2017
@gsnedders
Member Author

gsnedders commented Sep 1, 2017

My preference would be something based on unicodedata that blacklists General Category C*. That has some problems, though. You'll end up blacklisting different sets of characters depending on the Python version and the Unicode version. Generating that set is also expensive, so it likely should be precomputed at dist build-time, and out of concern for memory consumption it likely needs to be represented as a segment tree rather than a set of millions of characters.
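To make the idea above concrete, here is a rough sketch, not the attached patch. It collapses the General Category C* code points into sorted inclusive ranges and looks them up by binary search. It uses a plain sorted range list instead of the segment tree mentioned above, which is a swapped-in simplification. The names `build_category_c_ranges` and `RangeSet` are made up for illustration.

```python
import bisect
import sys
import unicodedata


def build_category_c_ranges(limit=sys.maxunicode + 1):
    """Return sorted (start, end) inclusive ranges of code points below
    `limit` whose General Category starts with "C" (Cc, Cf, Cs, Co, Cn).

    The result depends on the Unicode data shipped with the running
    Python (see unicodedata.unidata_version).
    """
    ranges = []
    start = None
    for cp in range(limit):
        if unicodedata.category(chr(cp)).startswith("C"):
            if start is None:
                start = cp  # open a new run
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


class RangeSet(object):
    """Membership test over sorted, non-overlapping inclusive ranges."""

    def __init__(self, ranges):
        self._starts = [start for start, _ in ranges]
        self._ends = [end for _, end in ranges]

    def __contains__(self, cp):
        # Find the last range that starts at or before cp.
        i = bisect.bisect_right(self._starts, cp) - 1
        return i >= 0 and cp <= self._ends[i]
```

Scanning every code point is the expensive part, which is why the ranges would presumably be generated once at build time and shipped as data. Note that C* also covers tab, newline and carriage return (all Cc), so a real option would probably need to exempt those.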

We also need to be careful on narrow Python builds and make sure we don't encode surrogate pairs, as \uD800\uDC00 needs to end up unchanged.
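A minimal sketch of the surrogate concern, under some assumptions: the escape format (hexadecimal character references) and the set of whitespace controls left alone are guesses for illustration, not taken from the attached patch. Surrogate code points have General Category Cs, so a naive C* filter would escape both halves of a pair on a narrow build. The sketch skips Cs explicitly.

```python
import unicodedata

# Assumption: ordinary whitespace controls should stay as they are.
KEEP = frozenset("\t\n\r")


def escape_invisible(text):
    """Replace General Category C* characters with hex character
    references, leaving surrogates (Cs) and KEEP characters unchanged."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat.startswith("C") and cat != "Cs" and ch not in KEEP:
            out.append("&#x%X;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

With this, a zero-width space in `"a\u200bb"` is escaped, while `"\ud800\udc00"` passes through untouched.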

It's also notable that AFAICT the original reason for this patch no longer holds true (the CSS testsuite build system is basically a historical artefact now and hasn't used an html5lib fork with this for years), though as #197 shows, other people do care.
