As discussed in #762, `comrak` uses the number of UTF-8 bytes for the value in `LineColumn::column`.
But when `comrak` is used to pinpoint an exact location in a source file, users would expect the `column` to point at the readable character, not at the position of a UTF-8 byte.
When a Markdown file contains only the `好` character (U+597D, encoded as `0xE5 0xA5 0xBD` in UTF-8), `comrak` reports `1:1-1:3` as the sourcepos.
In contrast, if you open the same Markdown file in a text editor (e.g. Notepad++), the end column is 2 and the position is 4. `comrak` treats the range as end-inclusive, so it reports `1:1-1:3` instead of `1:1-1:4` as the sourcepos.
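To make the mismatch concrete, here is a minimal sketch using only the Rust standard library. The helper name `byte_and_char_len` is hypothetical and is not part of `comrak`; it just contrasts the byte length of a string with its number of `char`s (Unicode scalar values).

```rust
/// Hypothetical helper for illustration only; not part of comrak.
/// Returns (length in UTF-8 bytes, number of Unicode scalar values).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `str::len` counts bytes, while `chars().count()` counts `char`s.
    (s.len(), s.chars().count())
}
```

For `"好"` this yields `(3, 1)`: three bytes but one character, which is why a byte-based end column of 3 corresponds to a single visible character.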
My proposal is to create a parse option (disabled by default) which, when enabled, would make the `column` value in `LineColumn` character-based instead of based on UTF-8 bytes.
That would make the Markdown example above have sourcepos `1:1-1:1` (as the end-inclusivity would be maintained).
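As a rough sketch of what the proposed option would compute, the function below maps a 1-based byte column to a 1-based character column for a single line. The name `byte_col_to_char_col` is hypothetical and not an existing `comrak` API. It also assumes that "character" means a Rust `char` (a Unicode scalar value), which this proposal does not pin down.

```rust
/// Hypothetical sketch, not part of comrak's API.
/// Converts a 1-based UTF-8 byte column into a 1-based character column,
/// where "character" is assumed to mean a Rust `char`.
/// A byte column inside a multi-byte character maps to that character,
/// so end-inclusive ranges stay end-inclusive.
fn byte_col_to_char_col(line: &str, byte_col: usize) -> usize {
    // Count the characters whose first byte lies at or before the
    // given column (0-based start index < 1-based byte column).
    line.char_indices()
        .take_while(|(start, _)| *start < byte_col)
        .count()
}
```

With this mapping, the byte-based range `1:1-1:3` for `好` becomes `1:1-1:1`, because both byte column 1 and byte column 3 fall within the first character.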