(Proposal) Parse option to make sourcepos column char-based #777

As discussed in https://github.com/kivikakk/comrak/discussions/762, `comrak` uses the number of UTF-8 bytes as the value of [`LineColumn::column`](https://docs.rs/comrak/latest/comrak/nodes/struct.LineColumn.html#structfield.column).

But anyone using `comrak` to pinpoint an exact location in the source file would expect `column` to point at a readable character, not at the position of a UTF-8 byte.

When you have a Markdown file that contains only the `好` character (U+597D - `0xE5 0xA5 0xBD` in UTF-8 bytes), `comrak` puts `1:1-1:3` as the sourcepos.
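The byte/character mismatch behind this can be reproduced with the Rust standard library alone. The sketch below is only an illustration: `byte_and_char_len` is a hypothetical helper, not part of `comrak`, and "character" here means a Unicode scalar value (Rust's `char`).

```rust
/// Hypothetical helper (not a comrak API): returns the length of `s`
/// in UTF-8 bytes and in Unicode scalar values (`char`s).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `str::len` counts UTF-8 bytes; `chars().count()` counts `char`s.
    (s.len(), s.chars().count())
}

/// Hypothetical helper: the raw UTF-8 bytes of `s`.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}
```

For `"好"`, `byte_and_char_len` gives `(3, 1)`, which is why a byte-based, end-inclusive range ends at column 3 even though the line holds a single character.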

In contrast, if you open the same Markdown file in a text editor (e.g. Notepad++), the end column is reported as 2 and the position as 4. `comrak` treats the range as end-inclusive, so it reports `1:1-1:3` instead of `1:1-1:4` as the sourcepos.

<img width="600" height="257" alt="Image" src="https://github.com/user-attachments/assets/fb58cfbc-0db6-480c-ad83-c4053020e2b2" />
<br/>
<br/>

My proposal is to **create a parse option** (disabled by default) which, when enabled, would make the column value in `LineColumn` **character-based**, not based on UTF-8 bytes.

That would make the Markdown example above have the sourcepos `1:1-1:1` (as the end-inclusivity would be maintained).
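As a rough sketch of the conversion such an option would imply, the function below maps a 1-based byte column to a 1-based character column for a single line. It is an assumption-laden illustration, not how `comrak` does or would implement it: `byte_col_to_char_col` is a hypothetical name, it takes the line's text as input, and it counts Unicode scalar values rather than grapheme clusters.

```rust
/// Hypothetical helper (not a comrak API): converts a 1-based, byte-based
/// column into a 1-based, character-based column within `line`.
/// Returns `None` if `byte_col` is 0 or lies beyond the end of the line.
fn byte_col_to_char_col(line: &str, byte_col: usize) -> Option<usize> {
    if byte_col == 0 || byte_col > line.len() {
        return None;
    }
    let byte_index = byte_col - 1;
    // Count the characters that start at or before the given byte.
    // A byte in the middle of a multi-byte character maps to that character.
    let char_col = line
        .char_indices()
        .take_while(|(start, _)| *start <= byte_index)
        .count();
    Some(char_col)
}
```

Applied to the `好` example, byte columns 1 and 3 both map to character column 1, so the end-inclusive range `1:1-1:3` becomes `1:1-1:1`.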
