
Should attached modifiers care about the kind of escaped character? #28

Open
boltlessengineer opened this issue Nov 18, 2023 · 11 comments

Comments

@boltlessengineer

In Norg syntax, \a is equivalent to a. But if \a is just a normal character,

\a*bold*

should not be valid bold.

And if the example above is not valid,

*bold\**

This must also be invalid.

This is tricky for the parser because the parser works at the token level.
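
A minimal tokenizer sketch makes the token-level problem concrete. This is illustrative Python, not code from any Norg parser; the Token type and its "escape"/"char" kinds are hypothetical. It folds a backslash and the character after it into one escape token, so the open question becomes how that single token should be classified when the parser looks at the character before a *.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # "escape" for a backslash sequence, "char" for anything else (hypothetical kinds)
    text: str  # the source text this token covers


def tokenize(line: str) -> list[Token]:
    """Split one line into tokens; a backslash plus the next character becomes a single escape token."""
    tokens: list[Token] = []
    i = 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            tokens.append(Token("escape", line[i:i + 2]))
            i += 2
        else:
            # A trailing backslash with nothing after it is kept as an ordinary character.
            tokens.append(Token("char", line[i]))
            i += 1
    return tokens


print([t.text for t in tokenize(r"\a*bold*")])
# ['\\a', '*', 'b', 'o', 'l', 'd', '*']
```

Whether the * at index 1 may open bold now depends entirely on how the escape token \a is classified.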

@mrossinek
Member

I agree that the first example is invalid because the opening attached modifier does not have whitespace preceding it.
But the second example is valid since the closing modifier is valid.

@boltlessengineer
Author

If the first example is invalid, it means one of these:

  • we see \ and a as two separate nodes, so the * at the 3rd column is an invalid bold opener because it comes after the word character a
  • we see \a as a single node, but escaped word characters behave differently from escaped punctuation like \?

In the first case, the second example should be invalid bold because ** is an invalid bold closer.
I really don’t like the second case: two different types of escaped-character nodes.

@vhyrro
Member

vhyrro commented Jul 25, 2024

What mrossinek proposes is sensible here. From the parser's perspective, this may indeed mean that it has to differentiate escaped punctuation from escaped characters, but our rules in the spec are pretty explicit. They discuss the immediately preceding and following characters, so the parser has to "figure it out" when it comes to escape sequences.

On a side note, now that super verbatim could exist, what are your thoughts on escape characters in general? They're nice and convenient, but could they be superseded by some other syntax?

@boltlessengineer
Author

The spec also says that a repeated * is not a valid opening/closing modifier. So if the first example is invalid, the second example should not be valid either, unless we treat \a and \* as different types of escaped modifiers.

Two or more consecutive attached modifiers of the same type (i.e. **, // etc.) should be instantly "disqualified" and parsed as raw text in all circumstances and without any exceptions.

My thoughts on escape characters haven’t changed from the start. They should take precedence over all other syntax except free-form constructs (currently “super verbatim” and “verbatim ranged tag”).
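
The disagreement can be sketched by applying the quoted "no consecutive modifiers" rule under both readings of an escape. This is an illustrative Python sketch; MODIFIERS is an assumed subset of attached-modifier characters, and chars_only, escape_aware and has_disqualified_run are hypothetical names.

```python
from __future__ import annotations

MODIFIERS = {"*", "/", "_"}  # assumed, illustrative subset of attached-modifier characters


def chars_only(line: str) -> list[str]:
    """Reading 1: the backslash and the escaped character are separate nodes."""
    return list(line)


def escape_aware(line: str) -> list[str]:
    """Reading 2: a backslash and the next character form one escape token."""
    tokens: list[str] = []
    i = 0
    while i < len(line):
        step = 2 if line[i] == "\\" and i + 1 < len(line) else 1
        tokens.append(line[i:i + step])
        i += step
    return tokens


def has_disqualified_run(tokens: list[str]) -> bool:
    """True if two identical modifier tokens sit back to back (e.g. '**')."""
    return any(a == b and a in MODIFIERS for a, b in zip(tokens, tokens[1:]))


example = r"*bold\**"
print(has_disqualified_run(chars_only(example)))    # True: the last two '*' form '**'
print(has_disqualified_run(escape_aware(example)))  # False: '\*' is one token, not a modifier
```

Under the first reading the closing ** is disqualified; under the second it never forms, because the escape is a single token.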

@mrossinek
Member

mrossinek commented Jul 25, 2024

Maybe I need to be more explicit. First, let me paraphrase the rules from the spec:

  • an opening attached modifier must have whitespace or punctuation in front of it and no whitespace after it
  • a closing attached modifier reverses this: no whitespace in front and whitespace or punctuation after it
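
These two rules can be written as predicates over escape-aware tokens. A minimal Python sketch, assuming that an escaped character counts as neither whitespace nor punctuation, that the start and end of a line count as whitespace, and that Python's ASCII string.punctuation stands in for the spec's notion of punctuation; can_open, can_close and the helpers are hypothetical names.

```python
from __future__ import annotations

import string
from typing import Optional

PUNCTUATION = set(string.punctuation)  # assumption: ASCII stand-in for the spec's punctuation


def tokenize(line: str) -> list[str]:
    """Escape-aware tokens: a backslash plus the next character is one token."""
    tokens: list[str] = []
    i = 0
    while i < len(line):
        step = 2 if line[i] == "\\" and i + 1 < len(line) else 1
        tokens.append(line[i:i + step])
        i += step
    return tokens


def is_escape(tok: str) -> bool:
    return len(tok) == 2 and tok.startswith("\\")


def is_ws_or_punct(tok: Optional[str]) -> bool:
    if tok is None:  # start or end of the line counts as whitespace (assumption)
        return True
    if is_escape(tok):  # an escaped character is neither whitespace nor punctuation
        return False
    return tok.isspace() or tok in PUNCTUATION


def neighbors(tokens: list[str], i: int) -> tuple[Optional[str], Optional[str]]:
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    return prev, nxt


def can_open(tokens: list[str], i: int) -> bool:
    """Whitespace or punctuation before, no whitespace after."""
    prev, nxt = neighbors(tokens, i)
    return is_ws_or_punct(prev) and nxt is not None and not nxt.isspace()


def can_close(tokens: list[str], i: int) -> bool:
    """No whitespace before, whitespace or punctuation after."""
    prev, nxt = neighbors(tokens, i)
    return prev is not None and not prev.isspace() and is_ws_or_punct(nxt)


first = tokenize(r"\a*bold*")   # ['\\a', '*', 'b', 'o', 'l', 'd', '*']
second = tokenize(r"*bold\**")  # ['*', 'b', 'o', 'l', 'd', '\\*', '*']
print(can_open(first, 1))                         # False: preceded by the escape \a
print(can_open(second, 0), can_close(second, 6))  # True True
```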

Given that, let us look at your first case:

\a*bold*

\a is an escaped "a" character, i.e. a "verbatim" "a". This is not whitespace or punctuation which means that the first * is not an opening modifier, thus rendering this example invalid.

The second case:

*bold\**
  • The opening modifier is fine.
  • Then we have the word bold. Nothing special going on here.
  • Then we have \* which is an escaped * character, i.e. a verbatim *.
  • Then we have the second *. There is no whitespace in front of it, and it has whitespace (a line break) after it. Thus, this is a valid closing modifier.
  • Therefore, this example is valid, and the contents of the bold segment should be "bold*".
  • Note: the repetition of * is not an issue here, because the first one is escaped and has no effect on the second character.

To paraphrase: the backslash escapes whatever character comes next, therefore rendering it verbatim. No differentiation on whatever is escaped has to be done.

This escaping has higher precedence than all attached modifiers except the super verbatim suggested in #33. Otherwise, writing inline math using LaTeX would be very cumbersome. (There is some more discussion of that here: #34 (comment).)

We might want to re-evaluate the precedence of the backslash w.r.t. linkables.

@boltlessengineer
Author

\a is an escaped "a" character, i.e. a "verbatim" "a". This is not whitespace or punctuation

So you are saying that the parser should distinguish \a and \* as different types of detached modifiers?
Will *bold*\a also be invalid bold because \a is “verbatim a”?

In my view, \a is neither whitespace nor punctuation, but it is not a normal word character either, because it is “escaped”. That is why it is highlighted as a special character when rendered as raw content without concealing (e.g. @string.escape in Neovim). The parser should not deal with the final escaped output (a here); it should only see abstract objects, e.g. (escaped_sequence [0, 0] - [0, 2]).

Having two different node types (escaped_word and escaped_punctuation) for escape sequences seems like a bit too much to me.


One possible solution would be to disallow escaping normal word characters, making \a invalid in the first place.
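
That proposal could be enforced with a simple pass that flags escapes of word characters. A minimal Python sketch; approximating "word character" with str.isalnum() is an assumption, and invalid_escapes is a hypothetical name.

```python
from __future__ import annotations


def invalid_escapes(line: str) -> list[int]:
    """Offsets of escape sequences whose escaped character is a word character.

    Hypothetical rule from the proposal above; "word character" is approximated
    by str.isalnum(), which is an assumption.
    """
    bad: list[int] = []
    i = 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            if line[i + 1].isalnum():
                bad.append(i)
            # Skip past the escaped character, so an escaped backslash cannot start a second escape.
            i += 2
        else:
            i += 1
    return bad


print(invalid_escapes(r"\a*bold*"))  # [0]
print(invalid_escapes(r"*bold\**"))  # []
```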

@mrossinek
Member

I am explicitly stating:

No differentiation on whatever is escaped has to be done.

An escaped character is just that: an escaped character. Any character can be escaped; whether that is of any use is another question, but not one that the spec should care about.

An escaped character is neither whitespace nor punctuation. Therefore, an escaped character:

  • can NOT occur in front of an opening attached modifier (because that would mean it is not opening)
  • can NOT occur after a closing attached modifier (because that would mean it is not closing)
  • it CAN occur in between attached modifiers because that is how you can insert e.g. a * character inside a bold segment: e.g. *my bold \* character*
  • it can NOT occur inside super verbatim (see [Suggestion]: Remove free-form attached modifiers #33)
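
Put together, these rules are enough to extract a bold segment without ever asking which character was escaped. A self-contained Python sketch, assuming escape-aware tokens, line boundaries counting as whitespace, and ASCII string.punctuation standing in for the spec's punctuation; find_bold and its helpers are hypothetical and only return the first bold segment on a line.

```python
from __future__ import annotations

import string
from typing import Optional

PUNCTUATION = set(string.punctuation)  # assumption: ASCII stand-in for the spec's punctuation


def tokenize(line: str) -> list[str]:
    tokens: list[str] = []
    i = 0
    while i < len(line):
        step = 2 if line[i] == "\\" and i + 1 < len(line) else 1
        tokens.append(line[i:i + step])
        i += step
    return tokens


def is_escape(tok: str) -> bool:
    return len(tok) == 2 and tok.startswith("\\")


def ws_or_punct(tok: Optional[str]) -> bool:
    # None = line boundary (treated as whitespace); escapes are neither whitespace nor punctuation.
    return tok is None or (not is_escape(tok) and (tok.isspace() or tok in PUNCTUATION))


def at(tokens: list[str], i: int) -> Optional[str]:
    return tokens[i] if 0 <= i < len(tokens) else None


def find_bold(line: str) -> Optional[str]:
    """Return the text of the first bold segment with escapes resolved, or None."""
    tokens = tokenize(line)
    for i, tok in enumerate(tokens):
        nxt = at(tokens, i + 1)
        if tok != "*" or not ws_or_punct(at(tokens, i - 1)) or nxt is None or nxt.isspace():
            continue  # not a valid opener
        for j in range(i + 2, len(tokens)):
            prev = tokens[j - 1]
            if tokens[j] == "*" and not prev.isspace() and ws_or_punct(at(tokens, j + 1)):
                inner = tokens[i + 1:j]
                # An escape contributes its second character verbatim, whatever that character is.
                return "".join(t[1] if is_escape(t) else t for t in inner)
    return None


print(find_bold(r"*my bold \* character*"))  # my bold * character
print(find_bold(r"*bold*\a"))                # None
```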

@boltlessengineer
Author

Oh, I get it now. I hadn’t thought of it like that. Sorry for the misunderstanding.

@boltlessengineer
Author

If an escaped character cannot occur after a closing attached modifier, how can I write bold:word with only “bold” as bold and :word as literal characters?

*bold*:\word

Should I escape the w instead of the : to prevent the : from being parsed as a link modifier?

@mrossinek
Member

Very simple:

*bold*:\:word
  • *bold* should be clear
  • using *: makes this a closing link modifier
  • then we escape a colon to make it verbatim: \:
  • and then we write word

The link modifier makes this possible because it is fine with not having whitespace after it. It may not even be necessary to escape the second colon character, but I would have to double-check that.

@boltlessengineer
Author

In the case that the link modifier is opening (the attached modifier appears on the right):

  • The link modifier may only be preceded by a regular character (or, in other words, may not be preceded by a punctuation character nor by a whitespace character).
  • The link modifier may only be succeeded by an opening attached modifier.

In the case that the link modifier is closing (the attached modifier appears on the left):

  • The link modifier may only be preceded by a closing attached modifier.
  • The link modifier may only be succeeded by a regular character.

If the above conditions are not met, then the character should be treated as a literal :.
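
The quoted link-modifier conditions can likewise be sketched as predicates. This Python sketch assumes, following the discussion above, that an escaped character counts as a regular character (neither whitespace nor punctuation). It uses ASCII string.punctuation and an illustrative modifier subset {*, /, _} as stand-ins, and simplifies "closing attached modifier" to "a modifier character with no whitespace before it" (and "opening" to "followed by a non-whitespace token"). All names are hypothetical.

```python
from __future__ import annotations

import string
from typing import Optional

PUNCTUATION = set(string.punctuation)  # assumption: ASCII stand-in for the spec's punctuation
MODIFIERS = {"*", "/", "_"}            # assumption: illustrative subset of attached modifiers


def tokenize(line: str) -> list[str]:
    tokens: list[str] = []
    i = 0
    while i < len(line):
        step = 2 if line[i] == "\\" and i + 1 < len(line) else 1
        tokens.append(line[i:i + step])
        i += step
    return tokens


def at(tokens: list[str], i: int) -> Optional[str]:
    return tokens[i] if 0 <= i < len(tokens) else None


def is_regular(tok: Optional[str]) -> bool:
    """Neither whitespace nor punctuation; an escape such as '\\:' counts as regular."""
    if tok is None:
        return False
    if len(tok) == 2 and tok.startswith("\\"):
        return True
    return not tok.isspace() and tok not in PUNCTUATION


def is_opening_link_modifier(tokens: list[str], i: int) -> bool:
    """':' preceded by a regular character and followed by an opening attached modifier."""
    nxt, after = at(tokens, i + 1), at(tokens, i + 2)
    return (tokens[i] == ":" and is_regular(at(tokens, i - 1))
            and nxt in MODIFIERS and after is not None and not after.isspace())


def is_closing_link_modifier(tokens: list[str], i: int) -> bool:
    """':' preceded by a closing attached modifier and followed by a regular character."""
    prev, before = at(tokens, i - 1), at(tokens, i - 2)
    return (tokens[i] == ":" and prev in MODIFIERS and before is not None
            and not before.isspace() and is_regular(at(tokens, i + 1)))


t = tokenize(r"*bold*:\:word")  # ['*', 'b', 'o', 'l', 'd', '*', ':', '\\:', 'w', 'o', 'r', 'd']
print(is_closing_link_modifier(t, 6))                         # True: followed by the escape \:
print(is_closing_link_modifier(tokenize("*bold*: word"), 6))  # False: followed by a space
```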

So you are going to change this spec and redefine the attached modifier opening/closing tokens.
Will *bold*: word be rendered as bold word instead of bold: word now?
