Raw lifetimes #1603

Open
wants to merge 6 commits into base: master

Conversation

compiler-errors
Member

@mattheww
Contributor

mattheww commented Sep 9, 2024

The "(not immediately followed by ')" bit is there in LIFETIME_OR_LABEL as a way to say that 'xxx'yyy is rejected (rather than being interpreted as two lifetimes).

But (after rust-lang/rust#126452) 'r#xxx'yyy is interpreted as two lifetimes. So I think that bit should be left out of the RAW_LIFETIME rule.

(There's no need to say anything special to indicate that 'r#kw' is rejected: that's true because ' on its own isn't a token, and character literals can't have more than one character between the quotes.)

Alternatively, it might be worth considering changing the implementation to reject 'r#xxx'yyy for consistency with the non-raw case.

Note that allowing that form would close the door to delaying the rejection of overlong character literals to post-expansion, which was being considered late last year at rust-lang/rust#118699 .
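
For illustration, here is a minimal sketch of what the "(not immediately followed by ')" restriction means for the non-raw rule. It is a toy, not rustc's lexer: `LifetimeLex`, `ident_len` and `lex_lifetime` are hypothetical names, identifiers are assumed to be ASCII-only, and the character-literal rule that a real lexer would also try on input like 'a' is not modelled.

```rust
/// Result of applying the toy lifetime rule at the start of `input`.
#[derive(Debug, PartialEq)]
enum LifetimeLex<'a> {
    /// A lifetime or label such as `'a`, plus the unconsumed remainder.
    Lifetime { name: &'a str, rest: &'a str },
    /// The identifier is immediately followed by `'`, as in `'xxx'yyy`.
    FollowedByQuote,
    /// The input does not start with `'` followed by an identifier.
    NotALifetime,
}

/// Toy stand-in for IDENTIFIER_OR_KEYWORD (assumption: ASCII letters,
/// digits and `_`, not starting with a digit). Returns the byte length.
fn ident_len(s: &str) -> usize {
    let mut len = 0;
    for (i, c) in s.char_indices() {
        let ok = c == '_' || c.is_ascii_alphabetic() || (i > 0 && c.is_ascii_digit());
        if !ok {
            break;
        }
        len = i + c.len_utf8();
    }
    len
}

/// Applies the lifetime rule with the one-character lookahead.
fn lex_lifetime(input: &str) -> LifetimeLex<'_> {
    let after_quote = match input.strip_prefix('\'') {
        Some(s) => s,
        None => return LifetimeLex::NotALifetime,
    };
    let n = ident_len(after_quote);
    if n == 0 {
        return LifetimeLex::NotALifetime;
    }
    let (name, rest) = after_quote.split_at(n);
    if rest.starts_with('\'') {
        // Do not fall back to lexing two lifetimes: the rule simply fails.
        LifetimeLex::FollowedByQuote
    } else {
        LifetimeLex::Lifetime { name, rest }
    }
}
```

With this sketch, `lex_lifetime("'xxx'yyy")` reports `FollowedByQuote` rather than producing a lifetime `'xxx` and leaving `'yyy` for a second token.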

Contributor

@ehuss ehuss left a comment


Can you update https://doc.rust-lang.org/nightly/reference/tokens.html#reserved-prefixes to include a new rule for the reserved prefix lifetime? I would assume it is something like

`'` (IDENTIFIER_OR_KEYWORD | _) `#`

And update the examples in the "Edition differences" in that section to include reserved lifetime prefixes.

Can you add a "Raw lifetime or label" section similar to https://doc.rust-lang.org/nightly/reference/identifiers.html#raw-identifiers that explains what the raw lifetime means?

Can you add an "Edition differences" block in the "Lifetimes and loop labels" section that explains that raw lifetimes are not supported before 2021? There are some examples within the tokens.md chapter of "Edition differences" blocks for the kind of wording to use.

@ehuss ehuss added the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Sep 10, 2024
@ehuss
Contributor

ehuss commented Oct 22, 2024

@compiler-errors Just checking if you'll be able to look at the requested changes?

@compiler-errors
Member Author

I believe I addressed the comments, but please look closely because markdown is not my thing.

@compiler-errors
Member Author

Well, except for:

Alternatively, it might be worth considering changing the implementation to reject 'r#xxx'yyy for consistency with the non-raw case.

Which I think I will open a rustc PR to implement.

@compiler-errors
Member Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: The marked PR is awaiting review from the PR author. and removed S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. labels Oct 30, 2024
src/tokens.md Outdated
>    | RAW_LIFETIME
>
> RAW_LIFETIME :\
>    `'r#` [IDENTIFIER_OR_KEYWORD][identifier]
Contributor


I believe this should also include _, right? That is, 'r#_ is valid, correct?

I also just want to double-check, it is intentional that 'r#_ is allowed as a loop label? (I only ask because '_ is explicitly not allowed).

Suggested change
- >    `'r#` [IDENTIFIER_OR_KEYWORD][identifier]
+ >       `'r#` [IDENTIFIER_OR_KEYWORD][identifier]
+ > _(not immediately followed by `'`)_\
+ >    | `'r#_`

Member Author


Actually this is likely something else we want to deny here. I feel like 'r#_ is invalid, or at least not something we need to support initially.

Contributor


rust-lang/rust#132363

Hopefully resolved by 912a6d5

@mattheww
Contributor

Do we need changes outside the Lexical structure chapter?

As far as I can see, nothing says that 'r#a and 'a are treated as equivalent when used as a lifetime or label.

That is, I don't think the Reference is saying that these are now accepted:

fn foo<'r#a>(s: &'a str) {}
    'r#a: { break 'a; }
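
To make the equivalence in question concrete, here is a small sketch of the name comparison the two snippets above rely on. It assumes, as the comment implies rustc does, that a raw lifetime and a plain lifetime refer to the same name once the `'r#` or `'` prefix is removed. `lifetime_name` and `same_lifetime` are hypothetical helpers, not rustc functions.

```rust
/// Returns the name a lifetime token refers to, with the `'` or `'r#`
/// prefix stripped. (Assumption: the token is already known to be a
/// well-formed lifetime; no identifier validation is done here.)
fn lifetime_name(token: &str) -> Option<&str> {
    let name = match token.strip_prefix("'r#") {
        // Raw form: `'r#a` names `a`.
        Some(raw) => raw,
        // Plain form: `'a` names `a`.
        None => token.strip_prefix('\'')?,
    };
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// True if two lifetime tokens name the same lifetime or label.
fn same_lifetime(a: &str, b: &str) -> bool {
    match (lifetime_name(a), lifetime_name(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Under that assumption `same_lifetime("'r#a", "'a")` holds, which is the property the Reference would need to state somewhere outside the Lexical structure chapter.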

@@ -851,7 +866,8 @@ r[lex.token.reserved-prefix.syntax]
> **<sup>Lexer 2021+</sup>**\
> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
- > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `#`
+ > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`\
+ > RESERVED_TOKEN_LIFETIME : `'` (IDENTIFIER_OR_KEYWORD <sub>_Except `r`_</sub> | _) `#`
Contributor


I think this isn't reserving 'r# on its own, which rustc rejects but which could otherwise lex as 'r followed by #.

But I'm not sure it's worth trying to get all the RESERVED_ rules exactly right at this stage.
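
A rough sketch of the gap, checking input against the RESERVED_TOKEN_LIFETIME rule exactly as written in the diff. This is a toy model with ASCII-only identifiers, and `ident_len` and `matches_reserved_lifetime_rule` are hypothetical names; it says nothing about what rustc itself does.

```rust
/// Toy stand-in for "IDENTIFIER_OR_KEYWORD | `_`" (assumption: ASCII
/// letters, digits and `_`, not starting with a digit).
fn ident_len(s: &str) -> usize {
    let mut len = 0;
    for (i, c) in s.char_indices() {
        let ok = c == '_' || c.is_ascii_alphabetic() || (i > 0 && c.is_ascii_digit());
        if !ok {
            break;
        }
        len = i + c.len_utf8();
    }
    len
}

/// True if `input` starts with text matching the rule as written:
/// `'` (IDENTIFIER_OR_KEYWORD except `r` | `_`) `#`
fn matches_reserved_lifetime_rule(input: &str) -> bool {
    let after_quote = match input.strip_prefix('\'') {
        Some(s) => s,
        None => return false,
    };
    let n = ident_len(after_quote);
    if n == 0 {
        return false;
    }
    let (prefix, rest) = after_quote.split_at(n);
    // The `Except r` carve-out leaves room for RAW_LIFETIME, but it also
    // means a bare `'r#` is not covered by this rule.
    prefix != "r" && rest.starts_with('#')
}
```

Here `'foo#` matches the rule while `'r#` with nothing after it does not, so by the grammar alone it would fall through to `'r` followed by `#`.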

@ehuss
Contributor

ehuss commented Oct 30, 2024

Posted rust-lang/edition-guide#330 as the companion for the edition.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 1, 2024
…r=chenyukang

Reject raw lifetime followed by `'`, like regular lifetimes do

See comment. We want to reject cases like `'r#long'id`, which currently gets interpreted as a raw lifetime (`'r#long`) followed by a lifetime (`'id`). This could have alternative lexes, such as an overlong char literal (`'r#long'`) followed by an identifier (`id`). To avoid committing to this in any case, let's reject the whole thing.

`@mattheww,` is this what you were looking for in rust-lang/reference#1603 (comment)? I'd say ignore the details about the specific error message (the fact that this gets reinterpreted as a char literal is 🤷), just that because this causes a lexer error we're effectively saving syntactical space like you wanted.
@ehuss
Contributor

ehuss commented Nov 5, 2024

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: The marked PR is awaiting review from the PR author. labels Nov 5, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 9, 2024
…r=chenyukang

Reject raw lifetime followed by `'`, like regular lifetimes do

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 9, 2024
…=wesleywiser

Enforce that raw lifetimes must be valid raw identifiers

Make sure that the identifier part of a raw lifetime is a valid raw identifier. This precludes `'r#_` and all module segment paths for now.

I don't believe this is compelling to support. This was raised by `@ehuss` in rust-lang/reference#1603 (comment) (well, specifically the `'r#_` case), but I don't see why we shouldn't just make it consistent with raw identifiers.
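
As a rough illustration of "valid raw identifier", the check might look like the sketch below. It assumes the excluded names are the ones the Reference lists for raw identifiers (`crate`, `self`, `super`, `Self`) plus `_`, and it uses a toy ASCII-only identifier test. `is_toy_ident` and `is_valid_raw_lifetime` are hypothetical names, not the functions added in the rustc PR.

```rust
/// Toy identifier test (assumption: ASCII only; the real rule is
/// Unicode-aware).
fn is_toy_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// True if `token` is `'r#` followed by a name that would also be
/// accepted after `r#` in a raw identifier.
fn is_valid_raw_lifetime(token: &str) -> bool {
    let name = match token.strip_prefix("'r#") {
        Some(n) => n,
        None => return false,
    };
    // `_` and the path segment keywords cannot be raw identifiers, so
    // they are rejected in raw lifetimes too.
    is_toy_ident(name) && !matches!(name, "_" | "crate" | "self" | "super" | "Self")
}
```

With this check `'r#fn` is accepted, while `'r#_` and `'r#crate` are not.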
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 9, 2024
Rollup merge of rust-lang#132363 - compiler-errors:raw-lt-id-valid, r=wesleywiser

Enforce that raw lifetimes must be valid raw identifiers

rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 9, 2024
Rollup merge of rust-lang#132341 - compiler-errors:raw-lt-prefix-id, r=chenyukang

Reject raw lifetime followed by `'`, like regular lifetimes do

mati865 pushed a commit to mati865/rust that referenced this pull request Nov 12, 2024
…r=chenyukang

Reject raw lifetime followed by `'`, like regular lifetimes do

mati865 pushed a commit to mati865/rust that referenced this pull request Nov 12, 2024
…=wesleywiser

Enforce that raw lifetimes must be valid raw identifiers

@ehuss
Contributor

ehuss commented Nov 14, 2024

@compiler-errors I pushed a change for the validation, and also rebased since we made a slight change in the definition of the lifetime token in #1668.

Please let me know if you think the current version looks good.

These are rejected by the lexer.
@ehuss
Contributor

ehuss commented Nov 15, 2024

I pushed a commit specifying that 'r#_ will generate an error (and similarly for r#_).

This is a bit of an awkward issue around whether or not _ is an identifier. Currently the reference does not define it that way (intentionally), even though internally rustc allows that (but quickly rejects it). IIRC, there are some issues around that and proc-macros, but I did not look into that.

I did not want to define an identifier token that allowed _, because then I would have to define separate identifier tokens that exclude it, and this _ identifier would only be used by these raw tokens.

@compiler-errors
Member Author

LGTM!

Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.

4 participants