Correct 2 x grammar rules for compilation unit name in #line #1120

Open
wants to merge 1 commit into base: draft-v8

Conversation

RexJaeschke
Contributor

Good catch, @logeshkumars0604!

This PR addresses Issue #1118.

The " vs. # error was introduced in V6 when we converted to the ANTLR grammar notation.

The ability to have an empty filename was never tested, but as you point out, it does work, so I have made the name contents zero-or-more characters, instead of one-or-more. I tested this using #line 300 "" along with CallerFilePathAttribute.
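
A minimal sketch of that kind of test, assuming a compiler (such as Roslyn) that accepts an empty name. The class and method names are hypothetical and are not taken from the PR. What the path prints as is implementation-dependent, so no output is shown for it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LineDirectiveDemo
{
    // The compiler fills in both optional arguments at the call site.
    public static (string Path, int Line) CallSite(
        [CallerFilePath] string path = "",
        [CallerLineNumber] int line = 0) => (path, line);

    public static (string Path, int Line) Probe()
    {
        // The directive names the compilation unit "" and numbers the next line 300.
#line 300 ""
        var site = CallSite();
#line default
        return site;
    }

    public static void Main()
    {
        var (path, line) = Probe();
        // Brackets make an empty path visible in the output.
        Console.WriteLine($"path=[{path}] length={path.Length} line={line}");
    }
}
```

The call to CallSite sits on the line immediately after the directive, so the caller line number it receives is 300; the `#line default` directive then restores normal numbering and naming.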

@RexJaeschke RexJaeschke added the type: bug The Standard does not describe the language as intended or implemented label May 25, 2024
@RexJaeschke RexJaeschke self-assigned this May 25, 2024
Contributor

@Nigel-Ecma Nigel-Ecma left a comment

Probably need to expand the changes here, or spin off a new issue/PR to fix more.

@@ -1488,12 +1488,12 @@ fragment PP_Line_Indicator
;

fragment PP_Compilation_Unit_Name
- : '"' PP_Compilation_Unit_Name_Character+ '"'
+ : '"' PP_Compilation_Unit_Name_Character* '"'
Contributor

@Nigel-Ecma Nigel-Ecma May 25, 2024

We should check this is by design before making this change. Regardless of the answer there is probably some work to do:

  • §14.2 Compilation units defines a compilation unit, but the definition does not mention compilation units having names… or being files… However, the (non-normative) example uses two files, A.cs and B.cs, and refers to them as two compilation units.
  • §22.5.6.3 The CallerFilePath attribute, which provides the file path (which is implementation-dependent), states “The file path may be affected by #line directives (§6.5.8)”.
  • Here in §6.5.8, #line allows the setting of the “compilation unit name”.
  • So:
    • §6.5.8 states that compilation units have names, which §14.2 omits; and
    • §22.5.6.3 tells us that the name is the file path, but leaves the form of that path implementation-dependent.

What already exists isn’t overly clear, and this change seeks to allow the compilation unit name to be the empty string, which is probably not a valid implementation-dependent path on any implementation… So if this observed compiler behaviour is by design, then it surely needs to have a defined meaning in the Standard.

If this change is to be made, and even if not, this all needs to be tidied up – either in this PR or spun off into a new one.


I might ask what the intended use of an empty file path/compilation unit name is, but I might already know – it was requested by the NSA so that the file names in NSA-distributed software are not leaked and so do not endanger National Security! I’m only partially joking here, but that’s a story for another time ;-)

Contributor Author

@BillWagner told me that allowing an empty string was indeed a conscious decision, so I propose keeping that edit.

Contributor Author

@Nigel-Ecma

§6.1 Programs states:

A C# program consists of one or more source files, known formally as compilation units (§14.2). Although a compilation unit might have a one-to-one correspondence with a file in a file system, such correspondence is not required.

I propose appending to this, the following:

… As such, the accepted spelling of a compilation unit name, and its mapping, if any, to a filename is outside the scope of this specification.

I'm deliberately avoiding using any of the following terms:

  • behavior, implementation-defined – unspecified behavior where each implementation documents how the choice is made
  • behavior, undefined – behavior, upon use of a non-portable or erroneous construct or of erroneous data, for which this specification imposes no requirements
  • behavior, unspecified – behavior where this specification provides two or more possibilities and imposes no further requirements on which is chosen in any instance

Contributor

I'd be fine with that extra text - Rex, do you think it's worth adding that to this PR, so we can merge it all in one go?

;

fragment PP_Compilation_Unit_Name_Character
// Any Input_Character except "
- : ~('\u000D' | '\u000A' | '\u0085' | '\u2028' | '\u2029' | '#')
+ : ~('\u000D' | '\u000A' | '\u0085' | '\u2028' | '\u2029' | '"')
Contributor

The # was a typo so should be fixed…

However, §22.5.6.3 defines the format of the compilation unit name/file path as “implementation-dependent”. So this section might need a semantic rule saying either that this arbitrary string must conform to the same implementation-dependent rules as in §22.5.6.3, or that it does not need to (i.e. it need not be valid as a file path).
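
To make the purely syntactic side of the two corrections concrete, here is a rough sketch that approximates the corrected PP_Compilation_Unit_Name rule with a .NET regular expression. It is an illustration only, not the ANTLR grammar itself, and the class and method names are hypothetical. It says nothing about whether a name must also be a valid file path.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompilationUnitNameSketch
{
    // Approximates: '"' PP_Compilation_Unit_Name_Character* '"'
    // where a name character is anything except a new-line character or '"'.
    private static readonly Regex Name =
        new Regex(@"\A""[^\u000D\u000A\u0085\u2028\u2029""]*""\z");

    public static bool IsName(string token) => Name.IsMatch(token);

    public static void Main()
    {
        Console.WriteLine(IsName("\"\""));        // empty name: True
        Console.WriteLine(IsName("\"a#b.cs\""));  // '#' inside the name: True
        Console.WriteLine(IsName("\"a\"b\""));    // embedded '"': False
    }
}
```

Under the uncorrected rules, the first two inputs would have been rejected: the `+` required at least one character, and `#` was excluded in place of `"`.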

Contributor

Hmmm... I'm not sure. It feels like it would be okay to allow values which aren't valid filenames, when specifying the compilation unit name directly in code, even if the CallerFilePathAttribute could never automatically generate such a name. (Indeed, I can see some cases where that would even be useful!)

I think this is speaking in favor of having a semantic rule saying "that it does not need to".

Anecdotally, I observe that Roslyn is okay with this:

#line 100 ":invalid:"

and even:

#line 100 ".."

(Interestingly, for the latter, it reports any subsequent error as belonging to the parent directory of the directory containing the file - it doesn't report it as ".." verbatim...)

Contributor

@jskeet jskeet Nov 20, 2024

I've only just tried the former (":invalid:") with code containing errors after it, and Roslyn crashes:

error MSB6006: "csc.dll" exited with code 1.

(But if there are no warnings/errors that need reporting, it's fine...)

@BillWagner
Member

closing and reopening to rerun the CI builds.

@BillWagner BillWagner closed this May 28, 2024
@BillWagner BillWagner reopened this May 28, 2024
@RexJaeschke
Contributor Author

§6.5.8 Line directives states:

Line directives may be used to alter the line numbers and compilation unit names that are reported by the compiler in output such as warnings and errors.

Note the presence of "reported": there is no requirement that such a reported name map to anything in a file system, so why not allow an empty string name?
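
A small sketch of that "reported" behaviour, assuming a build that does not treat warnings as errors. The name Widget.template and the class name are hypothetical; no file with that name needs to exist for the program to compile.

```csharp
using System;

public static class ReportedNameDemo
{
    public static string Run()
    {
        // Hypothetical name: it only affects what the compiler reports.
#line 200 "Widget.template"
#warning This warning should be attributed to line 200 of Widget.template
#line default
        return "compiled";
    }

    public static void Main() => Console.WriteLine(Run());
}
```

The build output, rather than the program's own output, is where the effect shows: the warning's location should use the name and line number from the directive instead of the real source file.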

@RexJaeschke RexJaeschke added the meeting: discuss This issue should be discussed at the next TC49-TG2 meeting label Aug 12, 2024
@RexJaeschke
Contributor Author

From the 2024-09-04 TG2 call:

There appear to be 2 issues:

  1. Conflicting text for the existing grammar.
  2. Whether we want to allow an empty string. Just because Roslyn allows it is not necessarily a reason for the spec to do so.

After a short discussion, Rex agreed to take another look.

@RexJaeschke RexJaeschke removed the meeting: discuss This issue should be discussed at the next TC49-TG2 meeting label Sep 5, 2024
@Playgirlkaybraz11 Playgirlkaybraz11 mentioned this pull request Sep 6, 2024
@RexJaeschke RexJaeschke added the meeting: discuss This issue should be discussed at the next TC49-TG2 meeting label Nov 4, 2024
@RexJaeschke
Contributor Author

Having looked at this again, I don't have more to add, and I've re-added the "Meeting discuss" label.

@jskeet jskeet added this to the C# 8.0 milestone Nov 20, 2024
Labels
meeting: discuss (This issue should be discussed at the next TC49-TG2 meeting)
type: bug (The Standard does not describe the language as intended or implemented)
4 participants