Better behavior with unchanged files #328

mattmccutchen-cci · 2020-11-05T04:59:43Z

Normally, 3c prints the rewritten version of a file foo.c to standard output, or if -output-postfix=checked is passed, the rewritten version is written to foo.checked.c. However, if the rewritten version is identical to the original, then:

Without -output-postfix, the output to standard output is empty. This edge case is bound to cause scripts to behave unexpectedly if they think the result of rewriting is actually an empty file. John pointed out that many of the regression tests take advantage of this behavior to assert that a file needs no rewriting by checking that the output is empty (look for count 0), but we think the tests could just as easily (and more clearly!) cmp to the original file.
With -output-postfix, the foo.checked.c file is not created. Hopefully scripts not designed for this case will just error out instead of doing something wrong. Some users might consider this useful and there might be a performance benefit in some workflows, but there is at least one other reason why a given file might not be written (-base-dir), so ~~if we keep the behavior, we should tell the user why a given file wasn't written.~~ Update 2021-02-13: We've deprioritized that part because it's now an error if 3C generates a change to a file outside the base dir (Properly constrain non-canWrite files to not change (remaining cases) #387), so we're not aware of any other common reason for not writing a file that would need to be distinguished from the file being unchanged; if one arises, we'll try to make sure that that one generates a distinctive message. And the code that writes the files only has a list of modified rewrite buffers; I'm not aware of an easy way to get a list of files that were read but not modified.

This issue is to decide whether to stop treating unchanged files specially in one or both cases (or maybe make it an option), and if we keep the special treatment with -output-postfix, provide better diagnostics.

Note that even if we write unchanged files passed on the command line (by default or as an option), we might not want to write unchanged files included from the -base-dir, at least without an additional option.

The text was updated successfully, but these errors were encountered:

mattmccutchen-cci · 2021-01-06T03:29:43Z

Today I learned that there's at least one case in which 3c prints the rewritten version even when it is identical to the original: see this RUN command in the definedType test.

mattmccutchen-cci · 2021-01-13T21:48:20Z

Today I learned that there's at least one case in which 3c prints the rewritten version even when it is identical to the original: see this RUN command in the definedType test.

I asked in today's meeting if anyone had more information about this. Mike said the inconsistent behavior should be considered a bug. But if we decide to always output unchanged files, we wouldn't have to do the work to investigate where the current inconsistency is coming from. Should we do that or is it useful to not output unchanged files? (In terms of workflow, do users like to use -output-postfix and run git status and look for untracked files, or would they be just as happy diffing against the output dir once #347 is implemented or using a tool that moves all the written files into place and runs a git diff?) @mwhicks1 suggested I file a new issue, but I think it makes sense to just discuss the topic in this one.

mwhicks1 · 2021-01-14T12:56:01Z

I would be OK with outputting the new file every time. We could also output a diagnostic under a "verbose" flag that says the content of the file was not changed. This diagnostic can be ignored most of the time (i.e., the flag is not on by default).

mattmccutchen-cci · 2021-01-14T13:25:49Z

Thanks. Per the last sentence that I recently added to the initial comment: are you OK with outputting all header files in the base dir (and thus allowed to be written by 3C) that are transitively included from the files on the command line, even if there are no changes to make to them? That could be a huge number of unchanged files. But if the user has a script to diff or whatever, a larger number of files may not be a problem.

mwhicks1 · 2021-01-14T13:32:43Z

Hm, good point.

After a little more thought, I feel that this is not an issue worth pursuing. We are spending a lot of time on what seems to be to be a minor problem. Whether the file is output or not when there is no change, you have to deal with it in post-processing. To me, to avoid clutter to the file system, I lean to the current behavior.

Let's just implement the "output directory" feature and aim to only output files that changed, as we have now. If occasionally one is output that should not be, we can decide whether it rises to the level of a real problem and fix it then.

mattmccutchen-cci · 2021-01-14T14:36:25Z

Sounds reasonable for the modes that create files (-output-postfix and the forthcoming -output-dir). If our attitude is that for a file that needs no change, either creating it or not is a valid behavior, then I'll make the new regression test script (#355) accept both outcomes for all tests rather than hard-coding which tests trigger one or the other based on the current quirks of 3C.

For the mode that outputs to stdout, I still think it's bad that a script can suddenly start to silently exhibit incorrect behavior if the file is unchanged and the script assumes it is supposed to be empty. I might even be in favor of deprecating the mode that outputs to stdout altogether, since nearly all real users will have multi-file programs (which this mode can't handle in general) and the regression tests will no longer use this mode. But if you think this is low priority, let me know and we can mark the issue as such.

mwhicks1 · 2021-01-14T14:48:44Z

For the output mode to stdout, I can be persuaded that we should always output the result. It's surprising, when running this mode, and nothing is emitted, I'll agree.

mattmccutchen-cci · 2021-01-19T19:57:13Z

Updates: At Thursday's meeting, some team members were emphatic that:

We should keep stdout mode for quick tests even if it isn't very general, and the implementation should remain in the tool rather than a wrapper script: it will be more convenient for users, and we suspect a second code path in the tool may actually be easier to maintain than a wrapper script.
Stdout mode should have an option to solve for changes to multiple files but output only the new version of the main file, discarding the changes to the other files. I'm not convinced that this is a sane use case, but I'll defer to the team. The team allowed that we could require a flag to opt in to this arguably surprising behavior.

Outstanding design questions (stretching the scope of this issue a bit, but I think it's a lesser evil than filing a separate one):

What to do in stdout mode if the "solve for and discard changes to other files" flag isn't passed? Solve only for changes to the main file by default, or refuse to run unless the user passes a flag to choose one of the two behaviors? One possible design is to reuse the -base-dir flag (which controls what files we solve for) to distinguish the two behaviors. Currently, -base-dir always defaults to the working directory. We could change things so that in stdout mode, the default is to solve only for the main file, but if -base-dir is set (whether to the working directory or something else), we solve for all files under that directory and error if the main file isn't under the directory, then discard the changes to non-main files.

mwhicks1 · 2021-01-19T20:10:51Z

I think we care less about (2) than about (1). To me it's fine to just reject rewriting a program in stdout mode if included headers would also have to be changed. I think that addresses your issue?

mattmccutchen-cci · 2021-01-19T20:18:11Z

To me it's fine to just reject rewriting a program in stdout mode if included headers would also have to be changed. I think that addresses your issue?

Included headers never "have" to be changed: we can always constrain them to not change. (In the worst case, this will just cause the new version of the main file to be identical to the original.) To review, I think we're considering three modes here:

Solve only for changes to the main file, i.e., constrain all other files not to change. This seems like the most straightforward behavior to understand.
Solve for changes to all files under the -base-dir, but only output the new version of the main file and discard any changes to other files. I'm willing to implement this mode if other team members are convinced we should (as they seemed to be on Thursday).
How I interpret your last comment: Solve for changes to all files under the -base-dir but then error if we tried to change a file other than the main file. I don't see the point of this mode compared to mode 1.

mwhicks1 · 2021-01-19T20:41:24Z

For stdout mode, it should fail if you give it more than one file on the command line.

When you give it one file, the only question is files it #includes. Don't prevent these from changing through constraints. Rather, if you get to the end and it turns out that several files should change per the constraint solution (meaning they should be rewritten, normally) then fail. This is not the same as forcing included files not to rewrite but proceeding anyway, because you will get a different answer for your main file, potentially.

mattmccutchen-cci · 2021-01-19T22:21:54Z

We discussed the sub-modes of stdout mode in today's meeting: We don't want mode 2 after all, but we do want mode 3 for the case where the user wants to solve for all files under the -base-dir and believes that only one of them will change and wants to assert that that is indeed the case. Sounds reasonable to me; I'll implement it.

Remaining questions: Do we want to support mode 1? (I'm not clear if we previously made a decision about this or, if so, whether the decision stands.) If so, what flag(s) control the choice of mode, and what is the default if none of those flags are passed? In mode 3, is the default -base-dir still the working directory (as it is in non-stdout modes)?

mattmccutchen-cci · 2021-01-20T21:32:06Z

Update from today's meeting: Stdout mode will support only mode 3 and the default -base-dir will still be the working directory. That makes things simple. In all output modes, specifying a file on the command line that is outside the base dir will be an error, but the user may not have passed the -base-dir option and may not know what the base dir is, so we'll need the error message in that case to be good; we hope that case will be rare.

…nge. (#391) This PR addresses function and variable declarations (because they are the most obvious case and reasonably straightforward) and checked regions (because they came up in some existing regression tests). We'll leave #387 open for the tail of unhandled cases. Also: - When 3C tries to change a non-writable file, issue an error diagnostic (not an assertion failure because there are known unhandled cases) rather than silently discarding the change. - Add a `-dump-unwritable-changes` flag to the `3c` tool to dump the new version of the file when that diagnostic appears. - Add an error diagnostic when 3C tries to change a file under the base dir other than the main file in stdout mode. This is a separate feature (part of #328) but ended up being easy to implement along with the diagnostic for a non-writable file. - Add tests for all the fixes (but not `-dump-unwritable-changes`). - Fix a bug where `-warn-all-root-cause` didn't work without `-warn-root-cause`, because this affected one of the new tests. The use of `-warn-all-root-cause` without `-warn-root-cause` in the affected test will serve as a regression test for this fix as well. Fixes part of #387 and a few unrelated minor issues.

- Add -output-dir option to write updated files to a directory structure parallel to the base dir (#347). When -output-dir is used, a source file outside the base dir can't be handled because there is no way to compute its output path. For consistency, this is now an error even when -output-dir is not used. - Convert all 3C regression tests from -output-postfix to -output-dir to avoid leaving temporary files in the clang/test/3C directory (#378). - Expand "3c -help" documentation. In particular, direct the user to pass "--" when they don't want to use a compilation database to avoid accidentally using unwanted compiler options and suppress the warning if no compilation database is found (#343). - For consistency, have stdout mode output the main file even if it is unchanged (#328). - Fix bugs in matching of file paths against the base dir (#327). - Other minor bug fixes: see the pull request description for details. Co-authored-by: John Kastner <[email protected]>

mattmccutchen-cci mentioned this issue Dec 7, 2020

3C needs a better way to output multiple files. #347

Closed

kyleheadley added command-line enhancement New feature or request testing labels Dec 9, 2020

This was referenced Jan 22, 2021

Properly constrain non-canWrite files to not change (remaining cases) #387

Open

Constrain the most common constructs in non-writable files to not change. #391

Merged

mattmccutchen-cci self-assigned this Feb 2, 2021

mattmccutchen-cci mentioned this issue Feb 2, 2021

Add -output-dir and other file handling improvements #412

Merged

14 tasks

mattmccutchen-cci closed this as completed in #412 Feb 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better behavior with unchanged files #328

Better behavior with unchanged files #328

mattmccutchen-cci commented Nov 5, 2020 •

edited

Loading

mattmccutchen-cci commented Jan 6, 2021

mattmccutchen-cci commented Jan 13, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 14, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 14, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 19, 2021 •

edited

Loading

mwhicks1 commented Jan 19, 2021

mattmccutchen-cci commented Jan 19, 2021

mwhicks1 commented Jan 19, 2021

mattmccutchen-cci commented Jan 19, 2021

mattmccutchen-cci commented Jan 20, 2021

Better behavior with unchanged files #328

Better behavior with unchanged files #328

Comments

mattmccutchen-cci commented Nov 5, 2020 • edited Loading

mattmccutchen-cci commented Jan 6, 2021

mattmccutchen-cci commented Jan 13, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 14, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 14, 2021

mwhicks1 commented Jan 14, 2021

mattmccutchen-cci commented Jan 19, 2021 • edited Loading

mwhicks1 commented Jan 19, 2021

mattmccutchen-cci commented Jan 19, 2021

mwhicks1 commented Jan 19, 2021

mattmccutchen-cci commented Jan 19, 2021

mattmccutchen-cci commented Jan 20, 2021

mattmccutchen-cci commented Nov 5, 2020 •

edited

Loading

mattmccutchen-cci commented Jan 19, 2021 •

edited

Loading