Problems caused by non-canonical PSL file paths of macro calls in #included files #604

Open
mattmccutchen-cci opened this issue Jun 8, 2021 · 0 comments

I'm filing an issue that I believe is extremely rare and probably not worth fixing on its own, but I think it's valuable to record what I've learned, particularly so that I can refer to it in documenting what #598 does and does not fix.

The root cause of #529 was that for some reason, the following code in PersistentSourceLoc isn't able to get a canonical file path from Clang for code inside macro calls:

    FullSourceLoc TFSL(SR.getBegin(), SM);
    if (TFSL.isValid()) {
      const FileEntry *Fe = SM.getFileEntryForID(TFSL.getFileID());
      std::string FeAbsS = Fn;
      if (Fe != nullptr) {
        // Unlike in `emit` in RewriteUtils.cpp, we don't re-canonicalize the file
        // path because of the potential performance cost (mkPSL is called on many
        // AST nodes in each translation unit) and because we don't have a good
        // way to handle errors. If there is a problem, `emit` will detect it
        // before we actually write a file.
        FeAbsS = Fe->tryGetRealPathName().str();
      }
      Fn = std::string(sys::path::remove_leading_dotslash(FeAbsS));
    }

In those cases, we fall back to the original SM.getPresumedLoc(SL).getFilename(), which I'll call the "plain" path.

According to my tests, for the main source file, the plain path just comes from the compiler command line of the ToolAction, which in turn comes from the compilation database or (if a compilation database is not used) from the SourceFiles variable in 3C.cpp but may be affected by ArgumentsAdjusters. For an #included file, the plain path is the concatenation of the -I path and the path in the #include directive. (When an #include is implicitly resolved relative to the file in which it appears, the dirname of the plain path of that file plays the role of the -I path.) However, due to a cache in the FileManager (see FileManager::UniqueRealFiles), if the same 3C run opens a file with the same inode number several times (via symlinks or probably even hard links), it always uses the same plain path: perhaps the first one that was determined according to the rules above.
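
To make the concatenation rule concrete, here is a minimal sketch using `std::filesystem` rather than Clang's own machinery. The helper names `plainIncludePath` and `lexicallyNormalized` and the example paths are hypothetical and are not part of 3C or Clang. The sketch only models the observation above that the plain path is the `-I` path joined with the text of the `#include` directive, so any `..` or `.` components in the directive survive in the result.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical model of the "plain" path of an #included file: the -I
// directory joined with the #include text, with no normalization applied.
std::string plainIncludePath(const std::string &IncludeDir,
                             const std::string &IncludeText) {
  return IncludeDir + "/" + IncludeText;
}

// Purely lexical cleanup of "." and ".." components. This does not touch the
// file system, so it does not resolve symlinks and is only an approximation
// of a truly canonical path.
std::string lexicallyNormalized(const std::string &Path) {
  return fs::path(Path).lexically_normal().generic_string();
}
```

For example, with a hypothetical `-I/proj/include` and `#include "../common/foo.h"`, `plainIncludePath` yields `/proj/include/../common/foo.h`, while `lexicallyNormalized` of that string yields `/proj/common/foo.h`. The two strings name the same file but do not compare equal.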

However, in general, the plain path is not guaranteed to be the same as the canonical path. We currently do two things to address that problem:

However, the plain and canonical paths can still differ in one case: an #include where the portion of the path in the #include directive (as opposed to the -I option) is non-canonical. It seems that I overlooked this case while working on #532. This can potentially have several bad effects (that I can think of so far):

One complete fix would be to have the PersistentSourceLoc code quoted above canonicalize the file path itself, as I contemplated in the comment quoted above. If we can find a way to ensure that Clang always gives us a canonical path in the first place, that might be preferable, but it might be hard to be confident that we've handled all cases.
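
As a rough illustration of the first option, the sketch below canonicalizes a path and falls back to the plain path when canonicalization fails. It is an assumption-laden stand-in, not the 3C implementation: the helper name `canonicalizeOrKeep` is hypothetical, and it uses `std::filesystem::canonical` where the real code would presumably use LLVM's file system support. It also ignores the performance concern raised in the quoted comment, since it makes a file system call on every invocation.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of 3C): return the canonical form of
// PlainPath, resolving symlinks and "."/".." components. If the path cannot
// be canonicalized (for example, the file does not exist), keep the plain
// path unchanged so that a later stage can detect and report the problem.
std::string canonicalizeOrKeep(const std::string &PlainPath) {
  std::error_code EC;
  fs::path Canonical = fs::canonical(PlainPath, EC);
  if (EC)
    return PlainPath;
  return Canonical.string();
}
```

If something like this were adopted, caching the result per file rather than per AST node would be one way to limit the cost, but that is a design question the issue leaves open.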
