Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing support for preprocessing large files #9 #40

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

aytey
Copy link
Contributor

@aytey aytey commented Mar 21, 2021

This PR implements support for generating a per-process temporary file for the purposes of preprocessing; this allows cnip to be run on large files as follows:

./cnip --cc-pp /tmp/moo/b.c

Currently, master will do this:

cnip: preprocessor invocation failed

if the file to be preprocessed is large -- this will typically happen if the file is larger than getconf ARG_MAX in bytes (well, minus a few for the arguments cnip passes in).

Currently, this code is not portable to Windows as getpid is POSIX (but there is https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getpid?view=msvc-160)

Signed-off-by: Andrew V. Jones [email protected]

@ltcmelo
Copy link
Owner

ltcmelo commented Mar 21, 2021

The patch looks good to me @andrewvaughanj , but the build failed; once it's fixed I can merge the PR.

@aytey
Copy link
Contributor Author

aytey commented Mar 22, 2021

Oh, that's sad ... so even though pyschec is targetting C++17, and filesystem is in C++17, 18.04's GCC doesn't have enough support for it:

😞

We could just do what some other open-source projects do and say "use $TMPDIR if set, else hard-code /tmp" -- would you be happy to this?

FYI, I didn't overlook:

  • tmpfile -- but this returns a FILE* and we need the filename (I could read from /proc/<pid>, but that's 🤮)
  • tmpnam -- marked as unsafe and gives a build-time warning
  • mkstemp -- gives both the FILE* and the path, but needs the initial template including directory, so doesn't actually help.

If you're happy with $TMPDIR || "/tmp", I'll make the change.

@aytey
Copy link
Contributor Author

aytey commented Mar 22, 2021

TMPDIR + mkstemp implemented

I DON'T ACTUALLY THINK THIS WORKS -- executeCore DOES NOT RETURN ALL OF STDOUT (but that's not a bug in this PR, but there is a bug in executeCore)

Would help if I close the temporary file before calling the preprocessor 🤦 -- now fixed!

// mkstemp returns -1 on error ...
if (err == -1) {
// ... so do we
return std::make_pair(err, "");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp_template will leak here.

@ltcmelo
Copy link
Owner

ltcmelo commented Mar 22, 2021

Hi @andrewvaughanj , I see you've put some time on working around Ubuntu's filesystem fault, but I'd prefer that we, instead of dealing with raw pointers/legacy API (it's error prone, e.g., there's a leak of tmp_template in the PR upon the early exit), keep your original approach and just add the necessary "patching" to make it work in Ubuntu.

From what I quickly checked, <experimental/filesystem> with additional link flags would do the trick — and a preprocessing directive in the code, to choose the right include header.

@aytey
Copy link
Contributor Author

aytey commented Mar 22, 2021

So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?

Personally, I consider mkstemp nicer, but then it does need to deal with a char* API -- for example, what happens if you have a multi-threaded cnip that wants to preprocess multiple files simultaneously?

Happy to do whichever you prefer :)

@ltcmelo
Copy link
Owner

ltcmelo commented Mar 22, 2021

So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?

Kind of … 🙂 I mean the approach with <filesystem> / <experimental/filesystem>, but that doesn't need to use pid: you could (should) use the a new file with the same base name and only replace its extension; in particular, with .i (that's the default extension for preprocessed files), similar to what cnip's Python driver does.

@aytey
Copy link
Contributor Author

aytey commented Mar 22, 2021

Do you prefer pid over using the temporary directory + .i? Your choice :)

@ltcmelo
Copy link
Owner

ltcmelo commented Mar 22, 2021

If you ask me, I'd say the temporary directory + .i file. (But with the API.) 🙂

@aytey
Copy link
Contributor Author

aytey commented Mar 28, 2021

Will revisit this this week + do the stuff about looking for C++17 on older Ubuntus!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants