
Excessive memory usage during compilation with pip #115

Closed
sirmarcel opened this issue Apr 29, 2024 · 6 comments
sirmarcel commented Apr 29, 2024

Currently, attempting to build the sphericart-torch wheel with pip requires a large amount of RAM if many CPU cores are present. I think this is due to this line, which invokes cmake without specifying the number of jobs; cmake presumably then defaults to the total number of cores. On an HPC system that can be 40 or 80 cores, and so compilation tends to get killed by the host OS.

While this is not catastrophic, it is inconvenient, and in many cases a waste of resources (compilation is not much faster in parallel mode). I would suggest capping the job count at some reasonable default, or disabling parallel builds entirely. Alternatively, the installation docs should at least mention this behaviour (see #116).
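The suggested cap could be sketched like this (a hypothetical illustration, not sphericart's actual setup.py; the function name and the cap of 8 are assumptions):

```python
import os

def n_build_jobs(max_jobs=8):
    # Never exceed max_jobs, even on 40-80 core HPC nodes where
    # os.cpu_count() would otherwise drive cmake to spawn one job per core.
    return min(os.cpu_count() or 1, max_jobs)

# The resulting value would then be passed to cmake explicitly, e.g.
#   cmake --build <build_dir> --parallel <n_build_jobs()>
```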

@nickjbrowning (Collaborator)

Thanks for the find, this is a very good point. I'll address this in a PR tomorrow.


sirmarcel commented Apr 29, 2024

Thanks @nickjbrowning !

@nickjbrowning nickjbrowning self-assigned this Apr 29, 2024
Luthaf (Contributor) commented Apr 30, 2024

One thing I don't understand here is that we don't have that many files to compile, so `make -j` and `make -j8` should behave the same (both launch ~8 compilation jobs).

@sirmarcel (Author)

It's a bit suspicious. My observations are: (a) compilation is killed under the default allocation on izar (4 GB, I believe); (b) if you remove --parallel from the setup.py of sphericart-torch, it builds without problems; (c) requesting a node with 32 GB also works, without modification.

Luthaf (Contributor) commented Apr 30, 2024

Oh, right. I can see the compiler requiring a couple of GiB per file (there are a lot of torch headers to parse and templates to instantiate), so parallel compilation would fail with only 4 GiB of available RAM. But then the change by @nickjbrowning would not fix it here, since compilation would also fail with only 8 jobs.
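A back-of-envelope check of this point (the ~2 GiB per-job figure is an assumption for illustration, not a measurement):

```python
GIB_PER_JOB = 2  # assumed peak memory per compiler process, not measured

def fits(jobs, available_gib):
    # True if `jobs` parallel compile jobs fit in the given memory budget.
    return jobs * GIB_PER_JOB <= available_gib

# 8 jobs need ~16 GiB, so even an -j8 build exceeds the 4 GiB default
# allocation, while a 32 GiB node fits comfortably.
```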


nickjbrowning commented Jul 12, 2024

I've added these two environment variables to the build process:

```
SPHERICART_PARALLEL_BUILD=ON
SPHERICART_JOBS=NJOBS
```

So you can now control the number of build jobs via:

```shell
SPHERICART_PARALLEL_BUILD=OFF pip install .[torch]  # disables parallel builds
SPHERICART_JOBS=4 pip install .[torch]              # uses 4 jobs for compilation
```
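A minimal sketch of how such variables could be turned into cmake arguments (an illustration of the idea, not the actual setup.py logic; the defaults chosen here are assumptions):

```python
import os

def cmake_parallel_args(env=os.environ):
    # Honour SPHERICART_PARALLEL_BUILD / SPHERICART_JOBS as described above.
    if env.get("SPHERICART_PARALLEL_BUILD", "ON").upper() == "OFF":
        return []  # serial build: no --parallel flag at all
    args = ["--parallel"]
    jobs = env.get("SPHERICART_JOBS")
    if jobs:
        args.append(jobs)  # e.g. ["--parallel", "4"]
    return args
```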

frostedoyster added a commit that referenced this issue Jul 17, 2024