subprocess.run stalls indefinitely and consumes all memory when checking ninja version #955

Open
gareth-cross opened this issue Dec 4, 2024 · 4 comments

Comments

@gareth-cross

  • Python version: 3.10 (installed via conda-forge)
  • scikit-build-core version: 0.10.7 (but issue is present on 0.10.6 as well)
  • OS: Ubuntu 22.04 (WSL, kernel: 5.15.167.4)

Steps to reproduce:

I am running pip wheel --verbose --verbose --verbose . on my project. The build gets this far:

  Created temporary directory: /tmp/pip-build-env-x87al_n8
  Created temporary directory: /tmp/pip-standalone-pip-rpekp9xc
  Running command /home/gareth/repos/wfenv/bin/python /tmp/pip-standalone-pip-rpekp9xc/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x87al_n8/overlay --no-warn-script-location -v --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'scikit-build-core @ file:///home/gareth/repos/scikit-build-core' 'cmake>=3.20,<3.31' 'ninja>=1.5'
...
  Collecting cmake<3.31,>=3.20
    Using cached cmake-3.30.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
  Collecting ninja>=1.5
    Using cached ninja-1.11.1.2-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
...
  Getting requirements to build wheel ... done
...
Building wheels for collected packages: wrenfold
  Created temporary directory: /tmp/pip-wheel-sqd1lqbx
  Destination directory: /tmp/pip-wheel-sqd1lqbx
  Running command /home/gareth/repos/wfenv/bin/python /home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpafsk0kce
  2024-12-03 21:46:10,392 - scikit_build_core - WARNING - cmake should not be in build-system.requires - scikit-build-core will inject it as needed
  2024-12-03 21:46:10,392 - scikit_build_core - WARNING - ninja should not be in build-system.requires - scikit-build-core will inject it as needed
  2024-12-03 21:46:10,413 - scikit_build_core - INFO - RUN: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages/cmake/data/bin/cmake -E capabilities
  2024-12-03 21:46:10,419 - scikit_build_core - INFO - CMake version: 3.30.5
  *** scikit-build-core 0.10.7 using CMake 3.30.5 (wheel)
  2024-12-03 21:46:10,423 - scikit_build_core - INFO - Build directory: /tmp/tmpb32t54no/build
  *** Configuring CMake...
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - SITE_PACKAGES: /home/gareth/repos/wfenv/lib/python3.10/site-packages
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - Extra SITE_PACKAGES: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - PATH: ['/home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process', '/tmp/pip-build-env-x87al_n8/site', '/home/gareth/mambaforge/envs/devtools/lib/python310.zip', '/home/gareth/mambaforge/envs/devtools/lib/python3.10', '/home/gareth/mambaforge/envs/devtools/lib/python3.10/lib-dynload', '/tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-x87al_n8/normal/lib/python3.10/site-packages']
  2024-12-03 21:46:10,432 - scikit_build_core - DEBUG - Default generator: Ninja
  2024-12-03 21:46:10,433 - scikit_build_core - INFO - RUN: /home/gareth/repos/wfenv/bin/ninja --version

The process then stalls, and memory usage grows indefinitely until the process dies. If I kill the process, it appears to stop while reading stdout inside subprocess. I realize this context is a little thin at the moment, but I am still trying to gather debugging information.
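One way to narrow this down outside the build is to run the same command under a timeout, so that a hang surfaces as an exception instead of unbounded memory growth. The following is a minimal stand-alone sketch, not scikit-build-core's code: `capture_version` is a hypothetical helper and the 10-second default is an arbitrary assumption.

```python
import subprocess
import sys


def capture_version(cmd, timeout=10.0):
    """Run cmd and return (returncode, stripped stdout).

    Raises subprocess.TimeoutExpired if the child has not exited
    within `timeout` seconds (hypothetical default, pick your own).
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,  # the child is killed when this expires
        check=False,      # report the exit code instead of raising on it
    )
    return result.returncode, result.stdout.strip()
```

For example, `capture_version(["/home/gareth/repos/wfenv/bin/ninja", "--version"])` uses the path from the log above. If that call returns promptly when run by hand but the build still stalls, the difference is more likely in the environment the build process passes to the child than in ninja itself.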

Running the command /home/gareth/repos/wfenv/bin/ninja --version manually has no issues. It prints 1.11.1.git.kitware.jobserver-1 and exits.

One (possibly tangential) question I have: why does scikit-build-core query the instance of ninja present in my virtual environment, /home/gareth/repos/wfenv/bin/ninja (see the INFO line above), rather than the version that pip wheel collects into the build overlay? Is this expected?
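To see which copies of ninja are visible at all, and in what order, something like the following may help. It is only a sketch: `find_all_on_path` is a hypothetical helper, and whether scikit-build-core resolves ninja through PATH in this way is not established in this thread.

```python
import os
import shutil


def find_all_on_path(name, path=None):
    """Return every executable called `name` on `path`, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        # Restrict shutil.which to one directory at a time so that
        # shadowed copies are reported too, not just the first hit.
        found = shutil.which(name, path=directory)
        if found and found not in matches:
            matches.append(found)
    return matches
```

Calling `find_all_on_path("ninja")` from inside the virtual environment lists every candidate, so a stray copy in wfenv/bin shows up next to the one in the build overlay.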

Notably, if I uninstall the instance of ninja in wfenv, the build proceeds normally:

  2024-12-03 22:12:27,909 - scikit_build_core - DEBUG - Default generator: Ninja
  2024-12-03 22:12:27,910 - scikit_build_core - INFO - RUN: ninja --version
  2024-12-03 22:12:27,911 - scikit_build_core - INFO - Ninja version: 1.11.1
  2024-12-03 22:12:27,911 - scikit_build_core - DEBUG - CMAKE_GENERATOR: Using ninja: ninja

I instrumented my CMake to check the path to ninja and found:

  -- CMAKE_MAKE_PROGRAM: ninja
  -- Path to make program: /tmp/pip-build-env-aducxdi1/overlay/bin/ninja

Which appears to be correct - it is using the overlay version.

Of course, I can remove any stray instances of ninja in my virtual environment - but it is somewhat concerning that finding the wrong one triggers a lock-up followed by OOM, so I would like to understand this issue a bit better.

@LecrisUT
Collaborator

LecrisUT commented Dec 4, 2024

From navigating the code I guess you are hitting

except (subprocess.CalledProcessError, PermissionError):

And then it fails further down the line when it tries to match the ninja version specification.

When you run

$ ninja --version
$ echo $?

Do you get a non-zero exit value? That would put it in that branch.
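The same check can be done from Python, which also shows when `subprocess.CalledProcessError` is raised at all: only with `check=True` and a non-zero exit code. This is a minimal sketch with hypothetical helper names, not the scikit-build-core implementation.

```python
import subprocess
import sys


def exit_status(cmd):
    """Return the child's exit code without raising on failure."""
    done = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return done.returncode


def raises_called_process_error(cmd):
    """True if check=True would land in an `except CalledProcessError` branch."""
    try:
        subprocess.run(
            cmd,
            check=True,  # non-zero exit codes become CalledProcessError
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.CalledProcessError:
        return True
    return False
```

If `exit_status(["ninja", "--version"])` is 0, a `check=True` call would not enter the `except` branch quoted above.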

@gareth-cross
Author

Do you get a non-zero exit value? That would put it in that branch.

It exits normally with return code 0. If I kill the stalled process with ctrl-C, it seems like it never escapes out of the call to Run().capture(ninja_path, "--version").
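To capture where a stalled call is sitting without relying on Ctrl-C, the standard-library `faulthandler` module can dump all thread stacks after a delay. The sketch below wraps a plain `subprocess.run` call; `run_with_watchdog` is a hypothetical helper, the 5-second delay is an assumption, and it is intended for a stand-alone reproduction rather than as a description of what `Run().capture` does internally.

```python
import faulthandler
import subprocess
import sys


def run_with_watchdog(cmd, dump_after=5.0, log=sys.stderr):
    """Run cmd; if still running after dump_after seconds, dump stacks to log.

    `log` must be a real file object (it needs a file descriptor).
    """
    # Schedule a one-shot traceback dump of all threads; the process
    # keeps running afterwards because exit=False.
    faulthandler.dump_traceback_later(dump_after, exit=False, file=log)
    try:
        return subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
    finally:
        # Cancel the pending dump if the command finished in time.
        faulthandler.cancel_dump_traceback_later()
```

The dump shows the Python frames that were active at the time, which should indicate whether the parent is blocked reading the pipe or somewhere else.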

I am not really familiar with the expected behavior here - it feels like scikit-build-core invoking the existing ninja install in my venv is incorrect, and rather it should use the version installed in the build overlay.

@henryiii
Collaborator

henryiii commented Dec 4, 2024

It should try the local one first. If it’s installed in the build env, you should not be able to get past it (unless it was broken).

Though the outer one shouldn't be broken either. Forcing a pip version is not recommended, as some platforms, like the BSDs, do not have wheels. Will have to investigate, hopefully later today.

@ShunChengWu

I encountered exactly the same problem. What fixed it in the end was uninstalling Ninja with pip uninstall ninja.
Hope this helps.
