Specify additional steps to utilize GPU for Linux users #2299
Specify additional steps to utilize GPU for Linux users
Advice to skip additional step 6 if using CPU.
Added a second option to create a virtual env via Python's built-in venv module for Linux users with CUDA-enabled GPUs
Added virtual envs activation/deactivation commands and changed wording for editing the deactivate block in the activate script of the venv virtual env.
Added instructions to resolve the ptxas issue.
Revised CUDNN_DIR definition
Corrected LD_LIBRARY_PATH definition in conda environment instructions
Rename environment variable to PTXAS_DIR and package manager options.
Added note to use pip instead of conda to install TensorFlow.
Added steps and instructions to install TensorFlow by running `pip install tensorflow[and-cuda]` within a virtual environment (option 1: conda, option 2: venv) and to set environment variables that locate the compatible NVIDIA libs installed with TensorFlow, so that GPUs are effectively utilized. The solution has been successfully tested.
Reference: tensorflow/tensorflow#63362
@haifeng-jin, @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates?
As I remember, the current recommended way to install TF is to use
@haifeng-jin it seems practically impossible for someone who owns a PC with a CUDA-enabled GPU to run deep learning experiments locally with TensorFlow version 2.16.1 on that GPU without manually performing extra steps that are not (as of today) part of the standard installation procedure for Linux users with GPUs in the official TensorFlow documentation, at least as a temporary fix! It turns out that when you
Please don't use "add file"/"update file"/"fix file"/etc. commit messages. These are hard to reason about when looking at the history of the file/repository. Instead, please write explanatory git commit messages.
The commit message is also the title of the PR if the PR has only one commit. It is thus doubly important to have commit messages that are relevant, as PRs would be easier to understand and easier to analyze in search results.
For how to write good quality git commit messages, please consult https://cbea.ms/git-commit/
Can we instead add these to the install guide?
@mihaimaruseac shouldn't we explain how to manually configure the environment variables as appropriate?
I read the update and it seems reasonable to me. Thank you
Why is conda mentioned in this patch? It makes the install guide more convoluted and seems unnecessary to me.
@Tachi107 I agree. Should I remove everything related to conda (referred to as option 1) and keep just one suggested option (create a venv virtual environment)? Perhaps that would be better and more straightforward?
Note that I'm not a tensorflow maintainer, just a casual user who happened to stumble upon this patch. But yeah, if I were you I would just show how to set up the venv. Conda users should already know how to do that with their non-default setup :)
@Tachi107 thank you. It seems very reasonable to simplify the guide like that. However, for now I will keep it as is and await the comments of the maintainers as well.
@haifeng-jin, @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates?
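There is no need to use conda; a standard venv works fine. In 2.15, tensorflow knew to go look for the NVIDIA binaries installed with

```bash
# Create and activate a standard venv, then install TensorFlow with the CUDA extras.
python -m venv my-venv
source my-venv/bin/activate
python -m pip install tensorflow[and-cuda]
# From inside the installed tensorflow package directory, symlink the NVIDIA shared libraries.
pushd $(dirname $(python -c 'print(__import__("tensorflow").__file__)'))
ln -svf ../nvidia/*/lib/*.so* .
popd
```

This produces output like: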
This is essentially what we do from the R interface in
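The symlink step described in this comment could be wrapped in a small helper so that it can be re-run after an upgrade. This is only a sketch and not part of the PR: the function name `link_nvidia_libs` and its single argument are hypothetical, and it assumes the pip-installed NVIDIA wheels live in an `nvidia/` directory next to the `tensorflow/` package, as the `../nvidia/*/lib/*.so*` glob implies.

```bash
# Hypothetical helper (not from the PR): symlink every NVIDIA shared library
# found under <site-packages>/nvidia/*/lib into <site-packages>/tensorflow.
# Assumes the directory layout implied by the ../nvidia/*/lib/*.so* glob.
link_nvidia_libs() {
    site_packages="$1"
    tf_dir="$site_packages/tensorflow"
    if [ ! -d "$tf_dir" ]; then
        echo "tensorflow package not found in $site_packages" >&2
        return 1
    fi
    count=0
    for lib in "$site_packages"/nvidia/*/lib/*.so*; do
        # An unmatched glob stays literal; skip it.
        [ -e "$lib" ] || continue
        # Use a relative target, matching the ../nvidia/... form of the manual command.
        rel="../${lib#"$site_packages"/}"
        ln -sf "$rel" "$tf_dir/$(basename "$lib")"
        count=$((count + 1))
    done
    # Print how many links were created or refreshed.
    echo "$count"
}
```

With the venv active, it might be called as `link_nvidia_libs "$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"`. A printed count of `0` would suggest that the NVIDIA wheels are not where this sketch assumes them to be.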
Removed option to install within conda virtual environment. Recommendation to install in venv environment.
@t-kalinowski thank you very much for your valuable advice. I revised the PR accordingly.
@sgkouzias if you also create a symlink at
Replaced instructions to modify default activate/deactivate scripts with instructions to create symlinks to NVIDIA shared libraries and ptxas.
@t-kalinowski thank you so much for your advice. Instructions have been totally revised as per your comments. Modifications to default
@8bitmp3, @haifeng-jin, @MarkDaoust even TensorFlow version
site/en/install/pip.md
Outdated

```bash
source tf/bin/activate
deactivate
```
Can you remove `deactivate`?
@learning-to-play removed `deactivate` as advised. Furthermore, I could remove the instruction to create the symlink to ptxas, since it is ultimately not needed for TensorFlow version `2.17.0.rc0` but only for TensorFlow version `2.16.1`. Awaiting your comments.
I want to make sure that I understand the situation correctly. Which of the following two situations is correct?
- If the issue doesn't happen for 2.17.0RC0, yes please remove the instructions.
- If the issue happens for both 2.17.0RC0 and 2.16, we can wait for the GPU team to take a look at "TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too)" (tensorflow#63362) and see if they can send a fix for both the 2.16.2 and 2.17.0 releases.
@learning-to-play the only difference is that on version `2.17.0.rc0` you need to create the symlinks to the NVIDIA libs in order to utilize GPUs, while on version `2.16.1` you must, in addition to those symlinks, create a symlink to ptxas as well. Consequently, the command `pip install tensorflow[and-cuda]` alone fails to work with GPUs on both versions.
@learning-to-play, @SeeForTwo, @8bitmp3, @haifeng-jin, @MarkDaoust, @markmcd Unfortunately the latest release namely TensorFlow
So it seems as TensorFlow Notes:
```bash
ln -sf $(find $(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)"))/*/bin/) -name ptxas -print -quit) $VIRTUAL_ENV/bin/ptxas
```
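If `find` matches nothing, the one-liner above still runs `ln` with an empty source, which is easy to miss. A more defensive variant might look like the sketch below. It is not part of the PR: the function name `link_ptxas` and its two arguments are hypothetical, and it expects absolute paths so that the resulting link resolves from any working directory.

```bash
# Hypothetical helper (not from the PR): link the first ptxas found under
# <nvidia-dir> into <bin-dir>, and fail loudly when no ptxas exists instead
# of creating a broken link.
link_ptxas() {
    nvidia_dir="$1"   # e.g. the nvidia/ directory inside site-packages (absolute path)
    bin_dir="$2"      # e.g. "$VIRTUAL_ENV/bin"
    # head -n 1 keeps only the first match, like -print -quit in the one-liner.
    ptxas_path=$(find "$nvidia_dir" -name ptxas -type f -print 2>/dev/null | head -n 1)
    if [ -z "$ptxas_path" ]; then
        echo "ptxas not found under $nvidia_dir" >&2
        return 1
    fi
    ln -sf "$ptxas_path" "$bin_dir/ptxas"
}
```

A non-zero return status then signals that the NVIDIA compiler wheel is missing, rather than leaving a dangling `ptxas` link in the venv.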
Thank you for the contribution, @sgkouzias :)
Revised the step with instructions to configure the virtual environment variables for GPU users by adding a disclaimer.
@belitskiy, @learning-to-play I revised the instructions as advised and will be awaiting your feedback. It is my honor to contribute to the TensorFlow community.