perf report shows most cycles spent in blas_thread_server #5171
That reads like you have threads idling while there is nothing for them to do - what hardware is this, which version of OpenBLAS, and do you (or pytorch) do anything to constrain the number of threads OpenBLAS will use? (By default, it will start up as many as there are cpu cores.)
It'd be great to get an exact reproducer for this @nickdesaulniers. Without knowing how you installed numpy, pytorch and whatever else is in your environment (venv or conda env?), this isn't reproducible.
$ uname -m
x86_64
Hard to say, exactly. Numpy can print some info about how it was configured, but I don't see any version info in its output: https://numpy.org/doc/1.24/reference/generated/numpy.show_config.html I am using numpy '1.23.5' from an environment managed by conda. Given the .so files, is it possible to check the version from that?
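One way to check, without relying on numpy's output at all, is to load the shared object and call `openblas_get_config()` yourself. A sketch; the `path` argument is a placeholder for whichever .so your environment actually loads (e.g. from `/proc/<pid>/maps`):

```python
import ctypes

def openblas_config(path):
    """Return the build string of the OpenBLAS library at `path`.

    Works for OpenBLAS >= 0.3.4, which embeds the version number
    (e.g. "OpenBLAS 0.3.29 ...") in the string this call returns.
    """
    lib = ctypes.CDLL(path)
    lib.openblas_get_config.restype = ctypes.c_char_p
    return lib.openblas_get_config().decode()
```

Running it against conda's libcblas.so.3 / liblapack.so.3 would tell you which OpenBLAS build each one actually wraps.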
Unsure. What are some symbols I can grep for to check (in numpy and pytorch sources)?
Yeah, sorry. I know it's not the greatest bug report ever.
We're definitely using
Right, I'm not even sure myself if this is an issue with my env, conda's packaging/configuration of numpy, or OpenBLAS itself. Filing something in case this has been brought up before and is familiar, or for future travelers to find a similar thread.
thanks - that's a bit more specific than "a computer", but it might help to know if it has like 4 or 400 cores?
Strange, I'd think Numpy's show_config displays the version information, unless you are using a really ancient version of OpenBLAS that does not yet include the version number in its response to openblas_get_config() - this was added in 0.3.4
See if you can find any calls to openblas_set_num_threads() - or anything that sets the environment variable OPENBLAS_NUM_THREADS. You could also try to constrain it yourself, by setting OPENBLAS_NUM_THREADS to a small(ish) value before starting your numpy/pytorch environment. Depending on the problem (matrix) sizes you are working with, having a large number of threads may not be beneficial, as each needs to allocate memory on startup and most will end up idling if they're only lying in ambush, waiting for the occasional 4x4 matrix to stumble by.
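A minimal sketch of the suggestion above; the key detail is that the variable has to be set before the first import, since OpenBLAS sizes its thread pool once, at load time:

```python
import os

# Must happen before numpy/pytorch (and hence OpenBLAS) is first imported:
# OpenBLAS reads OPENBLAS_NUM_THREADS when its worker threads start up.
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # small(ish) value, per the advice above

try:
    import numpy  # the thread pool is created during this import
except ImportError:
    pass  # numpy not installed here; the set-before-import ordering is the point
```

Setting the variable after numpy has already been imported has no effect on the pool size.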
I've definitely seen strange performance on multithreaded code on dual socket NUMA machines before...wonder if that's related, or just a red herring. IIRC, using
so I guess I'm running 0.3.29, which at the moment looks like your latest release. Oh, they do have this, but it doesn't seem to work.
No hits for
$ for i in $(seq 1 30); do python -c "import time; z=time.time(); import torch; print(time.time() - z)" >> /tmp/allthreads.txt; done
$ awk '{s+=$1}END{print "ave:",s/NR}' RS=" " /tmp/allthreads.txt
ave: 1.71255
$ for i in $(seq 1 30); do OPENBLAS_NUM_THREADS=1 python -c "import time; z=time.time(); import torch; print(time.time() - z)" >> /tmp/onethread.txt; done
$ awk '{s+=$1}END{print "ave:",s/NR}' RS=" " /tmp/onethread.txt
ave: 1.97009

Didn't seem to make a difference, but there's a lot of variance.
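Since the run-to-run variance is high, a small harness that averages several fresh interpreters may give a steadier comparison. Just a sketch; it imports the stdlib `json` module as a stand-in for `torch` so it runs in any environment:

```python
import os
import statistics
import subprocess
import sys

def avg_import_time(module, runs=5, extra_env=None):
    """Average cold-import wall time of `module` across fresh interpreters."""
    env = dict(os.environ, **(extra_env or {}))
    code = f"import time; t = time.time(); import {module}; print(time.time() - t)"
    samples = [
        float(subprocess.run([sys.executable, "-c", code], env=env,
                             capture_output=True, text=True).stdout)
        for _ in range(runs)
    ]
    return statistics.mean(samples)

# Compare default vs. single-threaded OpenBLAS (json stands in for torch):
base = avg_import_time("json")
one = avg_import_time("json", extra_env={"OPENBLAS_NUM_THREADS": "1"})
```

With `module="torch"` and more runs, this reproduces the shell loop above without the overwrite/append pitfalls.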
Ok, thanks. No idea why the show_config() would not be more specific - besides the version number, it should tell us the maximum number of threads the library was built for (which may well be smaller than 256).
Not sure this is necessarily an issue with OpenBLAS vs users of OpenBLAS (numpy, pytorch).
I'm seeing slow python imports of pytorch; literally `import torch` is taking multiple seconds on my system. When I record the python interpreter with linux `perf record`, `perf report` shows most cycles are spent in `blas_thread_server` via BOTH liblapack.so.3 and libcblas.so.3. If I annotate either, it seems both are near reading the time stamp counter:
I'm guessing that's corresponding to code around here.
numpy/numpy#24639 seems like someone else hit this, too, but... https://xkcd.com/979/.
How do I even go about debugging this further? Is it an issue in pytorch? numpy? openblas? PEBKAC?
Importing numpy alone doesn't seem problematic, though I suspect that it's part of the chain of dependencies here. Perhaps related to how pytorch is (mis)using numpy then???
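One more way to narrow this down: CPython's built-in `-X importtime` flag breaks the import cost down per module, which would show whether the seconds go to the numpy/BLAS part of torch's import chain. A sketch, again with `json` standing in for `torch`:

```python
import subprocess
import sys

# -X importtime writes a per-module timing table to stderr, one line
# per import, with self and cumulative microseconds.
out = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True, text=True,
).stderr

# The last line is the outermost import (json itself) with its cumulative cost.
print(out.splitlines()[-1])
```

Run against `import torch`, the slowest cumulative entries in that table would point at the module whose load triggers the OpenBLAS thread startup.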