Fixing OPENBLAS_NUM_THREADS env during installation/usage of Caffe? #108

Open
gfursin opened this issue Jun 1, 2017 · 0 comments

gfursin commented Jun 1, 2017

(Moving ticket from mlcommons/ck#76)

I just found some interesting behavior when OPENBLAS_NUM_THREADS is not set at all. I expected the CPU version of Caffe to use OpenMP and multiple threads in the default build, but either that is not the case or the OpenMP strategy is not optimal. On a multi-core ARM-based machine running Ubuntu, the performance difference between leaving OPENBLAS_NUM_THREADS unset and setting it to 1 can be as large as 6x! Forcing other thread counts improves performance further, though less dramatically: with the optimal value we get about another 2x.

We need to understand the default behavior of this variable in Caffe to decide how to add it to the CK workflow (i.e., should we explicitly add it to meta.json?). I expected that, when the variable is not defined, OpenBLAS would turn on an adaptive OpenMP strategy (i.e., dynamically select the number of threads depending on the current system load), but that does not seem to be the case.

In contrast, on multi-core x86_64 the performance difference between leaving this variable unset and setting it to 1 is small (around 20%), and autotuning the number of threads improves performance further, by around 50%.
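To make the comparison above repeatable on both ARM and x86_64, here is a minimal sketch of a thread-count sweep. It is not part of CK or Caffe: the helper names (`time_command`, `sweep`, `best_setting`) are hypothetical, and the workload in the usage section is a placeholder to be replaced with the real Caffe benchmark command. Each measurement starts a fresh process, so the variable is already in the environment when the BLAS library loads. A setting of `None` means the variable is left unset.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, num_threads, repeats=3):
    """Return the best wall-clock time (seconds) of cmd for one thread setting.

    num_threads=None removes OPENBLAS_NUM_THREADS from the child environment,
    so whatever default the library picks is what gets measured.
    """
    env = dict(os.environ)
    env.pop("OPENBLAS_NUM_THREADS", None)
    if num_threads is not None:
        env["OPENBLAS_NUM_THREADS"] = str(num_threads)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(cmd, candidates):
    """Time cmd under each candidate setting and return {setting: seconds}."""
    return {n: time_command(cmd, n) for n in candidates}


def best_setting(timings):
    """Pick the setting with the lowest measured time."""
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Placeholder workload (assumption): swap in the actual benchmark command.
    cmd = [sys.executable, "-c", "pass"]
    timings = sweep(cmd, [None, 1, 2, 4])
    for setting, seconds in timings.items():
        label = "unset" if setting is None else str(setting)
        print(f"OPENBLAS_NUM_THREADS={label}: {seconds:.3f} s")
    print("fastest setting:", best_setting(timings))
```

Taking the best of a few repeats reduces noise from system load, but it does not replace a proper statistical comparison; the candidate list should also be adapted to the number of cores on the target machine.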

Once we understand the behavior of this OpenBLAS parameter, we should add it to the CK workflow for crowd-tuning and reproducibility ...

Extra note: I just found via Google that someone had a similar issue:

http://stackoverflow.com/questions/30195837/how-to-use-multi-cpu-cores-to-train-nns-using-caffe-and-openblas

So, in the future, we should add this parameter to our crowd-tuner and share the best results for different machines in cKnowledge.org/repo - I added it to our big ToDo list ;) ...
