Check behavior of OPENBLAS_NUM_THREADS #76

Closed
gfursin opened this issue May 4, 2017 · 2 comments

@gfursin
Contributor

gfursin commented May 4, 2017

Just found some interesting behavior.

I expected that the CPU version of Caffe would use OpenMP with multiple threads in the default build, but this does not seem to be the case, or the OpenMP strategy is not optimal. On a multi-core ARM-based machine running Ubuntu, we just noticed that performance can differ by 6x between leaving the environment variable "OPENBLAS_NUM_THREADS" unset and setting "OPENBLAS_NUM_THREADS":1! Changing this variable further, i.e. forcing various thread counts, improves performance less dramatically, but we can still find an optimal value that gives roughly another 2x.

We need to understand the default behavior of this variable in Caffe to decide how to add it to the CK workflow (i.e. should we explicitly add this variable to meta.json?). I expected that, when it is not defined, OpenBLAS would use an adaptive OpenMP strategy (i.e. dynamically select the number of threads depending on the current system load), but that does not seem to be the case.

In contrast, on a multi-core x86_64 machine the performance difference between leaving this variable unset and setting it to 1 is small (around 20%), and we can improve performance by a further ~50% by autotuning the number of threads.
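A minimal sketch of such a thread-count sweep, assuming the benchmark is an external command (a hypothetical stand-in here; substitute the actual Caffe timing run). It runs the command with OPENBLAS_NUM_THREADS unset and then set to each candidate value, keeps the best of a few wall-clock timings per setting, and reports the fastest. The candidate list (unset plus powers of two up to the core count) is an assumption, not something CK defines.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, threads, repeats=3):
    """Run `cmd` with OPENBLAS_NUM_THREADS set; return the best wall time in seconds."""
    env = dict(os.environ)
    if threads is None:
        # Leave the variable unset to measure OpenBLAS's default behavior.
        env.pop("OPENBLAS_NUM_THREADS", None)
    else:
        env["OPENBLAS_NUM_THREADS"] = str(threads)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def tune_threads(cmd, candidates, repeats=3):
    """Time every candidate setting; return (fastest_setting, {setting: seconds})."""
    timings = {t: time_with_threads(cmd, t, repeats) for t in candidates}
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    # Hypothetical benchmark command: replace with the real Caffe timing run.
    bench = [sys.executable, "-c", "pass"]
    cores = os.cpu_count() or 1
    candidates = [None] + [n for n in (1, 2, 4, 8, 16, 32, 64) if n <= cores]
    best, timings = tune_threads(bench, candidates)
    for t, secs in timings.items():
        label = "unset" if t is None else t
        print(f"OPENBLAS_NUM_THREADS={label}: {secs:.3f} s")
    print("best:", "unset" if best is None else best)
```

Timing the whole process includes start-up cost, so for short runs the benchmark's own per-iteration timing would likely be a cleaner signal.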

Once we understand how this OpenBLAS parameter behaves, we should add it to the CK workflow for crowd-tuning and reproducibility ...

@gfursin
Contributor Author

gfursin commented May 4, 2017

Extra note: I just found on Google that someone else had similar issues:

So, in the future, we should add this parameter to our crowd-tuner and share the best results for different machines in cKnowledge.org/repo. I added it to our big ToDo list ;) ...
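To share results across machines, each tuned value could be stored together with a minimal description of the hardware it was measured on. A sketch, assuming a hypothetical local JSON file holding a list of records; the record layout is illustrative only and is not the CK meta.json or cKnowledge.org/repo format:

```python
import json
import os
import platform


def make_record(best_threads, seconds):
    """Bundle a tuned OPENBLAS_NUM_THREADS value with a basic machine description."""
    return {
        "machine": platform.machine(),  # e.g. "aarch64" or "x86_64"
        "system": platform.system(),
        "cpu_count": os.cpu_count(),
        "env": {"OPENBLAS_NUM_THREADS": str(best_threads)},
        "seconds": seconds,
    }


def append_record(path, record):
    """Append one record to the JSON list stored at `path`, creating the file if missing."""
    records = []
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    records.append(record)
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
    return records
```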

@gfursin
Contributor Author

gfursin commented Jun 1, 2017

I moved this ticket here: dividiti/ck-caffe#108

@gfursin gfursin closed this as completed Jun 1, 2017