Performance difference Windows/Ubuntu #82

Open
Journeyman3003 opened this issue Jul 9, 2019 · 2 comments

Journeyman3003 commented Jul 9, 2019

Hi all,

I'm seeing a large runtime difference between the binary built on my Windows 10 home desktop and the one built on an Ubuntu 18.04 virtual machine. I compiled both following the instructions in this repository, that is

g++ sptree.cpp tsne.cpp tsne_main.cpp -o bh_tsne -O2
on Ubuntu and

nmake -f Makefile.win all
on Windows (using Visual Studio 2019).

Still, using all 70000 MNIST digits, the Windows .exe needs only about half the fitting time of the Ubuntu binary (1105.82 vs. 2354.09 seconds); see the following logs:

Windows:

Computing input similarities...
Building tree...
 - point 0 of 70000
 - point 10000 of 70000
 - point 20000 of 70000
 - point 30000 of 70000
 - point 40000 of 70000
 - point 50000 of 70000
 - point 60000 of 70000
Input similarities computed in 734.95 seconds (sparsity = 0.002964)!
Learning embedding...
Iteration 1: error is 114.707556
Iteration 50: error is 114.707556 (50 iterations in 52.88 seconds)
Iteration 100: error is 114.707555 (50 iterations in 66.19 seconds)
Iteration 150: error is 114.706666 (50 iterations in 63.20 seconds)
Iteration 200: error is 108.965702 (50 iterations in 66.21 seconds)
Iteration 250: error is 5.916399 (50 iterations in 54.78 seconds)
Iteration 300: error is 4.703993 (50 iterations in 64.33 seconds)
Iteration 350: error is 4.304277 (50 iterations in 66.77 seconds)
Iteration 400: error is 4.067927 (50 iterations in 51.41 seconds)
Iteration 450: error is 3.899897 (50 iterations in 51.27 seconds)
Iteration 500: error is 3.772200 (50 iterations in 51.67 seconds)
Iteration 550: error is 3.669212 (50 iterations in 51.34 seconds)
Iteration 600: error is 3.585067 (50 iterations in 52.51 seconds)
Iteration 650: error is 3.513970 (50 iterations in 50.97 seconds)
Iteration 700: error is 3.452446 (50 iterations in 51.37 seconds)
Iteration 750: error is 3.398439 (50 iterations in 52.12 seconds)
Iteration 800: error is 3.350182 (50 iterations in 51.71 seconds)
Iteration 850: error is 3.306805 (50 iterations in 51.35 seconds)
Iteration 900: error is 3.267429 (50 iterations in 51.92 seconds)
Iteration 950: error is 3.231636 (50 iterations in 52.13 seconds)
Iteration 1000: error is 3.199034 (50 iterations in 51.68 seconds)
Fitting performed in 1105.82 seconds.

Ubuntu:

Computing input similarities...
Building tree...
 - point 0 of 70000
 - point 10000 of 70000
 - point 20000 of 70000
 - point 30000 of 70000
 - point 40000 of 70000
 - point 50000 of 70000
 - point 60000 of 70000
Input similarities computed in 735.45 seconds (sparsity = 0.002964)!
Learning embedding...
Iteration 1: error is 114.707556
Iteration 50: error is 114.707556 (50 iterations in 114.57 seconds)
Iteration 100: error is 114.707555 (50 iterations in 125.56 seconds)
Iteration 150: error is 114.706492 (50 iterations in 116.89 seconds)
Iteration 200: error is 109.278084 (50 iterations in 129.52 seconds)
Iteration 250: error is 5.949873 (50 iterations in 140.74 seconds)
Iteration 300: error is 4.721405 (50 iterations in 126.03 seconds)
Iteration 350: error is 4.318675 (50 iterations in 120.15 seconds)
Iteration 400: error is 4.080921 (50 iterations in 116.11 seconds)
Iteration 450: error is 3.910903 (50 iterations in 114.96 seconds)
Iteration 500: error is 3.780495 (50 iterations in 113.75 seconds)
Iteration 550: error is 3.677030 (50 iterations in 118.45 seconds)
Iteration 600: error is 3.591950 (50 iterations in 114.40 seconds)
Iteration 650: error is 3.520119 (50 iterations in 112.86 seconds)
Iteration 700: error is 3.458082 (50 iterations in 111.77 seconds)
Iteration 750: error is 3.403688 (50 iterations in 112.70 seconds)
Iteration 800: error is 3.355337 (50 iterations in 114.43 seconds)
Iteration 850: error is 3.311879 (50 iterations in 111.70 seconds)
Iteration 900: error is 3.272490 (50 iterations in 112.07 seconds)
Iteration 950: error is 3.236595 (50 iterations in 114.01 seconds)
Iteration 1000: error is 3.203558 (50 iterations in 113.41 seconds)
Fitting performed in 2354.09 seconds.

TL;DR: constructing the nearest-neighbor tree takes almost the same time on both machines, but the gradient-descent iterations take roughly twice as long on Ubuntu.
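To put a number on that comparison, the per-block timings can be pulled out of the logs with a small script. This is only a sketch: the function name `seconds_per_iteration` and the two sample strings are made up for illustration (the samples are just the first two timed lines of each log above), and the regular expression assumes the log format shown in this issue.

```python
import re

# Matches lines such as
# "Iteration 50: error is 114.707556 (50 iterations in 52.88 seconds)".
# "Iteration 1: error is ..." carries no timing and is skipped on purpose.
ITER_RE = re.compile(
    r"Iteration (\d+): error is ([\d.]+) \((\d+) iterations in ([\d.]+) seconds\)"
)

def seconds_per_iteration(log_text):
    """Average wall-clock seconds per iteration over all timed blocks in a log."""
    total_iters = 0
    total_secs = 0.0
    for match in ITER_RE.finditer(log_text):
        total_iters += int(match.group(3))
        total_secs += float(match.group(4))
    return total_secs / total_iters if total_iters else float("nan")

# Hypothetical sample input: the first two timed lines of each log above.
windows_sample = """
Iteration 1: error is 114.707556
Iteration 50: error is 114.707556 (50 iterations in 52.88 seconds)
Iteration 100: error is 114.707555 (50 iterations in 66.19 seconds)
"""
ubuntu_sample = """
Iteration 1: error is 114.707556
Iteration 50: error is 114.707556 (50 iterations in 114.57 seconds)
Iteration 100: error is 114.707555 (50 iterations in 125.56 seconds)
"""

win = seconds_per_iteration(windows_sample)
ubu = seconds_per_iteration(ubuntu_sample)
print(f"Windows: {win:.4f} s/iter, Ubuntu: {ubu:.4f} s/iter, ratio: {ubu / win:.2f}")
```

Feeding the complete logs instead of the two-line samples gives the ratio over all 1000 iterations.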

Any ideas on what could be going wrong would be greatly appreciated! Thanks

SamGG commented Jul 9, 2019

I would suspect the linked libraries. For example, matrix multiplications (PCA) are more efficient with the Intel MKL libraries, which are probably integrated in Visual Studio.
https://blog.revolutionanalytics.com/2014/10/revolution-r-open-mkl.html
https://simplystatistics.org/2016/01/21/parallel-blas-in-r/

Journeyman3003 commented Jul 10, 2019

Thank you @SamGG for your response!

Profiling the binary with perf showed that computeNonEdgeForces in particular (as also noticed here) is the most time-consuming function.
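For readers unfamiliar with that hotspot: a function like computeNonEdgeForces evaluates the Barnes-Hut approximation of the repulsive forces by walking a space-partitioning tree once per point in every iteration. The following is a simplified 2D Python illustration of that idea, not the repository's C++ code. All names (`Cell`, `build_tree`, `non_edge_force`, `exact_non_edge_force`) are hypothetical, and the sketch assumes all points are distinct.

```python
import math
import random

class Cell:
    """Square quadtree cell storing the count and centre of mass of its points."""

    def __init__(self, cx, cy, half_width):
        self.cx, self.cy, self.half_width = cx, cy, half_width
        self.com_x = self.com_y = 0.0
        self.count = 0
        self.point = None      # index of the only point while this cell is a leaf
        self.children = None   # four sub-cells once the cell has been split

    def _child_for(self, p):
        return self.children[(p[0] > self.cx) + 2 * (p[1] > self.cy)]

    def insert(self, idx, Y):
        x, y = Y[idx]
        n = self.count
        self.com_x = (self.com_x * n + x) / (n + 1)   # running centre of mass
        self.com_y = (self.com_y * n + y) / (n + 1)
        self.count = n + 1
        if n == 0:
            self.point = idx
            return
        if self.children is None:                     # split a full leaf
            h = self.half_width / 2.0
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dy in (-1, 1) for dx in (-1, 1)]
            old, self.point = self.point, None
            self._child_for(Y[old]).insert(old, Y)
        self._child_for(Y[idx]).insert(idx, Y)

def build_tree(Y):
    xs = [p[0] for p in Y]
    ys = [p[1] for p in Y]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2.0 + 1e-9
    root = Cell((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0, half)
    for idx in range(len(Y)):
        root.insert(idx, Y)
    return root

def non_edge_force(cell, i, Y, theta):
    """Return (force_x, force_y, sum_q) that `cell` contributes to point i."""
    if cell.count == 0 or (cell.children is None and cell.point == i):
        return 0.0, 0.0, 0.0
    dx = Y[i][0] - cell.com_x
    dy = Y[i][1] - cell.com_y
    d2 = dx * dx + dy * dy
    width = 2.0 * cell.half_width
    # Leaf, or far enough away: treat the whole cell as one weighted point.
    if cell.children is None or width < theta * math.sqrt(d2):
        q = 1.0 / (1.0 + d2)
        mult = cell.count * q
        return mult * q * dx, mult * q * dy, mult
    fx = fy = z = 0.0
    for child in cell.children:                       # otherwise recurse
        cfx, cfy, cz = non_edge_force(child, i, Y, theta)
        fx, fy, z = fx + cfx, fy + cfy, z + cz
    return fx, fy, z

def exact_non_edge_force(i, Y):
    """Brute-force reference: sum over every other point."""
    fx = fy = z = 0.0
    for j, (xj, yj) in enumerate(Y):
        if j != i:
            dx, dy = Y[i][0] - xj, Y[i][1] - yj
            q = 1.0 / (1.0 + dx * dx + dy * dy)
            fx, fy, z = fx + q * q * dx, fy + q * q * dy, z + q
    return fx, fy, z

random.seed(0)
Y = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
root = build_tree(Y)
print(non_edge_force(root, 0, Y, theta=0.5), exact_non_edge_force(0, Y))
```

With `theta=0.0` the traversal never summarizes a cell and reproduces the brute-force sum. Larger values of `theta` trade accuracy for fewer visited cells. Either way the work is a pointer-chasing recursion repeated for every point, which makes it plausible that compiler and memory-layout details show up here first.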

Long story short: I couldn't find the root cause, but I've applied the performance improvements mentioned in this pull request (excellent writeup btw, kudos to tavianator!).

The MNIST timings are now identical on Windows and Ubuntu; see below:

Computing input similarities...
Building tree...
 - point 0 of 70000
 - point 10000 of 70000
 - point 20000 of 70000
 - point 30000 of 70000
 - point 40000 of 70000
 - point 50000 of 70000
 - point 60000 of 70000
Input similarities computed in 700.94 seconds (sparsity = 0.002964)!
Learning embedding...
Iteration 1: error is 114.707556
Iteration 50: error is 114.707555 (50 iterations in 36.25 seconds)
Iteration 100: error is 114.707555 (50 iterations in 36.45 seconds)
Iteration 150: error is 114.705697 (50 iterations in 38.57 seconds)
Iteration 200: error is 107.418164 (50 iterations in 37.08 seconds)
Iteration 250: error is 5.918315 (50 iterations in 36.25 seconds)
Iteration 300: error is 4.685798 (50 iterations in 35.77 seconds)
Iteration 350: error is 4.293142 (50 iterations in 33.82 seconds)
Iteration 400: error is 4.059164 (50 iterations in 34.66 seconds)
Iteration 450: error is 3.895986 (50 iterations in 35.09 seconds)
Iteration 500: error is 3.770166 (50 iterations in 33.31 seconds)
Iteration 550: error is 3.669497 (50 iterations in 34.37 seconds)
Iteration 600: error is 3.586112 (50 iterations in 33.95 seconds)
Iteration 650: error is 3.514839 (50 iterations in 34.67 seconds)
Iteration 700: error is 3.453197 (50 iterations in 34.06 seconds)
Iteration 750: error is 3.398859 (50 iterations in 32.85 seconds)
Iteration 800: error is 3.350613 (50 iterations in 33.65 seconds)
Iteration 850: error is 3.307334 (50 iterations in 34.55 seconds)
Iteration 900: error is 3.268252 (50 iterations in 33.63 seconds)
Iteration 950: error is 3.232840 (50 iterations in 33.08 seconds)
Iteration 1000: error is 3.200317 (50 iterations in 33.95 seconds)
Fitting performed in 696.02 seconds.
