Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement multi-threading for covariance and gradient matrices (WIP) #129

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

maximerischard
Copy link
Contributor

Single-threaded benchmark (JULIA_NUM_THREADS=1):

│ Row │ kernel                         │ times   │
│     │ String                         │ Float64 │
├─────┼────────────────────────────────┼─────────┤
│ 1   │ (se+mat12)*rq                  │ 2532.37 │
│ 2   │ fix(se, σ)                     │ 519.113 │
│ 3   │ mask(se, [1])                  │ 622.843 │
│ 4   │ mask(se, [1])+mask(rq, [2:10]) │ 1949.92 │
│ 5   │ mat12                          │ 612.091 │
│ 6   │ rq                             │ 1613.57 │
│ 7   │ se                             │ 609.192 │
│ 8   │ se*mat12                       │ 955.519 │
│ 9   │ se*rq                          │ 2169.75 │
│ 10  │ se+mat12                       │ 846.811 │
│ 11  │ se+mat12+rq                    │ 2108.69 │
│ 12  │ se+rq                          │ 2001.41 │

Multi-threaded benchmark (JULIA_NUM_THREADS=10):

│ Row │ kernel                         │ times   │
│     │ String                         │ Float64 │
├─────┼────────────────────────────────┼─────────┤
│ 1   │ (se+mat12)*rq                  │ 881.223 │
│ 2   │ fix(se, σ)                     │ 413.357 │
│ 3   │ mask(se, [1])                  │ 425.865 │
│ 4   │ mask(se, [1])+mask(rq, [2:10]) │ 724.22  │
│ 5   │ mat12                          │ 415.498 │
│ 6   │ rq                             │ 639.111 │
│ 7   │ se                             │ 440.963 │
│ 8   │ se*mat12                       │ 482.739 │
│ 9   │ se*rq                          │ 761.098 │
│ 10  │ se+mat12                       │ 486.406 │
│ 11  │ se+mat12+rq                    │ 787.688 │
│ 12  │ se+rq                          │ 656.913 │

I would be quite keen to get feedback from anyone who's worked with multi-threading in julia before.

@Red-Portal
Copy link
Contributor

Red-Portal commented Dec 21, 2019

I think there should be a threshold that decides whether to use simd + threads or only simd. I doubt that threads will give good performance even with a small number of observations.

Also, did you consider using tasks?
The task scheduling system is not yet in Julia but we could consider if there are some performance benefits there.

@maximerischard
Copy link
Contributor Author

Hi @Red-Portal, thank you for having a look into this. You're right to point out that there should be a threshold to determine whether to use threads if there is some overhead in doing so. I'll do some more benchmarks to check if it's necessary. The benchmarks above show that threads do make a big difference to performance.

My understanding of tasks (coroutines) is that they solve a different problem: parallelising IO-bound tasks. Here multithreading is used to simultaneously compute multiple elements of the covariance matrix (and gradients). There is no communication or coordination between threads, so the overhead is low.

@Red-Portal
Copy link
Contributor

Red-Portal commented Dec 27, 2019

Hi @maximerischard ,
yes as you said, tasks are used for IO bound operations.
However they are also used to implement fine-grained parallelism.
Julia's planned depth-first scheduling is targeted towards parallelism (not concurrency).

The performance improvements are wonderful BTW.

@maximerischard
Copy link
Contributor Author

I've been reading a bit more about tasks and multithreading in julia. What I've done in this PR is use multithreading for the functions that iterate over the entries of an array. It's straightforwardly parallelisable, with no communication between threads, so low overhead. The overhead does seem to be non-zero, though when I benchmark it it's drowned in noise. Roughly speaking it seems to add 50ms to the benchmarks that take 1-2 seconds.

You had two suggestions @Red-Portal:

  1. use Tasks. I've been reading about tasks, and I don't see an obvious way to deploy them here, or an obvious benefit. You seem to have something in mind that I don't really understand. If you have a bit of time, it would be great if you could take a stab at implementing multithreading using tasks, or explain what you have in mind.
  2. set a threshold for multithreading. I tried this and ran into some type instability that made everything much slower. The overhead of using threads is fairly low, so I'm hoping it won't be necessary. I'd like to find a way to have no overhead if NTHREADS=1, but so far I haven't figured it out.

@maximerischard maximerischard changed the title implement multi-threading for covariance and gradient matrices implement multi-threading for covariance and gradient matrices (WIP) Oct 3, 2020
@coveralls
Copy link

Pull Request Test Coverage Report for Build 568

  • 17 of 95 (17.89%) changed or added relevant lines in 2 files are covered.
  • 111 unchanged lines in 48 files lost coverage.
  • Overall coverage decreased (-8.4%) to 67.804%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/covariance/covariance.jl 17 23 73.91%
src/covariance/multithreaded.jl 0 72 0.0%
Files with Coverage Reduction New Missed Lines %
src/kernels/distance.jl 1 87.5%
src/kernels/mat32_ard.jl 1 90.0%
src/kernels/mat32_iso.jl 1 90.0%
src/kernels/mat52_ard.jl 1 90.0%
src/kernels/mat52_iso.jl 1 90.0%
src/kernels/noise.jl 1 92.86%
src/kernels/periodic.jl 1 89.47%
src/kernels/rq_ard.jl 1 94.12%
src/kernels/rq_iso.jl 1 88.24%
src/likelihoods/bernoulli.jl 1 44.44%
Totals Coverage Status
Change from base Build 563: -8.4%
Covered Lines: 1371
Relevant Lines: 2022

💛 - Coveralls

@coveralls
Copy link

coveralls commented Oct 4, 2020

Pull Request Test Coverage Report for Build 625

  • 17 of 95 (17.89%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-3.1%) to 73.478%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/covariance/covariance.jl 17 23 73.91%
src/covariance/multithreaded.jl 0 72 0.0%
Files with Coverage Reduction New Missed Lines %
src/kernels/poly.jl 1 80.65%
Totals Coverage Status
Change from base Build 617: -3.1%
Covered Lines: 1460
Relevant Lines: 1987

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants