ENH: DGEMM workunits #146

Open, wants to merge 21 commits into base: main
Commits on Mar 27, 2023

  1. ENH: DGEMM workunits

    * `dgemm` now uses `pykokkos` workunits/kernels to achieve
    much faster performance than before
    
    * I had to correct a mistake in the benchmark code--we now use
    larger tiling dimensions to expand the data to avoid having
    empty arrays there--the net effect is bigger benchmark sizes,
    which seems desirable anyway
    
    * the benchmark code was also adjusted to directly control
    the number of OpenMP threads used by PyKokkos via the
    `threadpoolctl` library; this seems to stabilize the timing
    from trial to trial somewhat, but PyKokkos still shows more
    trial-to-trial variation than I'd like (benchmarking
    concurrent code is hard; possibly warmup issues?)
    
    * the small, medium, large slowdowns vs. SciPy are more
    reasonable now (with kernels pre-compiled/cached)
      - from kokkosgh-134: 310X, 4014X, and 4985X slower, respectively
      - here with 1 OpenMP thread: 75X, 19X, 14X
      - here with 4 OpenMP threads: 62X, 66X, 10X
      - here with 10 OpenMP threads: 38X, 18X, 13X
    
    * it may also be interesting to check these on the GPU,
    although OpenBLAS is just using the host as well
    tylerjereddy committed Mar 27, 2023
    c88c279
  2. MAINT: unpin mypy

    tylerjereddy committed Mar 27, 2023
    e49cbf5
  3. BENCH: fixup benchmarks

    * remove `threadpoolctl` stuff and switch to using
    `OMP_NUM_THREADS` manually + do way more trials
    and use boxplots to better visualize outliers I might
    be concerned about
    tylerjereddy committed Mar 27, 2023
    1cdbb9c
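The switch to setting `OMP_NUM_THREADS` manually and running many trials can be sketched as below. This is only an illustration, not the benchmark code from this PR: `CHILD_CODE` and `run_trials` are hypothetical names, and the child workload is a placeholder rather than a real DGEMM call. The point it shows is that the variable is set in the environment of a fresh process, so it is in place before any OpenMP runtime starts.

```python
import os
import statistics
import subprocess
import sys

# Hypothetical child workload: a stand-in for the real DGEMM benchmark body.
CHILD_CODE = """
import os, time
start = time.perf_counter()
sum(i * i for i in range(10_000))  # placeholder work, not a real DGEMM
elapsed = time.perf_counter() - start
print(os.environ["OMP_NUM_THREADS"], elapsed)
"""


def run_trials(num_threads, n_trials=5):
    """Time the child workload n_trials times with OMP_NUM_THREADS fixed."""
    # Copy the parent environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    timings = []
    for _ in range(n_trials):
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        seen_threads, elapsed = result.stdout.split()
        # Confirm the child really saw the requested setting.
        assert seen_threads == str(num_threads)
        timings.append(float(elapsed))
    return timings


if __name__ == "__main__":
    for threads in (1, 4, 10):
        timings = run_trials(threads)
        print(threads, statistics.median(timings))
```

Keeping every per-trial timing, rather than a single average, is what makes a boxplot of outliers possible afterwards.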
  4. ENH: PR 146 revisions

    * add fold ratios directly to plots to facilitate
    performance comparisons
    tylerjereddy committed Mar 27, 2023
    0de8ced
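A fold ratio of the kind added to the plots can be computed as in the sketch below. The function names are hypothetical, and the use of the median as the summary statistic is an assumption; the commit message does not say which statistic the PR uses.

```python
import statistics


def fold_ratio(candidate_times, baseline_times):
    """Return how many times slower the candidate is than the baseline.

    Assumption: each group of trials is summarized by its median.
    """
    return statistics.median(candidate_times) / statistics.median(baseline_times)


def fold_label(candidate_times, baseline_times):
    """Format the ratio in the 'NX slower' style used in the commit messages."""
    return f"{fold_ratio(candidate_times, baseline_times):.0f}X slower"
```

For example, `fold_label([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])` gives `"2X slower"`, a string that could be passed to a plot annotation.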
  5. 6323724
  6. ENH: use scratch for tiled DGEMM

    * early draft of scratch memory setup for the tiled
    DGEMM workunit
    
    * at the moment this doesn't work because of kokkosgh-180,
    so will need to deal with that first
    tylerjereddy committed Mar 27, 2023
    2031748
  7. WIP, ENH: more kernel growth

    * create two scratch memory locations per team,
    and add draft code to fill them (probably wrong)
    
    * draft code to fill the result view with the tiling
    operations (probably wrong)
    
    * add some tests for the tiled kernel vs. SciPy
    `dgemm` (new cases are failing, which makes sense
    for now)
    tylerjereddy committed Mar 27, 2023
    ca7bf74
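The idea of two scratch locations per team feeding a tiled result can be illustrated with the pure-Python reference below. It is not the PyKokkos kernel from this PR: `tiled_matmul`, `naive_matmul`, `scratch_a`, and `scratch_b` are hypothetical names, the matrices are assumed to be square with a width that is a multiple of the tile width, and the "team" is just a loop iteration that owns one output tile.

```python
def naive_matmul(a, b):
    """Plain triple-loop matrix product, used as the reference result."""
    n, m, p = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]


def tiled_matmul(a, b, tile=2):
    """Tiled product of two square matrices given as lists of lists."""
    n = len(a)
    if n % tile:
        raise ValueError("matrix width must be a multiple of the tile width")
    c = [[0.0] * n for _ in range(n)]
    tiles_per_side = n // tile
    # Each (ti, tj) pair plays the role of one team owning one output tile.
    for ti in range(tiles_per_side):
        for tj in range(tiles_per_side):
            for tk in range(tiles_per_side):
                # Copy one tile of A and one tile of B into "scratch".
                scratch_a = [
                    row[tk * tile:(tk + 1) * tile]
                    for row in a[ti * tile:(ti + 1) * tile]
                ]
                scratch_b = [
                    row[tj * tile:(tj + 1) * tile]
                    for row in b[tk * tile:(tk + 1) * tile]
                ]
                # Accumulate the small tile product into the result.
                for i in range(tile):
                    for j in range(tile):
                        acc = 0.0
                        for k in range(tile):
                            acc += scratch_a[i][k] * scratch_b[k][j]
                        c[ti * tile + i][tj * tile + j] += acc
    return c
```

Comparing `tiled_matmul(a, b)` against `naive_matmul(a, b)` on small integer-valued inputs mirrors, in spirit, the tests that check the tiled kernel against SciPy `dgemm`.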
  8. 2605bbc
  9. 73a18bb
  10. ENH: add tiled matmul tests passing

    * all tiled matmul tests passing; simplified algorithm
    tylerjereddy committed Mar 27, 2023
    521c849
  11. BUG, TST: more tests/fixes

    * more tiled DGEMM testing/bug fixing
    tylerjereddy committed Mar 27, 2023
    e40f5c4
  12. ENH: allow varied league_size

    * allow varied league_size, but currently segfaults
    when greater than `4` it seems...
    tylerjereddy committed Mar 27, 2023
    64b8d0d
  13. Add SciPy

    tylerjereddy committed Mar 27, 2023
    8087560
  14. Debug prints

    tylerjereddy committed Mar 27, 2023
    92a4a25
  15. Try more threads

    tylerjereddy committed Mar 27, 2023
    e3166eb
  16. ENH: PR 146 revisions

    * `dgemm()` now accepts a `league_size` argument, which
    may be useful on GPU, where more blocks of threads may be
    allowed. We no longer calculate `league_size` automatically
    because doing so can cause segfaults/issues (related to the
    resources actually available, I think)
    
    * the tiled DGEMM kernel now passes tests with several input
    widths that are different powers of 2
    tylerjereddy committed Mar 27, 2023
    cc7d976
  17. 87b8f00
  18. ENH: support different league sizes

    * add limited league size variation support--size of 1
    and some convenient multiples of 4 may work; tests for 1
    and 4 are passing locally
    tylerjereddy committed Mar 27, 2023
    6c71f6d
  19. 40a654d
  20. 30fca1f
  21. 876cc99