BUG: no guardrails on shared/scratch alloc requests #183

Open
tylerjereddy opened this issue Mar 14, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@tylerjereddy
Contributor

Modifying the tiled DGEMM kernel code in gh-146 as shown below can lead to a segfault. While I realize the C++ Kokkos docs do advise checking the size of the shared memory caches before allocating them, this isn't really a Pythonic experience, so we may need some kind of (arguably default-on) mode that automatically queries the size of, e.g., the L1 (and other levels of) cache and refuses to compile kernels whose scratch requests exceed it.

The argument in favor of default-on is similar to that for Cython: you need to explicitly opt out of helpful guardrails like bounds checking to get full-blown performance (i.e., you develop with guardrails on, then deploy to production/releases with, e.g., decorators that disable the guardrails).
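A minimal sketch, in plain Python, of what a default-on mode with an explicit opt-out could look like. Everything here is hypothetical: GUARDRAILS_ENABLED, ScratchRequestError, check_scratch, unchecked and launch are not part of pykokkos, and max_bytes stands in for whatever limit would be queried from the backend.

```python
# Hypothetical sketch: none of these names exist in pykokkos.

# Global switch: guardrails are on unless a user explicitly turns them off.
GUARDRAILS_ENABLED = True


class ScratchRequestError(ValueError):
    """Raised when a scratch request exceeds the available limit."""


def check_scratch(requested_bytes: int, max_bytes: int) -> None:
    """Refuse a request larger than the (queried) limit."""
    if requested_bytes > max_bytes:
        raise ScratchRequestError(
            f"scratch request of {requested_bytes} bytes exceeds limit of {max_bytes} bytes"
        )


def unchecked(workunit):
    """Opt-out decorator, similar in spirit to Cython's @cython.boundscheck(False)."""
    workunit.guardrails = False
    return workunit


def launch(workunit, requested_bytes: int, max_bytes: int):
    """Validate the scratch request before running, unless the workunit opted out."""
    if GUARDRAILS_ENABLED and getattr(workunit, "guardrails", True):
        check_scratch(requested_bytes, max_bytes)
    return workunit()


def checked_kernel():
    return "ran"


@unchecked
def fast_kernel():
    return "ran unchecked"


print(launch(checked_kernel, 100, 1024))   # within the limit: runs
print(launch(fast_kernel, 4096, 1024))     # opted out: no check performed
try:
    launch(checked_kernel, 4096, 1024)     # over the limit: refused
except ScratchRequestError as err:
    print("refused:", err)
```

During development every workunit goes through the check by default; for releases, hot kernels would be decorated with the opt-out, mirroring the Cython workflow described above.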

--- a/pykokkos/linalg/workunits.py
+++ b/pykokkos/linalg/workunits.py
@@ -46,7 +46,7 @@ def dgemm_impl_tiled_no_view_c(team_member: pk.TeamMember,
     global_tid: int = team_member.league_rank() * team_member.team_size() + team_member.team_rank()
 
     # TODO: I have no idea how to get 2D scratch memory views?
-    scratch_mem_a: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
+    scratch_mem_a: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size * 100000)
     scratch_mem_b: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
     # in a 4 x 4 matrix with 2 x 2 tiling the leagues
     # and teams have matching row/col assignment approaches
tests/test_linalg.py ........Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultFatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: 

Segmentation faultSegmentation faultSegmentation faultThread 0xSegmentation fault00007fd9779c81c0 (most recent call first):
Segmentation fault (core dumped)
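To put the modified request in perspective, a rough byte count helps. This sketch assumes pykokkos' float element type is an 8-byte C++ double, ignores any alignment padding Kokkos may add, and uses a hypothetical tile_size of 4 purely for illustration (the actual value is not shown in the diff).

```python
# Rough estimate of the level-0 scratch requested by the two ScratchView1D
# allocations in the kernel. ASSUMPTIONS: 8-byte elements (pykokkos `float`
# as C++ double); alignment padding is ignored.
ITEMSIZE_BYTES = 8


def scratch_request_bytes(tile_size: int, scale_a: int = 1, scale_b: int = 1) -> int:
    """Bytes for scratch_mem_a (tile_size * scale_a) plus scratch_mem_b (tile_size * scale_b)."""
    bytes_a = tile_size * scale_a * ITEMSIZE_BYTES
    bytes_b = tile_size * scale_b * ITEMSIZE_BYTES
    return bytes_a + bytes_b


# Hypothetical tile_size = 4, for illustration only.
original = scratch_request_bytes(4)                  # 32 + 32 bytes
modified = scratch_request_bytes(4, scale_a=100000)  # 3_200_000 + 32 bytes
print(original, modified)
```

Under these assumptions the modified kernel asks for about 3.2 MB of team scratch, well beyond typical per-core L1 capacities.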

I wonder whether the CI segfault we see in the matching PR is related to some kind of restriction on using L1 cache in the virtual machine, or something similar?

tylerjereddy added the bug label Mar 14, 2023
@tylerjereddy
Contributor Author

Christian didn't seem to think that Kokkos core had issues allocating L1 cache (scratch) memory on GitHub Actions runners.

@JBludau
Contributor

JBludau commented Mar 16, 2023

Yeah, L1 is on-chip memory, so there will not be much of it. Using it has to be coordinated with how many things you launch in parallel, the hardware, etc., and if you use it you will basically reduce the number of registers per thread.
Unfortunately, it will not work to just put an arbitrary tile size in the ScratchView. This is probably the deepest layer of tuning that is available in Kokkos.
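The coordination point can be illustrated with a simplified occupancy estimate: the more scratch each team requests, the fewer teams fit on a GPU multiprocessor at once. The per-SM capacity and team cap below are hypothetical placeholders, and real occupancy also depends on registers and thread counts, which this sketch ignores.

```python
def resident_teams_per_sm(scratch_bytes_per_team: int,
                          scratch_bytes_per_sm: int,
                          max_teams_per_sm: int) -> int:
    """Upper bound on concurrently resident teams, considering scratch only.

    A result of 0 means a single team's request does not fit at all.
    """
    if scratch_bytes_per_team <= 0:
        return max_teams_per_sm
    return min(max_teams_per_sm, scratch_bytes_per_sm // scratch_bytes_per_team)


# Hypothetical hardware numbers, for illustration only.
SCRATCH_PER_SM = 64 * 1024
MAX_TEAMS_PER_SM = 16

for per_team in (4 * 1024, 16 * 1024, 100_000):
    print(per_team, resident_teams_per_sm(per_team, SCRATCH_PER_SM, MAX_TEAMS_PER_SM))
```

Quadrupling the per-team request from 4 KiB to 16 KiB cuts the bound from 16 resident teams to 4, and a request larger than the per-SM capacity cannot be scheduled at all.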

@JBludau
Contributor

JBludau commented Mar 16, 2023

Nevertheless, we could try to enforce a maximum based on this Kokkos TeamPolicy query:
static int scratch_size_max(int level);

Returns: the maximum total scratch size in bytes, for the given level. Note: If a kernel performs team-level reductions or scan operations, not all of this memory will be available for dynamic user requests. Some of that maximal scratch size is being used for internal operations. The actual size of these internal allocations depends on the value type used in the reduction or scan.
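A guardrail built on that query would also need to subtract whatever the kernel's own team reductions or scans reserve. This sketch assumes max_bytes has already been obtained from TeamPolicy::scratch_size_max(level) on the C++ side; how pykokkos would surface that value, the reserved_bytes estimate, and the names ScratchLimitError and validate_scratch_request are all hypothetical.

```python
class ScratchLimitError(ValueError):
    """Hypothetical error for over-sized scratch requests."""


def validate_scratch_request(requested_bytes: int, level: int, max_bytes: int,
                             reserved_bytes: int = 0) -> int:
    """Check a request against the level's maximum, minus internally reserved bytes.

    Returns the number of bytes left over; raises if the request does not fit.
    max_bytes is assumed to come from Kokkos' TeamPolicy::scratch_size_max(level).
    """
    if level not in (0, 1):  # Kokkos exposes two scratch levels, 0 and 1
        raise ValueError(f"scratch level must be 0 or 1, got {level}")
    if requested_bytes < 0 or reserved_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    available = max_bytes - reserved_bytes
    if requested_bytes > available:
        raise ScratchLimitError(
            f"level-{level} scratch request of {requested_bytes} bytes exceeds the "
            f"{available} bytes available ({max_bytes} max minus {reserved_bytes} reserved)"
        )
    return available - requested_bytes


print(validate_scratch_request(64, 0, 1024, reserved_bytes=128))
```

Raising a Python exception before launch turns what is currently a segfault into an actionable error message.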
