Modifying the tiled DGEMM kernel code in gh-146 as shown below can lead to a segfault. While I realize the C++ Kokkos docs do advise checking the size of the shared memory caches before allocating them, this isn't really a Pythonic experience, so we may need some kind of (arguably default-on) mode for auto-querying the size of the relevant cache (e.g., L1) and refusing to compile a kernel whose scratch request exceeds it.
The argument in favor of default-on is similar to that for Cython--you need to explicitly opt out of helpful guardrails like bounds checking to get the full-blown performance (i.e., you develop with the guardrails on, then deploy to production/releases with, e.g., decorators that disable them).
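To make the idea a bit more concrete, here is a rough sketch of what a default-on guardrail with an explicit opt-out might look like; everything in it (`query_scratch_budget`, `scratch_guardrail`, and the hard-coded budget) is hypothetical illustration rather than existing pykokkos API:

```python
import functools

def query_scratch_budget(level: int = 0) -> int:
    # Hypothetical: a real implementation would ask the active backend
    # (e.g., via something like Kokkos' scratch_size_max) instead of
    # hard-coding an illustrative 48 KiB figure here.
    return 48 * 1024

def scratch_guardrail(requested_bytes: int, enabled: bool = True):
    """Default-on check; pass enabled=False in a release build to opt out,
    analogous to disabling bounds checking in Cython."""
    def decorator(launch_kernel):
        @functools.wraps(launch_kernel)
        def wrapper(*args, **kwargs):
            budget = query_scratch_budget()
            if enabled and requested_bytes > budget:
                raise MemoryError(
                    f"scratch request of {requested_bytes} B exceeds the "
                    f"{budget} B budget; refusing to compile/launch"
                )
            return launch_kernel(*args, **kwargs)
        return wrapper
    return decorator
```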
--- a/pykokkos/linalg/workunits.py
+++ b/pykokkos/linalg/workunits.py
@@ -46,7 +46,7 @@ def dgemm_impl_tiled_no_view_c(team_member: pk.TeamMember,
     global_tid: int = team_member.league_rank() * team_member.team_size() + team_member.team_rank()
     # TODO: I have no idea how to get 2D scratch memory views?
-    scratch_mem_a: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
+    scratch_mem_a: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size * 100000)
     scratch_mem_b: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
     # in a 4 x 4 matrix with 2 x 2 tiling the leagues
     # and teams have matching row/col assignment approaches
I wonder if the CI segfault we see over in the matching PR is related to some kind of prohibition on using L1 cache in the virtual machine or something??
Yeah, L1 is on-chip memory, so there will not be much of it. Using it has to be coordinated with how many things you launch in parallel, the hardware, etc., and if you use it you will basically reduce the number of registers available per thread.
Unfortunately it will not work to just put an arbitrary tile size in the ScratchView. This is probably the deepest layer of tweaking available in Kokkos.
Nevertheless, we could try to enforce a maximum via:
static int scratch_size_max(int level);
Returns: the maximum total scratch size in bytes, for the given level. Note: If a kernel performs team-level reductions or scan operations, not all of this memory will be available for dynamic user requests. Some of that maximal scratch size is being used for internal operations. The actual size of these internal allocations depends on the value type used in the reduction or scan.
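As a rough back-of-the-envelope illustration on the Python side (the 48 KiB figure and the `fits_in_scratch` helper are assumptions for the sketch; in practice the limit would come from the `scratch_size_max` query quoted above):

```python
# Illustrative check for the kernel in the diff above, which allocates
# two level-0 ScratchView1D[float] buffers. All numbers here are assumed.

DOUBLE_BYTES = 8                # each float element maps to a C double
SCRATCH_SIZE_MAX = 48 * 1024    # assumed result of scratch_size_max(0); query it in practice

def fits_in_scratch(tile_size: int, inflate: int = 1) -> bool:
    # scratch_mem_a is inflated by the patch (tile_size * 100000); scratch_mem_b is not
    requested = (tile_size * inflate + tile_size) * DOUBLE_BYTES
    return requested <= SCRATCH_SIZE_MAX

print(fits_in_scratch(tile_size=2))                   # True: 32 B is fine
print(fits_in_scratch(tile_size=2, inflate=100000))   # False: ~1.6 MB, far past any plausible on-chip limit
```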