6 changes: 3 additions & 3 deletions chapter-05/README.md
@@ -194,11 +194,11 @@ As above, there is one copy of the array `x[]` for each thread in the grid, so `

**c. How many versions of the variable y_s are there?**

-`y_s` is a variable stored in shared memory. There is one copy of a shared variable per block in the grid. Since we have 128 blocks in the grid (see a), we have `128` versions of the variable `y_s`.
+`y_s` is a variable stored in shared memory. There is one copy of a shared variable per block in the grid. Since we have 8 blocks in the grid (see a), we have `8` versions of the variable `y_s`.

**d. How many versions of the array b_s[] are there?**

-Same as in c, 128 blocks, so `128` versions of `b_s` stored in the shared memory.
+Same as in c, 8 blocks, so `8` versions of `b_s` stored in the shared memory.

**e. What is the amount of shared memory used per block (in bytes)?**

@@ -230,4 +230,4 @@ The SM supports up to 32 blocks per SM, each block running `64` threads. This br

**b. The kernel uses 256 threads/block, 31 registers/thread, and 8 KB of shared memory/SM.**

-The kernel uses 256 threads per block, so at most `2048/256=8` blocks fit on the SM. With this configuration, we run `8x256=2048` threads. Each thread uses 31 registers, for a total of `2048x31=63488` registers, slightly below our register upper bound. The kernel uses 8 KB of shared memory per block, and since we have 8 blocks, we use `8 x 8 KB = 64 KB` in total, considerably below our memory limit. This means that we can run 2048 threads and achieve 100% occupancy.
+The kernel uses 256 threads per block, so at most `2048/256=8` blocks fit on the SM. With this configuration, we run `8x256=2048` threads. Each thread uses 31 registers, for a total of `2048x31=63488` registers, slightly below our register upper bound. The kernel uses 8 KB of shared memory per block, and since we have 8 blocks, we use `8 x 8 KB = 64 KB` in total, considerably below our memory limit. This means that we can run 2048 threads and achieve 100% occupancy.
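
The arithmetic in answer b can be checked with a small script. This is only a sketch under stated assumptions: the function name `occupancy` and its parameters are hypothetical, the limits of 2048 threads and 32 blocks per SM come from the text above, and the limits of 65,536 registers and 96 KB of shared memory per SM are assumed values that do not appear in this diff. Shared memory is treated as a per-block cost, as the answer does, and register allocation granularity is ignored.

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block_kb,
              max_threads_per_sm=2048,   # from the text: 2048/256 = 8
              max_blocks_per_sm=32,      # from the text: up to 32 blocks per SM
              max_regs_per_sm=65536,     # ASSUMPTION: not stated in this diff
              max_smem_per_sm_kb=96):    # ASSUMPTION: not stated in this diff
    """Return (resident blocks, resident threads, occupancy fraction) for one SM."""
    # Each resource caps the number of resident blocks independently.
    by_threads = max_threads_per_sm // threads_per_block
    by_regs = max_regs_per_sm // (regs_per_thread * threads_per_block)
    if smem_per_block_kb > 0:
        by_smem = max_smem_per_sm_kb // smem_per_block_kb
    else:
        by_smem = max_blocks_per_sm

    # The tightest cap decides how many blocks actually fit.
    blocks = min(by_threads, by_regs, by_smem, max_blocks_per_sm)
    threads = blocks * threads_per_block
    return blocks, threads, threads / max_threads_per_sm


if __name__ == "__main__":
    blocks, threads, occ = occupancy(256, 31, 8)
    print(f"blocks={blocks} threads={threads} "
          f"registers={threads * 31} occupancy={occ:.0%}")
```

With a different assumed register or shared memory limit, the same function shows which resource becomes the bottleneck, since each cap is computed separately before taking the minimum.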