diff --git a/chapter-05/README.md b/chapter-05/README.md
index 7fcd05b..e8e039f 100644
--- a/chapter-05/README.md
+++ b/chapter-05/README.md
@@ -194,11 +194,11 @@ As above, there is one copy of the array `x[]` for each thread in the grid, so `
 
 **c. How many versions of the variable y_s are there?**
 
-`y_s` is the variable stored in the shared memory. There is one copy of a variable per block in the grid. Since we have 128 blocks in the grid (see a), therefore we have `128` versions of the variable `y_s`.
+`y_s` is a variable stored in shared memory, so there is one copy of it per block in the grid. Since the grid has 8 blocks (see a), there are `8` versions of `y_s`.
 
 **d. How many versions of the array b_s[] are there?**
 
-Same as in c, 128 blocks, so `128` versions of `b_s` stored in the shared memory.
+Same as in c: 8 blocks, so `8` versions of `b_s` stored in shared memory.
 
 **e. What is the amount of shared memory used per block (in bytes)?**
 
@@ -230,4 +230,4 @@ The SM supports up to 32 blocks per SM, each block running `64` threads. This br
 
 **b. The kernel uses 256 threads/block, 31 registers/thread, and 8 KB of shared memory/SM.**
 
-The kernel is using the 256 threads per block, meaning we can have up to `2048/256=8` blocks max. With this configuration, we run `8x256=2048` threads in total. Each thread will use 64 registers, bringing us to the total of `2048x31=63488` registers in total, slightly below our register upper bound. The kernel is using 8 KB per block, and since we have 8 blocks, we will be using `8 x 8 KB = 64 KB` of memory total, considerably below our memory limit. This means that we can run 2048 threads and that we will achieve a 100% occupancy rate.
\ No newline at end of file
+The kernel uses 256 threads per block, so at most `2048/256=8` blocks fit on the SM. With this configuration, we run `8x256=2048` threads in total. 
Each thread uses 31 registers, for a total of `2048x31=63488` registers, slightly below our register upper bound. The kernel uses 8 KB of shared memory per block, and since we have 8 blocks, we use `8 x 8 KB = 64 KB` in total, considerably below our memory limit. This means that we can run 2048 threads and achieve a 100% occupancy rate.
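
The arithmetic in answer b can be checked with a small script. This is only a sketch under stated assumptions: the limits of 2048 threads and 32 blocks per SM come from the text above, while the register limit (65,536 per SM) and the shared-memory limit (96 KB per SM) are assumed values that this excerpt does not state. The function name `occupancy` is hypothetical, and the model ignores the allocation granularity that real hardware applies to registers and shared memory.

```python
# Per-SM limits. The thread and block limits appear in the answers above;
# the register and shared-memory limits are ASSUMED for illustration.
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32
MAX_REGISTERS_PER_SM = 65536       # assumption: 64K registers per SM
MAX_SHARED_MEM_PER_SM_KB = 96      # assumption: 96 KB of shared memory per SM


def occupancy(threads_per_block, regs_per_thread, shared_kb_per_block):
    """Return (resident_blocks, occupancy) for one SM.

    Simplified model: each resource caps the number of resident blocks,
    and the tightest cap wins.
    """
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    by_registers = MAX_REGISTERS_PER_SM // (regs_per_thread * threads_per_block)
    if shared_kb_per_block > 0:
        by_shared = MAX_SHARED_MEM_PER_SM_KB // shared_kb_per_block
    else:
        by_shared = MAX_BLOCKS_PER_SM  # no shared memory used, so no extra cap
    blocks = min(by_threads, by_registers, by_shared, MAX_BLOCKS_PER_SM)
    return blocks, blocks * threads_per_block / MAX_THREADS_PER_SM


if __name__ == "__main__":
    # Case b: 256 threads/block, 31 registers/thread, 8 KB shared memory/block
    blocks, occ = occupancy(256, 31, 8)
    print(blocks, occ)  # 8 1.0
```

Under these assumptions, registers allow `65536 // (31 x 256) = 8` blocks and shared memory allows `96 // 8 = 12`, so the thread limit and the register limit both cap the SM at 8 blocks, which is enough for full occupancy.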