diff --git a/cub/cub.cuh b/cub/cub.cuh
index 27756b5cee..1e02cdc5b2 100644
--- a/cub/cub.cuh
+++ b/cub/cub.cuh
@@ -366,6 +366,8 @@
* - Blocked arrangement. The aggregate tile of items is partitioned
* evenly across threads in "blocked" fashion with thread i
* owning the i-th segment of consecutive elements.
+ * Blocked arrangements are often desirable for algorithmic benefits (where
+ * long sequences of items can be processed sequentially within each thread).
*
*
* \par
@@ -377,7 +379,10 @@
* \par
* - Striped arrangement. The aggregate tile of items is partitioned across
* threads in "striped" fashion, i.e., the \p ITEMS_PER_THREAD items owned by
- * each thread have logical stride \p BLOCK_THREADS between them.
+ * each thread have logical stride \p BLOCK_THREADS between them. Striped arrangements
+ * are often desirable for data movement through global memory (where
+ * [read/write coalescing](http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#coalesced-access-global-memory)
+ * is an important performance consideration).
* |
*
* \par
@@ -398,13 +403,8 @@
* facilitates greater ILP for improved throughput and utilization.
*
* \par
- * Furthermore, cub::BlockExchange provides operations for converting between blocked
- * and striped arrangements. Blocked arrangements are often desirable for
- * algorithmic benefits (where long sequences of items can be processed sequentially
- * within each thread). Striped arrangements are often desirable for data movement
- * through global memory (where
- * [read/write coalescing](http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#coalesced-access-global-memory)
- * is an important performance consideration).
+ * Finally, cub::BlockExchange provides operations for converting between blocked
+ * and striped arrangements.
*
* \section sec7 (7) Contributors
*
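The two arrangements described in the comment text above can be sketched as index mappings. This is a minimal host-side illustration, not CUB code: the helper names `blocked_index`, `striped_index`, and `covers_tile_once` are hypothetical, and the only assumption is that a tile holds `BLOCK_THREADS * ITEMS_PER_THREAD` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; these names are not part of CUB.

// Blocked: thread `tid` owns `items_per_thread` consecutive tile elements.
inline std::size_t blocked_index(std::size_t tid, std::size_t item,
                                 std::size_t items_per_thread)
{
    assert(item < items_per_thread);
    return tid * items_per_thread + item;
}

// Striped: a thread's successive items are `block_threads` elements apart.
inline std::size_t striped_index(std::size_t tid, std::size_t item,
                                 std::size_t block_threads)
{
    assert(tid < block_threads);
    return item * block_threads + tid;
}

// Returns true if both mappings assign every tile element to exactly one
// (thread, item) pair.
inline bool covers_tile_once(std::size_t block_threads,
                             std::size_t items_per_thread)
{
    const std::size_t tile = block_threads * items_per_thread;
    std::vector<int> blocked_hits(tile, 0);
    std::vector<int> striped_hits(tile, 0);
    for (std::size_t tid = 0; tid < block_threads; ++tid) {
        for (std::size_t item = 0; item < items_per_thread; ++item) {
            ++blocked_hits[blocked_index(tid, item, items_per_thread)];
            ++striped_hits[striped_index(tid, item, block_threads)];
        }
    }
    for (std::size_t i = 0; i < tile; ++i) {
        if (blocked_hits[i] != 1 || striped_hits[i] != 1) {
            return false;
        }
    }
    return true;
}
```

With 4 threads and 2 items per thread, thread 1 owns elements 2 and 3 in the blocked arrangement and elements 1 and 5 in the striped arrangement. For a fixed item, adjacent threads in the striped mapping touch adjacent elements, which is the access pattern that global-memory coalescing favors.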