Doc updates for new version

Former-commit-id: 1cf8089
NVIDIA · Feb 25, 2014 · c60ec5b
1 parent 1843f06
commit c60ec5b

Showing 144 changed files with 592 additions and 308 deletions.
diff --git a/CHANGE_LOG.TXT b/CHANGE_LOG.TXT
@@ -1,5 +1,10 @@
 //-----------------------------------------------------------------------------
 
+1.2.0    02/25/2014
+    - New features:
+
+//-----------------------------------------------------------------------------
+
 1.1.1    12/11/2013
     - New features:
         - Added TexObjInputIterator, TexRefInputIterator, CacheModifiedInputIterator, and CacheModifiedOutputIterator types for loading & storing arbitrary types through the cache hierarchy.  Compatible with Thrust API. 

diff --git a/LICENSE.TXT b/LICENSE.TXT
@@ -1,5 +1,5 @@
 Copyright (c) 2010-2011, Duane Merrill.  All rights reserved.
-Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 <hr>
 <h3>About CUB</h3>
 
-Current release: v1.1.1 (December 11, 2013)
+Current release: v1.2.0 (February 25, 2014)
 
 We recommend the [CUB Project Website](http://nvlabs.github.com/cub) and the [cub-users discussion forum](http://groups.google.com/group/cub-users) for further information and examples.
 
@@ -84,6 +84,7 @@ See [CUB Project Website](http://nvlabs.github.com/cub) for more information.
  
 | Date | Version |
 | ---- | ------- |
+| 02/25/2014 | [CUB v1.2.0 Primary Release](https://github.com/NVlabs/cub/archive/1.2.0.zip) |
 | 12/10/2013 | [CUB v1.1.1 Primary Release](https://github.com/NVlabs/cub/archive/1.1.1.zip) |
 | 08/08/2013 | [CUB v1.0.1 Primary Release](https://github.com/NVlabs/cub/archive/1.0.1.zip) |
 | 05/07/2013 | [CUB v0.9.4 Update Release](https://github.com/NVlabs/cub/archive/0.9.4.zip) |
@@ -104,7 +105,7 @@ CUB is available under the "New BSD" open-source license:
 
 ```
 Copyright (c) 2010-2011, Duane Merrill.  All rights reserved.
-Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

diff --git a/cub/block/block_discontinuity.cuh b/cub/block/block_discontinuity.cuh
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill.  All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
  * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -61,7 +61,7 @@ namespace cub {
  * \blockcollective{BlockDiscontinuity}
  * \par
  * The code snippet below illustrates the head flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -274,7 +274,7 @@ public:
      *
      * \par
      * The code snippet below illustrates the head-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -352,7 +352,7 @@ public:
      *
      * \par
      * The code snippet below illustrates the head-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -445,7 +445,7 @@ public:
      *
      * \par
      * The code snippet below illustrates the tail-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
@@ -524,7 +524,7 @@ public:
      *
      * \par
      * The code snippet below illustrates the tail-flagging of 512 integer items that
-     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+     * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
      * where each thread owns 4 consecutive items.
      * \par
      * \code
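
Reviewer note: the doc comments touched above describe head- and tail-flagging of items held in a blocked arrangement. As a rough illustration only, the sketch below is a sequential host-side reference, not CUB's implementation. The names `flag_heads`, `flag_tails`, and `thread_slice` are hypothetical, and it assumes the convention that the first item is always head-flagged and the last item is always tail-flagged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference (not part of CUB).
// An item is a "head" when it differs from its predecessor.
// Assumption: the very first item is always flagged.
std::vector<int> flag_heads(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        flags[i] = (i == 0 || items[i] != items[i - 1]) ? 1 : 0;
    }
    return flags;
}

// An item is a "tail" when it differs from its successor.
// Assumption: the very last item is always flagged.
std::vector<int> flag_tails(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        flags[i] = (i + 1 == items.size() || items[i] != items[i + 1]) ? 1 : 0;
    }
    return flags;
}

// In a blocked arrangement, thread t owns the consecutive range
// [t * items_per_thread, (t + 1) * items_per_thread).
// The caller must keep that range inside v.
std::vector<int> thread_slice(const std::vector<int>& v,
                              int thread, int items_per_thread) {
    return std::vector<int>(v.begin() + thread * items_per_thread,
                            v.begin() + (thread + 1) * items_per_thread);
}
```

With 512 items and 4 items per thread, `thread_slice(flags, t, 4)` would give the four flags that thread `t` of 128 holds under this model.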

diff --git a/cub/block/block_exchange.cuh b/cub/block/block_exchange.cuh
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill.  All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
  * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -60,10 +60,10 @@ namespace cub {
  *   yet most block-wide operations prefer a "blocked" partitioning of items across threads
  *   (where consecutive items belong to a single thread).
  * - BlockExchange supports the following types of data exchanges:
- *   - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>striped</em>](index.html#sec4sec3) arrangements
- *   - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>warp-striped</em>](index.html#sec4sec3) arrangements
- *   - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec4sec3)
- *   - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec4sec3)
+ *   - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>striped</em>](index.html#sec5sec3) arrangements
+ *   - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>warp-striped</em>](index.html#sec5sec3) arrangements
+ *   - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec5sec3)
+ *   - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec5sec3)
  *
  * \par A Simple Example
  * \blockcollective{BlockExchange}
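
Reviewer note: the bullets above refer to blocked, striped, and warp-striped arrangements. The following sketch spells out which linear tile index a given thread's item maps to under each arrangement, as described in the surrounding doc comments. The function names are hypothetical helpers for illustration and do not exist in CUB.

```cpp
#include <cassert>

// Hypothetical helpers (not part of CUB). Each returns the linear index,
// within the tile, of the item-th item owned by a given thread.

// Blocked: each thread owns items_per_thread consecutive items.
inline int blocked_index(int thread, int item, int items_per_thread) {
    return thread * items_per_thread + item;
}

// Striped: a thread's items are block_threads apart.
inline int striped_index(int thread, int item, int block_threads) {
    return item * block_threads + thread;
}

// Warp-striped: each warp owns a contiguous segment of
// warp_threads * items_per_thread items; within that segment,
// a lane's items are warp_threads apart.
inline int warp_striped_index(int thread, int item,
                              int items_per_thread, int warp_threads) {
    const int warp = thread / warp_threads;
    const int lane = thread % warp_threads;
    return warp * warp_threads * items_per_thread + item * warp_threads + lane;
}
```

For example, with 128 threads and 4 items per thread, thread 1 owns indices 4 to 7 when blocked and indices 1, 129, 257, 385 when striped.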

diff --git a/cub/block/block_histogram.cuh b/cub/block/block_histogram.cuh
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill.  All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
  * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:

diff --git a/cub/block/block_load.cuh b/cub/block/block_load.cuh
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill.  All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
  * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -441,7 +441,7 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read
+     * A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read
      * directly from memory.  The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub>
      * reads the <em>i</em><sup>th</sup> segment of consecutive elements.
      *
@@ -454,7 +454,7 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read directly
+     * A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read directly
      * from memory using CUDA's built-in vectorized loads as a coalescing optimization.
      * The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector loads to
      * read the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -476,13 +476,13 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>striped arrangement</em>](index.html#sec4sec3) of data is read
+     * A [<em>striped arrangement</em>](index.html#sec5sec3) of data is read
      * directly from memory and then is locally transposed into a
-     * [<em>blocked arrangement</em>](index.html#sec4sec3). The thread block
+     * [<em>blocked arrangement</em>](index.html#sec5sec3). The thread block
      * reads items in a parallel "strip-mining" fashion:
      * thread<sub><em>i</em></sub> reads items having stride \p BLOCK_THREADS
      * between them. cub::BlockExchange is then used to locally reorder the items
-     * into a [<em>blocked arrangement</em>](index.html#sec4sec3).
+     * into a [<em>blocked arrangement</em>](index.html#sec5sec3).
      *
      * \par Performance Considerations
      * - The utilization of memory transactions (coalescing) remains high regardless
@@ -496,13 +496,13 @@ enum BlockLoadAlgorithm
     /**
      * \par Overview
      *
-     * A [<em>warp-striped arrangement</em>](index.html#sec4sec3) of data is read
+     * A [<em>warp-striped arrangement</em>](index.html#sec5sec3) of data is read
      * directly from memory and then is locally transposed into a
-     * [<em>blocked arrangement</em>](index.html#sec4sec3). Each warp reads its own
+     * [<em>blocked arrangement</em>](index.html#sec5sec3). Each warp reads its own
      * contiguous segment in a parallel "strip-mining" fashion: lane<sub><em>i</em></sub>
      * reads items having stride \p WARP_THREADS between them. cub::BlockExchange
      * is then used to locally reorder the items into a
-     * [<em>blocked arrangement</em>](index.html#sec4sec3).
+     * [<em>blocked arrangement</em>](index.html#sec5sec3).
      *
      * \par Usage Considerations
      * - BLOCK_THREADS must be a multiple of WARP_THREADS
@@ -518,7 +518,7 @@ enum BlockLoadAlgorithm
 
 
 /**
- * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec4sec3) across a CUDA thread block.  ![](block_load_logo.png)
+ * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec5sec3) across a CUDA thread block.  ![](block_load_logo.png)
  * \ingroup BlockModule
  * \ingroup UtilIo
  *
@@ -533,17 +533,17 @@ enum BlockLoadAlgorithm
  *   to implement different cub::BlockLoadAlgorithm strategies.  This facilitates different
  *   performance policies for different architectures, data types, granularity sizes, etc.
  * - BlockLoad can be optionally specialized by different data movement strategies:
- *   -# <b>cub::BLOCK_LOAD_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec3)
+ *   -# <b>cub::BLOCK_LOAD_DIRECT</b>.  A [<em>blocked arrangement</em>](index.html#sec5sec3)
  *      of data is read directly from memory.  [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec4sec3)
+ *   -# <b>cub::BLOCK_LOAD_VECTORIZE</b>.  A [<em>blocked arrangement</em>](index.html#sec5sec3)
  *      of data is read directly from memory using CUDA's built-in vectorized loads as a
  *      coalescing optimization.    [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>.  A [<em>striped arrangement</em>](index.html#sec4sec3)
+ *   -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>.  A [<em>striped arrangement</em>](index.html#sec5sec3)
  *      of data is read directly from memory and is then locally transposed into a
- *      [<em>blocked arrangement</em>](index.html#sec4sec3).  [More...](\ref cub::BlockLoadAlgorithm)
- *   -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>.  A [<em>warp-striped arrangement</em>](index.html#sec4sec3)
+ *      [<em>blocked arrangement</em>](index.html#sec5sec3).  [More...](\ref cub::BlockLoadAlgorithm)
+ *   -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>.  A [<em>warp-striped arrangement</em>](index.html#sec5sec3)
  *      of data is read directly from memory and is then locally transposed into a
- *      [<em>blocked arrangement</em>](index.html#sec4sec3).  [More...](\ref cub::BlockLoadAlgorithm)
+ *      [<em>blocked arrangement</em>](index.html#sec5sec3).  [More...](\ref cub::BlockLoadAlgorithm)
  *
  * \par A Simple Example
  * \blockcollective{BlockLoad}
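
Reviewer note: the BLOCK_LOAD_TRANSPOSE description above (read a striped arrangement, then locally transpose it into a blocked one) can be modeled on the host as below. This is a sequential sketch under stated assumptions, not CUB's code: `Tile`, `load_striped`, and `striped_to_blocked` are hypothetical names, and a plain staging vector stands in for the shared-memory exchange that cub::BlockExchange performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// regs[thread][item]: the items each simulated thread holds in registers.
using Tile = std::vector<std::vector<int>>;

// Striped read: thread t reads items that are block_threads apart,
// so neighbouring threads touch neighbouring addresses.
Tile load_striped(const std::vector<int>& input,
                  int block_threads, int items_per_thread) {
    Tile regs(block_threads, std::vector<int>(items_per_thread));
    for (int t = 0; t < block_threads; ++t)
        for (int i = 0; i < items_per_thread; ++i)
            regs[t][i] = input[i * block_threads + t];
    return regs;
}

// Local transpose: write items to a staging buffer at their striped
// positions, then read each thread's consecutive (blocked) range back.
Tile striped_to_blocked(const Tile& striped) {
    const int block_threads = static_cast<int>(striped.size());
    const int items_per_thread =
        striped.empty() ? 0 : static_cast<int>(striped[0].size());
    std::vector<int> staging(
        static_cast<std::size_t>(block_threads) * items_per_thread);
    for (int t = 0; t < block_threads; ++t)
        for (int i = 0; i < items_per_thread; ++i)
            staging[i * block_threads + t] = striped[t][i];
    Tile blocked(block_threads, std::vector<int>(items_per_thread));
    for (int t = 0; t < block_threads; ++t)
        for (int i = 0; i < items_per_thread; ++i)
            blocked[t][i] = staging[t * items_per_thread + i];
    return blocked;
}
```

After the transpose, each simulated thread holds the same consecutive segment it would have read under a direct blocked load, while the memory reads themselves were issued in striped order.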

diff --git a/cub/block/block_radix_rank.cuh b/cub/block/block_radix_rank.cuh
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill.  All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION.  All rights reserved.
  * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -63,7 +63,7 @@ namespace cub {
  *
  * \par Usage Considerations
  * - Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
- * - Assumes a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads
+ * - Assumes a [<em>blocked arrangement</em>](index.html#sec5sec3) of elements across threads
  * - \smemreuse{BlockRadixRank::TempStorage}
  *
  * \par Performance Considerations