This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Doc updates for new version
Former-commit-id: 1cf8089
dumerrill committed Feb 25, 2014
1 parent 1843f06 commit c60ec5b
Showing 144 changed files with 592 additions and 308 deletions.
5 changes: 5 additions & 0 deletions CHANGE_LOG.TXT
@@ -1,5 +1,10 @@
//-----------------------------------------------------------------------------

1.2.0 02/25/2014
- New features:

//-----------------------------------------------------------------------------

1.1.1 12/11/2013
- New features:
- Added TexObjInputIterator, TexRefInputIterator, CacheModifiedInputIterator, and CacheModifiedOutputIterator types for loading & storing arbitrary types through the cache hierarchy. Compatible with Thrust API.
2 changes: 1 addition & 1 deletion LICENSE.TXT
@@ -1,5 +1,5 @@
Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
5 changes: 3 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
<hr>
<h3>About CUB</h3>

Current release: v1.1.1 (December 11, 2013)
Current release: v1.2.0 (February 25, 2014)

We recommend the [CUB Project Website](http://nvlabs.github.com/cub) and the [cub-users discussion forum](http://groups.google.com/group/cub-users) for further information and examples.

@@ -84,6 +84,7 @@ See [CUB Project Website](http://nvlabs.github.com/cub) for more information.
| Date | Version |
| ---- | ------- |
| 02/25/2014 | [CUB v1.2.0 Primary Release](https://github.com/NVlabs/cub/archive/1.2.0.zip) |
| 12/10/2013 | [CUB v1.1.1 Primary Release](https://github.com/NVlabs/cub/archive/1.1.1.zip) |
| 08/08/2013 | [CUB v1.0.1 Primary Release](https://github.com/NVlabs/cub/archive/1.0.1.zip) |
| 05/07/2013 | [CUB v0.9.4 Update Release](https://github.com/NVlabs/cub/archive/0.9.4.zip) |
@@ -104,7 +105,7 @@ CUB is available under the "New BSD" open-source license:
```
Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
12 changes: 6 additions & 6 deletions cub/block/block_discontinuity.cuh
@@ -1,6 +1,6 @@
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -61,7 +61,7 @@ namespace cub {
* \blockcollective{BlockDiscontinuity}
* \par
* The code snippet below illustrates the head flagging of 512 integer items that
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -274,7 +274,7 @@ public:
*
* \par
* The code snippet below illustrates the head-flagging of 512 integer items that
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -352,7 +352,7 @@ public:
*
* \par
* The code snippet below illustrates the head-flagging of 512 integer items that
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -445,7 +445,7 @@ public:
*
* \par
* The code snippet below illustrates the tail-flagging of 512 integer items that
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
@@ -524,7 +524,7 @@ public:
*
* \par
* The code snippet below illustrates the tail-flagging of 512 integer items that
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
* are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
* where each thread owns 4 consecutive items.
* \par
* \code
10 changes: 5 additions & 5 deletions cub/block/block_exchange.cuh
@@ -1,6 +1,6 @@
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -60,10 +60,10 @@ namespace cub {
* yet most block-wide operations prefer a "blocked" partitioning of items across threads
* (where consecutive items belong to a single thread).
* - BlockExchange supports the following types of data exchanges:
* - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>striped</em>](index.html#sec4sec3) arrangements
* - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>warp-striped</em>](index.html#sec4sec3) arrangements
* - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec4sec3)
* - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec4sec3)
* - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>striped</em>](index.html#sec5sec3) arrangements
* - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>warp-striped</em>](index.html#sec5sec3) arrangements
* - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec5sec3)
* - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec5sec3)
*
* \par A Simple Example
* \blockcollective{BlockExchange}
2 changes: 1 addition & 1 deletion cub/block/block_histogram.cuh
@@ -1,6 +1,6 @@
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
32 changes: 16 additions & 16 deletions cub/block/block_load.cuh
@@ -1,6 +1,6 @@
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -441,7 +441,7 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
* A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read
* A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read
* directly from memory. The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub>
* reads the <em>i</em><sup>th</sup> segment of consecutive elements.
*
@@ -454,7 +454,7 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
* A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read directly
* A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read directly
* from memory using CUDA's built-in vectorized loads as a coalescing optimization.
* The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector loads to
* read the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -476,13 +476,13 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
* A [<em>striped arrangement</em>](index.html#sec4sec3) of data is read
* A [<em>striped arrangement</em>](index.html#sec5sec3) of data is read
* directly from memory and then is locally transposed into a
* [<em>blocked arrangement</em>](index.html#sec4sec3). The thread block
* [<em>blocked arrangement</em>](index.html#sec5sec3). The thread block
* reads items in a parallel "strip-mining" fashion:
* thread<sub><em>i</em></sub> reads items having stride \p BLOCK_THREADS
* between them. cub::BlockExchange is then used to locally reorder the items
* into a [<em>blocked arrangement</em>](index.html#sec4sec3).
* into a [<em>blocked arrangement</em>](index.html#sec5sec3).
*
* \par Performance Considerations
* - The utilization of memory transactions (coalescing) remains high regardless
@@ -496,13 +496,13 @@ enum BlockLoadAlgorithm
/**
* \par Overview
*
* A [<em>warp-striped arrangement</em>](index.html#sec4sec3) of data is read
* A [<em>warp-striped arrangement</em>](index.html#sec5sec3) of data is read
* directly from memory and then is locally transposed into a
* [<em>blocked arrangement</em>](index.html#sec4sec3). Each warp reads its own
* [<em>blocked arrangement</em>](index.html#sec5sec3). Each warp reads its own
* contiguous segment in a parallel "strip-mining" fashion: lane<sub><em>i</em></sub>
* reads items having stride \p WARP_THREADS between them. cub::BlockExchange
* is then used to locally reorder the items into a
* [<em>blocked arrangement</em>](index.html#sec4sec3).
* [<em>blocked arrangement</em>](index.html#sec5sec3).
*
* \par Usage Considerations
* - BLOCK_THREADS must be a multiple of WARP_THREADS
@@ -518,7 +518,7 @@ enum BlockLoadAlgorithm


/**
* \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec4sec3) across a CUDA thread block. ![](block_load_logo.png)
* \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec5sec3) across a CUDA thread block. ![](block_load_logo.png)
* \ingroup BlockModule
* \ingroup UtilIo
*
@@ -533,17 +533,17 @@ enum BlockLoadAlgorithm
* to implement different cub::BlockLoadAlgorithm strategies. This facilitates different
* performance policies for different architectures, data types, granularity sizes, etc.
* - BlockLoad can be optionally specialized by different data movement strategies:
* -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec4sec3)
* -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec5sec3)
* of data is read directly from memory. [More...](\ref cub::BlockLoadAlgorithm)
* -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec3)
* -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec5sec3)
* of data is read directly from memory using CUDA's built-in vectorized loads as a
* coalescing optimization. [More...](\ref cub::BlockLoadAlgorithm)
* -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec4sec3)
* -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec5sec3)
* of data is read directly from memory and is then locally transposed into a
* [<em>blocked arrangement</em>](index.html#sec4sec3). [More...](\ref cub::BlockLoadAlgorithm)
* -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec4sec3)
* [<em>blocked arrangement</em>](index.html#sec5sec3). [More...](\ref cub::BlockLoadAlgorithm)
* -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec5sec3)
* of data is read directly from memory and is then locally transposed into a
* [<em>blocked arrangement</em>](index.html#sec4sec3). [More...](\ref cub::BlockLoadAlgorithm)
* [<em>blocked arrangement</em>](index.html#sec5sec3). [More...](\ref cub::BlockLoadAlgorithm)
*
* \par A Simple Example
* \blockcollective{BlockLoad}
4 changes: 2 additions & 2 deletions cub/block/block_radix_rank.cuh
@@ -1,6 +1,6 @@
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -63,7 +63,7 @@ namespace cub {
*
* \par Usage Considerations
* - Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
* - Assumes a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads
* - Assumes a [<em>blocked arrangement</em>](index.html#sec5sec3) of elements across threads
* - \smemreuse{BlockRadixRank::TempStorage}
*
* \par Performance Considerations