rocPRIM v0.3.0
Pre-release
Milestone 3
All functions needed for Caffe2 and TensorFlow 1.3 are now finished. Optimizations have so far been applied only selectively; the rest should arrive with milestone 4.
Done in milestones 1 and 2:
- Scan, reduce and sort algorithms (warp, block, device)
- Block and thread I/O primitives
- Block data exchange primitives
- Reduce-by-key, transform (device)
- Discontinuity algorithm (block)
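To make the reduce-by-key semantics concrete, here is a minimal sequential sketch, assuming the usual definition in which each run of consecutive equal keys collapses to one key and the sum of its values. It is a host-side illustration only, not rocPRIM's device API; `reduce_by_key_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential reference (not the rocPRIM API): every run of
// consecutive equal keys is reduced to a single (key, sum-of-values) pair.
template <class Key, class Value>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_key_reference(const std::vector<Key>& keys,
                        const std::vector<Value>& values)
{
    assert(keys.size() == values.size());
    std::vector<Key> unique_keys;
    std::vector<Value> aggregates;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            // A new run starts wherever the key differs from its predecessor.
            unique_keys.push_back(keys[i]);
            aggregates.push_back(values[i]);
        } else {
            // Same key as before: fold the value into the current run.
            aggregates.back() = aggregates.back() + values[i];
        }
    }
    return {unique_keys, aggregates};
}
```

For keys `{1, 1, 2, 2, 2, 1}` and values `{1, 2, 3, 4, 5, 6}` this yields keys `{1, 2, 1}` and sums `{3, 12, 6}`. Note that the trailing `1` forms its own run because only consecutive equal keys are merged.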
Added in this milestone:
- Fancy iterators
- Segmented reduction, scan and sort (device)
- Select (copy if) and unique operations (device)
- Histogram algorithm (block, device)
- Run length encode algorithm (device)
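As an illustration of what run length encoding produces, the sketch below computes the unique value of each run together with the run's length. It is a sequential host-side sketch under that standard definition, not rocPRIM's device interface; `run_length_encode_reference` and its output parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference (not the rocPRIM API): writes the value of
// each run of consecutive equal elements and how many elements that run holds.
template <class T>
void run_length_encode_reference(const std::vector<T>& input,
                                 std::vector<T>& unique_out,
                                 std::vector<std::size_t>& counts_out)
{
    unique_out.clear();
    counts_out.clear();
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (i == 0 || !(input[i] == input[i - 1])) {
            // Start a new run with length 1.
            unique_out.push_back(input[i]);
            counts_out.push_back(1);
        } else {
            // Extend the current run.
            ++counts_out.back();
        }
    }
}
```

Encoding `{5, 5, 7, 7, 7, 5}` gives values `{5, 7, 5}` and counts `{2, 3, 1}`. A device implementation parallelizes this work; the loop above only pins down the expected output.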
Not yet finished:
- Partition algorithm (device)
- Comparison sort (warp, block, device), merge (device)