DALI v0.30.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Optimized CPU resampling (#2540).
- Added the following mathematical expressions: exp and log (#2555).
- Added the images argument for the COCOReader, which allows for the custom ordering of images and fixed a bug in the segmentation data parsing (#2548, #2597).
- Added support for the nvJPEG preallocate API for a batched hardware decoder (#2544).
- Added support for surfaces with strides over 2G (#2600).
- Enabled CUDA 11.2 builds (#2553).
- Documentation improvements: added a geometric transform tutorial (#2530) and a support matrix (#2519).
- Allowed DALI to be compiled with Clang (#2416).
- Added CUDA API checks in utility functions (#2517) and tests (#2516).
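The "strides over 2G" item above is easiest to see with a little arithmetic. The sketch below assumes that "2G" refers to byte strides that no longer fit in a signed 32-bit integer; the release notes do not spell this out. The helper names are hypothetical and are not part of DALI.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def plane_stride_bytes(height, width, channels, bytes_per_channel):
    """Bytes between the starts of two consecutive, densely packed planes."""
    return height * width * channels * bytes_per_channel


def fits_int32(value):
    """True if the value can be stored in a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


# A Full HD RGB frame is far below the limit.
full_hd = plane_stride_bytes(1080, 1920, 3, 1)

# A 30000 x 30000 RGB surface needs a stride of 2.7e9 bytes, which
# overflows a 32-bit offset and therefore requires 64-bit arithmetic.
huge = plane_stride_bytes(30000, 30000, 3, 1)

print(full_hd, fits_int32(full_hd))
print(huge, fits_int32(huge))
```

Under this assumption, any code path that stores such a stride in a 32-bit type would wrap around, which is presumably the situation the change addresses.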
Fixed issues
- Fixed the autoreset option in the iterator for the DROP policy (#2567).
Improvements
- Make Nvjpeg2kTest more verbose (#2509)
- Compile DALI with Clang (#2416)
- Try to actually find the library instead of arbitrarily deciding it can't be there (#2511)
- Enable GDS for conda build by default (#2515)
- Pool memory resource (#2518)
- Add GTest Event Listener with CUDA validation after TEST (#2516)
- Disable GPU numpy reader test for sm < 6.0 (#2514)
- Mention WarpAffine in transforms.* documentation (#2527)
- Ops rework to prepare iter-to-iter batch size variability (#2408)
- Fix unchecked CUDA API calls in utility functions (#2517)
- Bump up nvidia-tensorflow version in tests (#2526)
- Cleanup warnings in CUDA code (#2523)
- Add debug info to RN50 pipeline (#2522)
- Add a supported matrix to the documentation (#2519)
- Add ArgValue utility (#2528)
- Remove pinning numpy version in TL1_ssd_training test (#2536)
- Remove unreachable return statement (#2541)
- Vectorize CPU resampling (#2540)
- Remove constraint on input type for RandomResizedCrop. Update tests. (#2549)
- Hide ArithmeticGenericOp doc and disallow bool (#2538)
- Support for nvJPEG preallocate API for batched HW decoder (#2544)
- Add exp and log math functions (#2555)
- Add COCOReader files arg support and fix bug in the segmentation data parsing (#2548)
- Event pool (#2520)
- Rework random number generators. RNGBase operator template and NormalDistribution. (#2513)
- Enable CUDA 11.2 builds (#2553)
- Adjust range of tested log inputs (#2564)
- Add geometric transform tutorial. (#2530)
- Add synchronization after randomizer construction. (#2565)
- Move to the upstream version of PaddlePaddle (#2561)
- Move examples to fn api (#2566)
- Remove legacy API based nvJPEG decoder implementation (#2591)
- Support surfaces with strides over 2G (#2600)
- COCOReader images argument can be used to provide a custom order of images (#2597)
Bug fixes
- Fix build for Jetson platform (#2512)
- Fix aarch64 build errors (#2529)
- Fix broken uniform operator python tests (#2556)
- Fix Clang build (#2560)
- Fix Xavier test crash caused by NumPy faulty build (#2596)
- Fix autoreset option in iterator for DROP policy (#2567)
- Fix uniform distribution test expectations (#2589)
Breaking API changes
Deprecated features
Known issues:
- The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If the key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
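The key-frame requirement in the video loader issue above can be checked ahead of time when the key-frame positions are known. The following is a minimal sketch, not a DALI utility: the function names, the input format (a list of key-frame indices), and the default threshold of 15 frames (the upper end of the range quoted above) are assumptions for illustration. Extracting the indices from a video file is left out.

```python
def max_key_frame_gap(key_frame_indices, total_frames):
    """Return the longest distance, in frames, without a new key frame.

    The stretch before the first key frame and the stretch from the
    last key frame to the end of the stream are both counted.
    """
    if not key_frame_indices:
        return total_frames
    positions = sorted(key_frame_indices) + [total_frames]
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    return max(gaps + [positions[0]])


def key_frames_dense_enough(key_frame_indices, total_frames, max_gap=15):
    """True if no stretch of the stream exceeds max_gap frames."""
    return max_key_frame_gap(key_frame_indices, total_frames) <= max_gap


# A key frame every 10 frames satisfies the requirement.
print(key_frames_dense_enough([0, 10, 20, 30], total_frames=40))

# A key frame every 50 frames does not.
print(key_frames_dense_enough([0, 50], total_frames=100))
```

A stream that fails this check would need to be re-encoded with a shorter key-frame interval before it is passed to the video loader.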
Binary builds
Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.30.0
or for CUDA 11:
The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.30.0
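For scripted environments, the commands above can be generated rather than copied by hand. The sketch below only assembles the argument list; the function name and the mapping from CUDA major version to package suffix are illustrative assumptions derived from the package names shown above.

```python
import shlex

EXTRA_INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"

# CUDA major version -> package suffix, as used in the commands above.
CUDA_SUFFIX = {10: "cuda100", 11: "cuda110"}


def pip_install_args(cuda_major, version="0.30.0", tf_plugin=False):
    """Build the pip argument list for DALI or its TensorFlow plugin."""
    suffix = CUDA_SUFFIX[cuda_major]
    if tf_plugin:
        package = f"nvidia-dali-tf-plugin-{suffix}"
    else:
        package = f"nvidia-dali-{suffix}"
    return [
        "pip", "install",
        "--extra-index-url", EXTRA_INDEX_URL,
        f"{package}=={version}",
    ]


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(pip_install_args(11)))
print(shlex.join(pip_install_args(11, tf_plugin=True)))
```

The list form can be passed directly to `subprocess.run` when the installation should be executed from Python.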
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.30.0-1983576-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-0.30.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.30.0-1983575-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.30.0-1983575-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-0.30.0.tar.gz
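The wheels listed above differ by CUDA version and CPU architecture, and no aarch64 wheel is listed for CUDA 10.0. A small lookup, sketched below, picks the matching link for the current machine. The table copies the URLs from the lists above; the function name and the use of `platform.machine()` to detect the architecture are assumptions for illustration.

```python
import platform

BASE_URL = "https://developer.download.nvidia.com/compute/redist/"

# (CUDA package suffix, machine architecture) -> wheel path from the lists above.
WHEELS = {
    ("cuda100", "x86_64"): "nvidia-dali-cuda100/nvidia_dali_cuda100-0.30.0-1983576-py3-none-manylinux2014_x86_64.whl",
    ("cuda110", "x86_64"): "nvidia-dali-cuda110/nvidia_dali_cuda110-0.30.0-1983575-py3-none-manylinux2014_x86_64.whl",
    ("cuda110", "aarch64"): "nvidia-dali-cuda110/nvidia_dali_cuda110-0.30.0-1983575-py3-none-manylinux2014_aarch64.whl",
}


def wheel_url(cuda, machine=None):
    """Return the direct download link for the given CUDA suffix and architecture."""
    machine = machine or platform.machine()
    try:
        return BASE_URL + WHEELS[(cuda, machine)]
    except KeyError:
        raise LookupError(f"no 0.30.0 wheel listed for {cuda} on {machine}") from None


print(wheel_url("cuda110", "aarch64"))
```

Requesting a combination that is not listed, such as `wheel_url("cuda100", "aarch64")`, raises `LookupError` instead of returning a link that does not exist.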
FFmpeg source code:
Libsndfile source code: