Releases: NVIDIA/DALI
DALI v0.19.0
Bug fixes
- Update examples with COCO data set and fix reader behavior for padding (#1557)
- Fix TensorFlow dataset test (#1641)
- Fix typo in QNX cmake files (#1648)
- Remove allocation-dependent test assert (#1650)
- Fix several explicit "something is implicitly deleted" warnings (#1652)
- Fix formatting of the example in the FW iterators docs (#1649)
- Fix hang in decoder benchmark (#1672)
- Fix error message (#1680)
- Fix torch stream initialization in TorchPythonFunction (#1681)
- Fix multi-channel fill value check in Erase operator (#1675)
- Fix tests after examples refactor (#1687)
- Fix Reshape docstring typo (#1691)
- Add synchronization to read/write operations in image decoder cache (#1702)
- Fix Buffer linkage and Reshape bug (#1714)
- Fix TL1 tests (#1710)
- Fix Pad operator bug (#1713)
Improvements
- Allow Crop and CropMirrorNormalize to crop sequences as if they were volumetric images (#1605)
- Erase CPU operator (#1609)
- Improved Reshape (#1634)
- Add GetDimIndices utility to tensor_layout.h (#1640)
- Add example with booleans, comparisons, bitwise and muxing (#1631)
- Remove unimplemented scale parameter in ops.VideoReader. (#1658)
- Change ambiguous "here" in docs developer version (#1657)
- Docs layout and navigation changes (#1635)
- GPU PythonFunction operator (#1655)
- Rename Tensor to TensorList in Supported Ops doc (#1661)
- Add Pad CPU operator (including aligned padded shape support) (#1642)
- Remove the ColorTwist deprecation message (#1646)
- Change PipelineAPIType to Enum (#1636)
- Directional reductions (for CPU): mean, standard deviation, sum, mean square; with tree reduction. (#1653)
- Add support for UINT8 data type in SequenceWrapper (#1643)
- Moving operators around. (#1667)
- Normalize CPU vol 2 (#1666)
- GPU PyTorch operator (#1662)
- Proposing new structure of DALI examples (#1540)
- VideoReader example (#1612)
- MovingMeanSquared kernel (#1668)
- Allow extra dimensions with extent 1 in Spectrogram operator & AudioDecoder changes (#1679)
- Make DataIter a base class for MXNet DALIGenericIterator (#1669)
- Add Transpose CPU Operator (#1677)
- Remove unsupported Python versions from manylinux build (#1694)
- Add deprecation message about CUDA 9 (#1684)
- Mitigate the OS file-max limit in the VideoReader (#1659)
- Add support for StopIteration raised inside framework iterators (#1625)
- Enable FFTS builds for ARM (Xavier, QNX) (#1686)
- Normalize operator for CPU backend (#1670)
- Python operator notebook (#1685)
- Change backend_impl at to __getitem__, returning TensorXPU (#1682)
- Normalize tutorial (#1697)
- Adjust setup_packages.py to the latest pip version (#1698)
- Remove gif as supported extension (#1700)
- Making "Supported backend" title in docs appear correctly
- Update supported TF versions, update setup_packages.py (#1693)
- Add pass-through info to OpSchema to add shared data to stage outputs. (#1707)
- Nonsilence operator (#1701)
- Constant operator and Python wrapper. (#1699)
- Add support in CropMirrorNormalize for uneven sizes of mean and std (#1708)
- Shrink host buffers (#1712)
- Move pipeline ownership from Dataset to Iterator (#1704)
- Align Rn50 data processing pipeline for TensorFlow with upstream examples (#1706)
- Add a note on how to set DALI_EXTRA_PATH to run Jupyter examples (#1703)
- GPU Python operator notebook (#1715)
- Update "Memory consumption" and "Custom operator" docs sections (#1719)
- Use prebuilt cupy for TL0_jupyter test (#1728)
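The directional reductions listed above (#1653) compute mean, standard deviation, sum, and mean square using a tree reduction. To illustrate what "tree reduction" means, here is a pure-Python sketch of pairwise summation over a flat list of floats. It is not DALI's kernel, the function names are hypothetical, and the standard deviation shown is the population form (no degrees-of-freedom correction).

```python
import math
from typing import List, Sequence


def tree_sum(values: Sequence[float]) -> float:
    """Sum values by pairwise (tree) reduction instead of a left-to-right loop.

    Adding neighbours level by level keeps partial sums of similar magnitude,
    which usually accumulates less rounding error than naive summation.
    """
    level: List[float] = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Add adjacent pairs; an odd trailing element is carried to the next level.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def reduce_stats(values: Sequence[float]) -> dict:
    """Return sum, mean, mean square and population standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot reduce an empty sequence")
    total = tree_sum(values)
    mean = total / n
    mean_square = tree_sum([v * v for v in values]) / n
    # Centre the data before squaring to avoid cancellation in E[x^2] - E[x]^2.
    variance = tree_sum([(v - mean) ** 2 for v in values]) / n
    return {"sum": total, "mean": mean, "mean_square": mean_square,
            "stddev": math.sqrt(variance)}


stats = reduce_stats([1.0, 2.0, 3.0, 4.0])
```

Pairwise summation uses the same number of additions as a plain loop. Its accuracy benefit comes from the shape of the reduction tree, and that same shape lends itself to parallel execution.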
Breaking API changes
None
Deprecated features
- CUDA 9 support will end in several releases (#1684)
- Accessing Tensors of TensorListCPU and TensorListGPU with the at method was replaced by the array subscript operator. (#1682)
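Code that has to run against DALI versions both before and after #1682 could hide the difference behind a small helper. This is a hypothetical sketch: get_sample is not part of DALI. It assumes only that newer TensorList objects support subscripting and older ones expose at(). The type of the returned object may differ between versions, so check what your version returns.

```python
def get_sample(tensor_list, index):
    """Return one sample from a TensorListCPU/TensorListGPU-like object.

    Hypothetical compatibility helper: prefer the subscript operator
    (DALI 0.19 and later) and fall back to the older at() method.
    """
    # Special methods are looked up on the type, so check the class, not the instance.
    if hasattr(type(tensor_list), "__getitem__"):
        return tensor_list[index]
    return tensor_list.at(index)
```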
Known issues:
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
- Due to known issues between DALI and the Meltdown/Spectre mitigations, DALI shows the best performance when run in Docker with elevated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
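To check whether a video is likely to hit the key-frame issue, look at how its key frames are spaced. The sketch below assumes you already have the key-frame indices (for example, extracted with a tool such as ffprobe) and that frame 0 is a key frame. The helper names and the default limit of 10 frames are illustrative choices, not DALI settings; 10 is the conservative end of the 10 to 15 range above.

```python
from typing import List, Sequence


def keyframe_gaps(keyframes: Sequence[int], num_frames: int) -> List[int]:
    """Distances between consecutive key frames, plus the tail to the last frame."""
    if not keyframes:
        return [num_frames]
    points = sorted(keyframes)
    gaps = [b - a for a, b in zip(points, points[1:])]
    # Frames after the final key frame also have to be decoded starting from it.
    gaps.append(num_frames - points[-1])
    return gaps


def is_keyframe_spacing_ok(keyframes: Sequence[int], num_frames: int,
                           max_gap: int = 10) -> bool:
    """True if no run of frames between key frames is longer than max_gap."""
    return max(keyframe_gaps(keyframes, num_frames)) <= max_gap
```

Videos that fail this check can be re-encoded with a shorter GOP (key-frame interval) before being fed to the reader.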
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.19.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.19.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.19.0-1119076-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.19.0-1119076-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.19.0-1119076-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.19.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.19.0-1119077-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.19.0-1119077-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.19.0-1119077-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.19.0.tar.gz
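The pip commands and download links above follow a regular pattern. The sketch below rebuilds them from their parts; the helper names are hypothetical. The build number cannot be derived and must be copied from the link list (in this release, 1119076 for CUDA 9.0 and 1119077 for CUDA 10.0).

```python
BASE = "https://developer.download.nvidia.com/compute/redist/cuda"


def pip_install_args(dali_version: str, cuda_version: str) -> list:
    """Arguments for installing nvidia-dali from the per-CUDA extra index."""
    return ["pip", "install", "--extra-index-url", f"{BASE}/{cuda_version}",
            f"nvidia-dali=={dali_version}"]


def wheel_url(dali_version: str, build: str, cuda_version: str,
              python_tag: str) -> str:
    """Direct wheel link; python_tag looks like 'cp36-cp36m'."""
    return (f"{BASE}/{cuda_version}/nvidia-dali/"
            f"nvidia_dali-{dali_version}-{build}-{python_tag}-manylinux1_x86_64.whl")


def tf_plugin_url(dali_version: str, cuda_version: str) -> str:
    """Source tarball of the TensorFlow plugin for the given CUDA version."""
    return (f"{BASE}/{cuda_version}/nvidia-dali-tf-plugin/"
            f"nvidia-dali-tf-plugin-{dali_version}.tar.gz")
```

For example, subprocess.run(pip_install_args("0.19.0", "10.0"), check=True) would run the CUDA 10 pip command shown above.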
FFmpeg source code:
Libsndfile source code:
DALI v0.18.0
Bug fixes
- Fix setup_packages.py for CUDA versions that are not listed explicitly (#1554)
- Fix problem with TensorFlow and cupy tests (#1568)
- Fix ToContiguousXXX for more than 2 inputs. (#1572)
- Use prebuilt cupy in tests (#1570)
- Fix a race condition in GetGPUAllocator (#1575)
- Use different stream base for different videos. (#1592)
- Pin numpy version to 1.17.0 to avoid an error in pycocotools/cocoeval caused by implicit conversion from float64 to integer (#1618)
- Formatting fix. (#1597)
- Fix Transpose operator for batch size 1 as well as single-channel images (#1624)
- Fix static analysis problems (#1559)
- Fix check if resampling is needed in audio decoder. (#1630)
- Temporary fix due to missing PILLOW_VERSION symbol when using torchvision (#1626)
Improvements
- Add support for Unary Ops: + and - (#1392)
- Improve support for labels in VideoReader. (#1500)
- Bump up Protobuf version to the latest one (#1543)
- Add comparison operators and bool handling in arithmetic ops (#1541)
- Cleanup formatting of Supported Operations (#1578)
- Bump up protobuf and libjpeg-turbo version in aarch64-linux and qnx build, fix libsnd dependency (#1573)
- Update PR template (#1571)
- Add the ability to return duplicated outputs from the DALI pipeline (#1556)
- Add explicit call docstring, fix Supported backends (#1547)
- Add DCT 1D CPU kernel (#1569)
- Bump protobuf version in docs (#1586)
- Add interdoc link to define_graph, fix note (#1590)
- Split Expression Factory into separate translation units (#1587)
- Add bitwise operators: &, |, ^ (#1594)
- Resampling decoder (#1582)
- Extract windows GPU (#1538)
- Remove old PythonFunction implementation (#1585)
- Mock imports when building docs where possible (#1593)
- Load libnvcuvid before we test if cuvidReconfigureDecoder symbol exists (#1591)
- Bump protobuf version in conda build (#1606)
- Update VideoReader testcase, use nvmlSystemGetDriverVersion (#1617)
- Name the dataloader shuffling seed (#1621)
- Add docs for arithmetic expressions (#1600)
- Add data source info to error message in TFRecord and Caffe parsers (#1620)
- Remove the need to have GPU available when DALI is just imported (#1601)
- MFCC CPU operator (#1577)
- Update CUDA version detection for Conda (#1629)
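MFCCs are commonly computed as a discrete cosine transform of log mel-filterbank energies, which is why the MFCC operator (#1577) builds on the DCT 1D kernel (#1569). For reference, here is a naive pure-Python sketch of the unnormalized DCT-II, X[k] = sum_n x[n] cos(pi/N (n + 1/2) k). DALI's kernel may use a different DCT type or normalization, so treat this only as an illustration of the transform.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k).

    O(N^2) reference implementation, meant for clarity rather than speed.
    """
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(n_len)
    ]


coeffs = dct_ii([1.0, 1.0, 1.0, 1.0])
```

A constant signal puts all its energy into X[0]. The higher coefficients cancel to zero, up to rounding.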
Breaking API changes
- Python 2.7 is no longer supported. To stay up to date with DALI, upgrade to Python 3.5 or later.
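Because Python 2.7 wheels are no longer published, a setup or CI script may want to fail early on an unsupported interpreter. The sketch below assumes the cp35, cp36, and cp37 wheel tags published for this release; dali_python_tag is a hypothetical helper, not part of DALI.

```python
import sys

# Interpreter versions with cp3X wheels in this release's download links.
SUPPORTED_VERSIONS = [(3, 5), (3, 6), (3, 7)]


def dali_python_tag(version=None):
    """Return the wheel tag for an interpreter, e.g. 'cp36-cp36m'.

    Raises RuntimeError for interpreters without a published wheel, such as 2.7.
    Written without f-strings or annotations so it also runs on Python 2.7.
    """
    major, minor = version if version is not None else tuple(sys.version_info[:2])
    if (major, minor) not in SUPPORTED_VERSIONS:
        raise RuntimeError(
            "Python {}.{} has no nvidia-dali wheel here; use Python 3.5 to 3.7".format(
                major, minor))
    return "cp{0}{1}-cp{0}{1}m".format(major, minor)
```

The helper avoids newer syntax on purpose: an interpreter that is too old still gets a readable error instead of a SyntaxError.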
Known issues:
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
- Due to known issues between DALI and the Meltdown/Spectre mitigations, DALI shows the best performance when run in Docker with elevated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.18.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.18.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.18.0-1062352-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.18.0-1062352-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.18.0-1062352-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.18.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.18.0-1062351-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.18.0-1062351-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.18.0-1062351-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.18.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v0.17.0
Bug fixes
- Fix scalar batch handling in arithmetic ops (#1449)
- Coverity fixes (#1408)
- Fix removal of device_id initialization in OF (#1459)
- Static analysis fixes (#1469)
- Fix start index function (#1482)
- Add missing dependencies to conda recipe (#1483)
- Fix for bundle-wheel.sh (#1499)
- More of static analysis fixes (#1496)
- Fix race between consecutive invocations of stage, reduce number of events (#1493)
- Fix ExternalSource for the GPU (#1452)
- Fix pip package discovery (#1534)
- Wait for thread pool to finish work in BrightnessContrast (#1549)
- Fix doc string (#1546)
- Fix color operators. (#1555)
- Fix color operators even more (#1558)
- Fix stream usage in HSV and BrightnessContrast. (#1566)
- Fix problem with TensorFlow and cupy tests (#1568)
Improvements
- Add favicon to docs (#1453)
- Resampling ND - ground work (#1366)
- Warp 3D (#1442)
- Add sequence and 3D support in flip operator (#1439)
- Make thread pinning optional in the mixed ImageDecoder (#1465)
- Improve accuracy of 3D rotation (#1466)
- Add ability to read LMDB without any labels stored inside (#1440)
- AudioDecoder for WAV format (#1447)
- Add support for PaddlePaddle (#1371)
- Update docs for fill_last_batch parameter to match the real behavior (#1479)
- Remove used requirement from paddle SSD demo docs (#1486)
- FFT CPU 1D implementation (based on ffts) (#1446)
- Utilize libcudart.so version to detect the CUDA toolkit version (#1477)
- Allow more verbose logging of the Pipeline graph (#1487)
- CMake switch for audio support (#1480)
- Add polygons mask support to COCOReader (#1455)
- Change TF versions supported by dataset (#1492)
- Additional deps for AudioDecoder (#1485)
- Add ExtractWindows CPU kernel (#1461)
- Add MNIST TensorFlow test (#1467)
- Remove deprecated edge.py (#1498)
- Add PowerSpectrum CPU operator (#1460)
- Add Spectrogram CPU Operator (#1468)
- Add MNIST examples (#1491)
- Add notebooks with example usage of arithmetic ops (#1438)
- Add ToDecibels CPU kernel (#1516)
- Adding librosa dependency to qa/TL1_jupyter_plugins/test.sh (#1517)
- Fix Keras GPU example (#1520)
- Preemphasis operator (#1515)
- Fix for WaitForWork in Preemphasis (#1523)
- AudioDecoder operator (#1481)
- Lower the accuracy threshold for paddle RN50 test (limited to 25 epochs only) (#1528)
- Remove cache options from fused ImageDecoder documentation (#1495)
- Add ToDecibels CPU operator (#1518)
- Add deprecation warning for Python 2.7 (#1521)
- Split tests per framework if possible (#1519)
- Add zlib dependency warning to libtiff build step (#1530)
- Rephrase supported backends documentation (#1497)
- Extend supported ops doc to include info about volumetric data. (#1531)
- Disable clamping when converting from bool (#1536)
- Add adobe analytics tracking script into docs (#1539)
- ColorTwist operator cleanup (#1532)
- NormalDistribution operator (#1529)
- Hide the docs for internal operators (#1542)
- MelFilterBank CPU kernel (#1522)
- Disable cupy test for Python 2.7 (#1544)
- Boundary condition handling (#1552)
- Add spaces in Python 2.7 end of life warning (#1553)
- Add MelFilterBank CPU operator (#1535)
- Add more formats to FileReader (#1561)
- Make the presence of unique visitor script counting optional in docs (#1560)
- Adjust color ops; make contrast-neutral gray configurable (#1562)
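Several of the audio building blocks added above follow textbook formulas. Below is a pure-Python sketch of two of them: a first-order preemphasis filter, y[t] = x[t] - a*x[t-1], and a power-to-decibel conversion, 10*log10(max(p, eps)/ref). The coefficient 0.97, the floor eps, and the handling of the first sample are common choices assumed here, not necessarily DALI's defaults.

```python
import math
from typing import List, Sequence


def preemphasis(signal: Sequence[float], coeff: float = 0.97) -> List[float]:
    """First-order high-pass filter y[t] = x[t] - coeff * x[t-1].

    The first sample is passed through unchanged (one common border convention).
    """
    out = list(signal[:1])
    for t in range(1, len(signal)):
        out.append(signal[t] - coeff * signal[t - 1])
    return out


def power_to_db(power: Sequence[float], reference: float = 1.0,
                floor: float = 1e-10) -> List[float]:
    """Convert power values to decibels relative to `reference`.

    Values below `floor` are clamped so that log10 never sees zero.
    """
    return [10.0 * math.log10(max(p, floor) / reference) for p in power]
```

With these defaults, silence (power 0) maps to about -100 dB instead of minus infinity.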
Breaking API changes
- DALI 0.17 is the last official release for Python 2.7, which reaches end of life on January 1, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
- The asCPU method is no longer available and has been replaced with as_cpu.
- The ColorTwist operator was deprecated and replaced by the BrightnessContrast and HSV operators (ColorTwist operator cleanup, #1532)
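Scripts that must also run on DALI releases older than 0.17 can use a tiny shim that picks whichever method exists. This is a hypothetical sketch (to_cpu is not a DALI function). It assumes only that TensorListGPU objects expose as_cpu() from 0.17 on and may expose only asCPU() before that.

```python
def to_cpu(tensor_list_gpu):
    """Copy a GPU tensor list to the host using whichever method is available.

    Hypothetical helper: prefers as_cpu() (DALI 0.17 and later) and falls back
    to the older asCPU() name.
    """
    method = getattr(tensor_list_gpu, "as_cpu", None)
    if method is None:
        method = tensor_list_gpu.asCPU
    return method()
```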
Known issues:
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
- Due to known issues between DALI and the Meltdown/Spectre mitigations, DALI shows the best performance when run in Docker with elevated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.17.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.17.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.17.0-1030352-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.17.0-1030352-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.17.0-1030352-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.17.0-1030352-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.17.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.17.0-1030354-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.17.0-1030354-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.17.0-1030354-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.17.0-1030354-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.17.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v0.16.0
Bug fixes
- Fix DALI TF plugin CXX11 ABI issue (#1361)
- Fix DALI TF installation for TF 2.0 (#1386)
- Fix Pad op default fill_value and axes (#1410)
- Fix Tensorflow examples for TF 2.0 (#1420)
- Fix input tiling in arithmetic ops (#1426)
- Fix link error in debug mode. (#1429)
- Fix RN50 MXNet TL3 test (#1424)
- Fix scalar batch handling in arithmetic ops (#1449)
Improvements
- Rearrange docker images (#1333)
- GTest naming in STYLE_GUIDE (#1330)
- Add 3D case to shape layout verification in CropAttr (#1344)
- Add fallback to host when nvjpegJpegStreamParse fails (#1335)
- Surface2D -> ND generalization (#1348)
- Add multichannel (C>3) pipeline tests (#1219)
- Improve last_batch_padded and "Running DALI pipeline" docs (#1351)
- Undo pytorch download changes (#1353)
- Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
- Clean include file dependencies (#1362)
- Add warning if avformat_open_input fails (#1363)
- Workaround for a segfault in NVCC 9 with (#1365)
- HSV manipulation operator for GPU & CPU (#1338)
- Backend implementation for binary arithmetic Operator (#1322)
- Add skip_vfr_check option to VideoReader (#1367)
- Support float16 in Cast GPU operator (#1368)
- Add implementation of BmpImage::PeekShapeImpl, including number of channels (#1332)
- Add Vp9 codec support (#1331)
- Add torch dependency to TL1_separate_executor (#1373)
- Add TF Dataset GPU (#1354)
- Add ability to cross compile lmdb (#1374)
- Move Tensor(List)Shape, Tensor(List)View to dali/core (#1341)
- Relax check for libnvidia-opticalflow in test script. (#1381)
- Disable Vp9 tests temporarily (#1383)
- Make it possible to build DALI with any CUDA version (#1345)
- Add multigpu TF dataset test (#1382)
- Generalize helper code to unary inputs (#1379)
- Force inline and affine transformation (#1389)
- GPU dltensor operator (#1261)
- Enhance Slice API to specify axes represented in the arguments (#1336)
- Allow default compiler build if TF compiler version is unknown (#1396)
- NewWarpAffine -> WarpAffine; optimize CPU warp for affine mapping. (#1387)
- Allow building DALI for different architectures as well (#1397)
- Remove PyTorch iterator double buffering (#1399)
- Improve wording for PREBUILD_TF_PLUGINS option (#1407)
- Move builtin operators to dali/pipeline. (#1406)
- Enhance CaffeReader and Caffe2Reader to support multiple LMDB files (#1360)
- Expose arithm ops in Python (#1355)
- Add Pad operator (#1180)
- Enable CUDA 10 compatibility layer for Conda build (#1339)
- Enforce crop argument minimum size (#1401)
- Rotate operator using Warp kernel (#1403)
- Allow empty lists in arguments (#1413)
- Add missing license in python tests (#1412)
- Support TF 1.15 and 2.0 in tests (#1400)
- Fix DALIDataType enum in Python (#1419)
- BrightnessContrast operator example (#1414)
- Add additional_decode_surfaces parameter to videoreader (#1393)
- CPU argument input (#1423)
- Add support for Constant inputs and type-erased tiles (#1391)
- Support TF v2.0 in jupyter examples (#1425)
- Limit number of Input/Output type combinations in Slice kernel family (#1418)
- Add TF 1.15 and 2.0 support for TF dataset (#1395)
- New warp example + minor fixes (#1158)
- Add initial support for constants in python API (#1421)
Breaking API changes
- DALI 0.17 is the last official release for Python 2.7, which reaches end of life on January 1, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
- Removed the following deprecated operators:
- The possible output types of the Crop, CropMirrorNormalize, and Slice operators are limited to uint8_t, int16_t, uint16_t, int32_t, float, and float16, or to passing through the input type (#1418).
- Move dali/pipeline/operators to dali/operators (#1380)
- DALI library modularization (#1384)
- CPU argument input (#1423)
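Pipelines that set an output type on Crop, CropMirrorNormalize, or Slice can validate it up front against the restricted set. This is a minimal sketch: the string type names and the check_output_type helper are hypothetical, and passing None stands for keeping the input type.

```python
from typing import Optional

# Output types still accepted by Crop, CropMirrorNormalize and Slice (#1418),
# spelled as plain strings for this illustration.
ALLOWED_OUTPUT_TYPES = {"uint8", "int16", "uint16", "int32", "float", "float16"}


def check_output_type(requested: Optional[str], input_type: str) -> str:
    """Return the effective output type or raise ValueError if it is not allowed.

    None means "pass the input type through", which is always permitted.
    """
    if requested is None:
        return input_type
    if requested not in ALLOWED_OUTPUT_TYPES:
        raise ValueError("unsupported output type {!r}; choose one of {}".format(
            requested, sorted(ALLOWED_OUTPUT_TYPES)))
    return requested
```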
Known issues:
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
- Due to known issues between DALI and the Meltdown/Spectre mitigations, DALI shows the best performance when run in Docker with elevated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.16.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.16.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.16.0-982179-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.16.0-982179-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.16.0-982179-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.16.0-982179-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.16.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.16.0-982180-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.16.0-982180-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.16.0-982180-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.16.0-982180-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.16.0.tar.gz
FFmpeg source code:
DALI v0.15.0
Bug fixes
- Fix Transpose operator when the data shape has a dimension of size 1 (#1244)
- Fix DALI_Extra clone (#1276)
- Fix conda check in DALI TF installation script (#1284)
- Fix problems with seeking when stream start_time is != 0. (#1287)
- Fix TypeTable initialization (#1321)
- Fix CropMirrorNormalize compilation with GCC 8 (#1320)
- Suppress warning when FileReader encounters dot and dot-dot entries (#1318)
- Fix the wrong usage of find_library when searching for FFmpeg libs (#1317)
- Fix last_batch_padded docs (#1314)
- Fix pytorch download url (#1334)
- Undo pytorch download changes (#1353)
- Fix DALI TF plugin CXX11 ABI issue (#1361)
- Add torch dependency to TL1_separate_executor (#1373)
- Fix DALI TF installation for TF 2.0 (#1386)
- Relax check for libnvidia-opticalflow in test script. (#1381)
Improvements
- Replace std::pair alias with actual type (#1248)
- Add support for volumetric (i.e. 3D) crop (depth, height and width) (#1210)
- Refactor storage type specialization for operator arguments (#1245)
- CPU DLTensor Operator (#1233)
- Change Outputs and ShareOutputs return type to tuple (#1243)
- Add non_blocking option to CopyToExternalTensor (#1254)
- Improve heuristic for variable frame rate detection (#1242)
- Add pipeline validation (#1267)
- Add lookup table operator (#1251)
- make_string for arguments which have operator<< (#1174)
- Tensor layout (#1237)
- Rework Support Ops to use TensorList (#1259)
- Improve logic in DALI TF plugin installation (support conda installation use case) (#1271)
- size_t -> int for vec, mat, box etc... (#1277)
- ImageDecoder libtiff implementation (#1264)
- Add check for OF support (#1278)
- ImageDecoder libtiff implementation (types.ANY_DATA, YCbCr, ImageDims to TensorShape) (#1280)
- Handle nchannels>3 in ImageDecoder (#1285)
- Use alternative compiler (e.g. g++-5.4) when available (#1290)
- Add support for UCF-101 dataset and upgrade ffmpeg version from 3.4.2 to 4.2 (#1241)
- Add info about libtiff dependency in the documentation (#1294)
- Check whether random row access is allowed in libtiff based decoder implementation (#1295)
- Make cspan (#1298)
- BrightnessContrast operator (#1188)
- Parse number of channels in PNGImage::PeekShape (#1288)
- Add support for decoding multiple resolution videos in the same pipeline. (#1144)
- Conda recipe: point to the local git repository for the build source, relax version dependencies, and use conda-forge for some dependencies (#1303)
- TiffImage::PeekShapeImpl parse and return number of channels (#1304)
- Introduce byte_io.h including byte sequence reading utils (ReadValueBE and ReadValueLE) (#1310)
- Add parsing of number of channels in JpegImage::PeekShapeImpl (#1306)
- Layout refactor (#1250)
- Add CMake VERBOSE_LOGS switch (#1319)
- Add BMP tests (#1316)
- Make DALI_extra repo path settable from the env (#1323)
- Linear transformation GPU kernel (#1262)
- Use DALI_extra images in more tests (#1177)
- Reshape op (#1327)
- Add tf dataset (#1299)
- Adjust QA scripts: stop installing the pip wheel from direct links, as pip disregards the "-f" option in that case (#1328)
- Add CropMirrorNormalize 3D support (#1326)
- Add layout handling to Transpose operator (#1329)
- Add shape layout input to crop window generator signature (#1340)
- Linear Transformation kernel for CPU (#1300)
- Rearrange docker images (#1333)
- Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
- Add 3D case to shape layout verification in CropAttr (#1344)
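The lookup table operator added above (#1251) maps input values through a table. The following pure-Python sketch shows the idea, assuming integer keys and a default value for inputs that are not in the table. The apply_lut name and its keys/values/default arguments are illustrative, not the operator's documented signature.

```python
from typing import Dict, Iterable, List, Sequence


def apply_lut(data: Iterable[int], keys: Sequence[int], values: Sequence[float],
              default: float = 0.0) -> List[float]:
    """Replace each element found in `keys` with the matching entry of `values`.

    Elements without an entry map to `default`.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    table: Dict[int, float] = dict(zip(keys, values))
    return [table.get(x, default) for x in data]
```

A typical use is remapping dataset-specific label IDs onto a contiguous range before training.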
Breaking API changes
- Change Outputs and ShareOutputs return type to tuple (#1243)
Known issues:
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.15.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.15.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.15.0-947078-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.15.0-947078-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.15.0-947078-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.15.0-947078-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.15.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.15.0-947079-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.15.0-947079-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.15.0-947079-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.15.0-947079-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.15.0.tar.gz
FFmpeg source code:
DALI v0.14.0
Bug fixes
- Fix fp16 bug from #1129 and add fp16 test case (#1160)
- Fix framework iterators behavior when iter_setup raises StopIteration (#1136)
- Fix nvjpeg legacy API (#1179)
- Attempt different driver urls in setup_test_common.sh (#1193)
- Fix nightly bug in video reader (#1194)
- Fix conversions to int64 / uint64. (#1205)
- Attempt to fix issue with tf plugin install and gcc 4.8 (#1214)
- Fix PyTorch spelling (#1230)
Improvements
- BrightnessContrast CUDA kernels (#1142)
- Adjust Operator::Run to take reference instead of pointer (#1168)
- Add a STYLE_GUIDE for DALI, adjust Kernel example (#1167)
- Extend external source operator capacity (#1127)
- Make Deallocate public API (#1182)
- Remove .cpu function (#1181)
- Allow stream() to be called for every Workspace (#1178)
- Improve error messages for file_list arg problems in FileReader (#1184)
- Add multi gpu python notebook (#1186)
- HSV Kernel for CPU (#1187)
- Adjust CropMirrorNormalize to Setup API (#1140)
- Expose tensor as dlpack (#1154)
- Add const noexcept qualifiers to IsContiguous. (#1211)
- ROI utils (#1189)
- Add qa test for multi gpu example (#1202)
- Add support for 3d shapes in crop window (#1207)
- DALI for aarch64-QNX platform (#522)
- Unified naming for float16 type. (#1212)
- Add types to DALIDataType that were missing (#1213)
- CPU warp, with tests. (#1159)
- Conda Recipe for DALI (#1156)
- Update file reader doc (#1222)
- Track DALI_extra version in DALI (#1229)
- Add Shapes operator returning sample shapes. (#1223)
- New Warp operator (#1153)
Breaking API changes
- Remove .cpu function (#1181)
- Adjust Operator::Run to take reference instead of pointer (#1168)
- Extend external source operator capacity (#1127) - it now requires input to be set for every iteration
- Unified naming for float16 type. (#1212)
Known issues:
- The new VideoReader operator requires NVIDIA Video Codec SDK support on the platform. In their default configuration, NVIDIA GPU Cloud (NGC) optimized containers prior to 19.01 lack this functionality. To enable it, run the container with the 'video' capability enabled, i.e.:
-e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
- The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
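Inside a running container, one quick way to see whether the video capability was requested is to inspect the same environment variable. This sketch assumes the container runtime reads NVIDIA_DRIVER_CAPABILITIES as a comma-separated list in which "all" also covers video. A set variable does not by itself prove that the driver's video libraries are present.

```python
import os
from typing import Mapping, Optional


def video_capability_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if NVIDIA_DRIVER_CAPABILITIES lists 'video' (or 'all')."""
    env = os.environ if env is None else env
    caps = {c.strip() for c in env.get("NVIDIA_DRIVER_CAPABILITIES", "").split(",")}
    return "video" in caps or "all" in caps
```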
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.14.0
Or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.14.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.14.0-888827-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.14.0-888827-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.14.0-888827-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.14.0-888827-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.14.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.14.0-888828-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.14.0-888828-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.14.0-888828-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.14.0-888828-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.14.0.tar.gz
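The direct download links above follow a fixed pattern: release version, build number (which differs between the CUDA 9.0 and CUDA 10.0 builds and must be taken from the list), and the CPython ABI tag. A minimal sketch of a hypothetical helper that assembles a link for the running interpreter, assuming only the tags listed for this release:

```python
import sys

BASE = "https://developer.download.nvidia.com/compute/redist/cuda"

# ABI tags exactly as they appear in the wheel names listed above.
ABI_TAGS = {
    (2, 7): "cp27-cp27mu",
    (3, 5): "cp35-cp35m",
    (3, 6): "cp36-cp36m",
    (3, 7): "cp37-cp37m",
}


def dali_wheel_url(version, build, cuda, python=None):
    """Hypothetical helper: build a direct-download wheel URL.

    version: e.g. "0.14.0"; build: build number from the list for that CUDA
    version; cuda: "9.0" or "10.0"; python: (major, minor), defaults to the
    running interpreter.
    """
    py = tuple(python or sys.version_info[:2])
    if py not in ABI_TAGS:
        raise ValueError("no prebuilt wheel listed for Python %d.%d" % py)
    return "{}/{}/nvidia-dali/nvidia_dali-{}-{}-{}-manylinux1_x86_64.whl".format(
        BASE, cuda, version, build, ABI_TAGS[py])
```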
FFmpeg source code:
DALI v0.13.0
Bug fixes
- Upgrade PyTorch to 1.2, TorchVision to 0.4 (#1155)
- Add use_batched_decode argument to nvJPEGDecoder API (only for legacy nvJPEGDecoder implementation) (#1151)
- Make loading of the versioned libnvidia-opticalflow.so the primary path (#1147)
- Fix tests that are not using prolog/epilog functions (#1143)
- Provide default initialization for scratch sizes in KernelRequirements. (#1141)
- Fix coco loader (#1135)
- Fix GET_PROC_EX macro (#1128)
- Fix typo in installation doc (#1126)
- Fix capitalization in docs for docker dir (#1122)
- Fix pipeline serialization/deserialization for logical_id (#1121)
- Use the correct PyTorch capitalization everywhere (#1119)
- Fix Gluon example that mixes simple and iterator DALI API (#1117)
- Fix lint in ../dali/pipeline/operators/reader/loader/loader.h (#1113)
- Fix float16 support in DALI TensorFlow plugin (#1086)
- Fix python operator with side effects. (#1105)
- Fix warning (#1061)
- Fix test header inclusion (#1100)
- Make dali_kernel_test_lib respect BUILD_TEST (#1101)
- Fix a race condition in async pipeline executor (#1103)
- Typo fixed in getting started notebook (#1091)
- Reduced batch size to avoid out of memory condition in 19.07 container. (#1089)
- Fix error of indexing shape in Optical Flow (#1087)
- Disable video_reader_op test when we disable NVDEC (#1077)
- Add video error message (#1067)
- Fix sampling of chroma in the VideoReader op (#1054)
- Fix detection pipeline example (#1055)
- Fix fp16 bug from #1129 and add fp16 test case (#1160)
Improvements
- Adjust customdummy plugin in Docs to new API (#1150)
- Add view overload to get TensorListView from TensorVector. (#1152)
- Warp kernels (#1063)
- Add Setup API to Operator (#1045)
- Input & output TYPED_TEST (#1133)
- Refactor SliceFlipNormalizePermutePad (super)kernel (#1129)
- Add virtual env and conda test case for DALI TF plugin (#1107)
- Add test for water operator (#1075)
- BrightnessContrast kernel first implementation (#1060)
- Add default_cuda_stream_priority documentation (#1131)
- Fast coco reader (#1098)
- Optimize docker image building (#1053)
- Remove explicit Multiple Input Sets handling from C++ Backend (#1088)
- Document pre-built WML CE packages in Installation docs (#1124)
- Upgrade VideoCodecSDK to 9.0.20 (#1120)
- UniformRandomFill for unified storage (#1070)
- Calculation layout setup for GPU kernels. (#1106)
- Rework multiple input sets API (#1104)
- Use per-sample RNG in SSDRandomCrop and RandomBBoxCrop (#1109)
- Add compile-time mapping for DALIDataType. For use in TYPE_SWITCH. (#1108)
- Rework how the reader picks samples from the shuffling buffer (#1005)
- Add a check that the simple, scheduled and iterator Python APIs are not mixed (#1074)
- Enable OpticalFlow test on CI (#1096)
- Make protobuf linking mode configurable (#1102)
- Kernel manager (#1079)
- Add JIRA Task placeholder in PR template (#1090)
- Replace vector<shared_ptr> with TensorVector (#1040)
- Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
- Adjust TensorFlow ResNet50 example to 1.14 version API (#1081)
- Update DALI TF plugin docs to be aligned with the current functionality (#1066)
- Adds BUILD_TF_PLUGIN flag to one-click build script (#1051)
- Enforce shares_data_ in Buffer (#1057)
- Improved sampler (#1071)
- Change test prefix from L*_ to TL*_ (#1069)
- Rounding Convert and ConvertSat added. (#1068)
- Copy multiple collections to scratchpad. (#1044)
- Use DALI_extra in loader test (#1064)
- Add filename to LMDB reader errors (#1059)
- Add make check target that runs basic tests (#1019)
- Bounding box representation (#1052)
- Add option to enable fast IDCT in libjpeg-turbo (#1031)
- Adjust Tests to use DALI_EXTRA (#1056)
- Basic geometric transform functions. (#1047)
- Add TorchPythonFunction operator (#1033)
- Add support for reading video files with labels using file_list argument (#1029)
- Add TensorFlow 1.14 support (#1037)
- Enable sink operators. (#1004)
- Update PR template (#1043)
Breaking API changes
- Added Setup API to Operator with pure virtual SetupImpl
- Multiple Input Sets handling was removed from the backend and is now only Python-level syntactic sugar
- Reader sampling from shuffling buffer was adjusted
- Replace vector<shared_ptr> with TensorVector as input and output of CPU Operators allowing for contiguous outputs from CPU Ops
- Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
- Enforce shares_data_ in Buffer - sharing data cannot be implicitly reallocated and must match allocation size
Known issues:
- The new Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
-e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
- The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.13.0
Or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.13.0
Or use direct download links (CUDA 9.0):
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.13.0-853140-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.13.0-853140-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.13.0-853140-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.13.0-853140-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.13.0.tar.gz
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.13.0-853141-cp27-cp27mu-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.13.0-853141-cp35-cp35m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.13.0-853141-cp36-cp36m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.13.0-853141-cp37-cp37m-manylinux1_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.13.0.tar.gz
FFmpeg source code:
DALI v0.12.0
Bug fixes
- Remove dependency on gitlab-master in DALI TF (#1038)
- Add include(CheckSymbolExists) to CMakeLists (#1035)
- Fix uninitialized number of dimensions in TensorListShape. (#1023)
- Add const-qualifiers to TensorShape first and last functions. (#1020)
- Add missing bracket in the BoxEncoder docs (#1018)
- Adjust epsilon in tests. (#1017)
- Add ASAN support, fix reported problems in the unit tests (#362)
- Fix for OF test (#1008)
- Fix nvjpeg_decoder legacy api build (#1006)
- Fix scratchpad allocation in CropMirrorNormalize (#1000)
- Fix Resize ratio calculation (#997)
- Add missing device guard in the reader prefetch thread (#978)
- optical flow test fix (#976)
- Make errors from build_helper propagate correctly (#961)
- Add casting to float before normalization in SliceFlipNormalizePermute tests (#974)
- Fix displacement filter (#524)
- Fix output allocation in operator benchmark (#959)
- Handle NULL pointer in ctypes_void_ptr (#965)
- Fix error of indexing shape in Optical Flow (#1087)
- Reduced batch size to avoid out of memory condition in 19.07 container
Improvements
- Create pull request template (#1039)
- Add environment variables to DALI TF build image (#1034)
- Replace HostDecoder and nvJPEGDecoder with generic ImageDecoder (#1028)
- Add a warning when a deprecated operator is used (#1030)
- Expose and document fine grain control API for pipeline run (#972)
- Use TensorListShape for TensorList shape (#1025)
- Rework nvidia-dali-tf-plugin build (#1007)
- Span improvements. (#1032)
- Add ImageDecoder operator, selecting implementation based on device argument (#995)
- Removed unified memory from resampling filters. (#1026)
- Add mechanism to mark an operator as deprecated in favor of another one (#1001)
- Add matrix types + tests. (#1014)
- Use TensorShape in dali::Tensor (#1015)
- Introduce number of samples to TensorListShape (#1010)
- Video reader label (#998)
- Add path to json in case of error in the COCO reader (#1011)
- Add vector types. (#1009)
- Add no squeeze option and dynamic shape for MXNet and PyTorch plugins (#988)
- Update test_python_function_operator.py (#880)
- Restructure subdirectories in nvjpeg decoder (#999)
- Add printing of error string enums with nvJPEG error codes (#983)
- Remove deprecated __init__ usage from backend (#993)
- Replace usage of NormalizePermute by CropMirrorNormalize (#994)
- Remove OldCropMirrorNormalize (#992)
- Optimize python operator outputs copy. (#958)
- Rework how DALI handles py_buffer format string (#985)
- Improve obtaining TensorFlow build flags for prebuilt DALI plugins (#963)
- Replace CropMirrorNormalize with new implementation (#989)
- Add COCO tfrecord support (#979)
- Add test cases for Flip operator (#973)
- Add NewCropMirrorNormalize GPU (#970)
- Read COCO categories from json file in COCOReader (#986)
- Add -std=c++14 to cuda nvcc flags in custom plugin example (#984)
- Add max_size upperbound option to Resize with resize_short (#960)
- Enable no-crop by default in NewCropMirrorNormalize (#977)
- Change type traits to use C++14 library aliases. (#975)
- Use c++14 standard (#971)
- Change storage device from boolean to enum in workspace (#967)
- Add new SliceFlipNormalizePermute CPU kernel. (#949)
- Remove lint from the default target list (#964)
- Add split_scenes and transcode_scenes doc in Superres example (#944)
- Update libjpeg-turbo to 2.0.2 version (#951)
- Add lint as the first class, separate target to CMake (#952)
- Create test_optical_flow.py (#911)
- Adjust TensorFlow ResNet50 example to 1.14 version API (#1081)
- Change test prefix from L*_ to TL*_ (#1069)
Breaking API changes
- CPU operators have moved from per-sample processing (the pipeline processes one sample at a time, all the way through the pipeline) to batch processing (all samples are processed by the first operator before moving to the next operator). This may result in a small performance degradation for some use cases. However, in the long term it will enable optimizations that are currently unavailable, as well as operations that need to see the whole batch during processing (like random sample blending inside a batch).
- Deprecated _run, _share_outputs and _release_outputs in favor of schedule_run, share_outputs and release_outputs
- Replaced HostDecoder and nvJPEGDecoder with generic ImageDecoder. ImageDecoder is the recommended operator for image decoding; the old API will be removed in the future
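The scheduled API replaces the deprecated underscore methods. A minimal sketch of a loop built on it, assuming pipe is an already built Pipeline; consume_batches is a hypothetical helper, and consume must finish using the shared buffers before release_outputs hands them back to the pipeline.

```python
def consume_batches(pipe, num_iterations, consume):
    """Run num_iterations batches through pipe using the scheduled API.

    Hypothetical helper: pipe must provide schedule_run(), share_outputs()
    and release_outputs(), as the DALI Pipeline does after this change.
    """
    if num_iterations <= 0:
        return []
    results = []
    pipe.schedule_run()  # start producing batches
    for i in range(num_iterations):
        outputs = pipe.share_outputs()  # waits for a batch; buffers are not copied
        try:
            results.append(consume(outputs))  # use the buffers before releasing them
        finally:
            pipe.release_outputs()  # hand the buffers back to the pipeline
        if i + 1 < num_iterations:
            pipe.schedule_run()  # request the next batch
    return results
```

This API is meant to be used instead of, not together with, pipe.run().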
Known issues:
- The new Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
-e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
- The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The plugin requires the gcc compiler version used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) to be present on the system.
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.12.0
Or for CUDA 10:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.12.0
Or use direct download links (CUDA 9.0):
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.12.0-819488-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.12.0-819488-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.12.0-819488-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.12.0-819488-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.12.0.tar.gz
Or use direct download links (CUDA 10.0):
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.12.0-819496-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.12.0-819496-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.12.0-819496-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.12.0-819496-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.12.0.tar.gz
FFmpeg source code:
DALI v0.11.0
Bug fixes
- Fix propagation of DALI build SHA, flavor and timestamp (#948)
- Fix warning (#947)
- Fix data race in displacement filter (#945)
- Fix OF sequence number bug (#896)
- Drop TF 1.14rc0 from test as it doesn't have working TensorBoard (#941)
- Mark Transpose operator as one supporting sequences (#928)
- Update aarch64 build docs (#931)
- Fix lint error (#932)
- Fix lint result being ignored for include/dali. Fix linter errors in include/dali. (#923)
- Fix floating point precision error when calculating width and height for resizing (#917)
- Fix wrong registration of python operators after loading plugin (#910)
- Bound installed torchvision version with present CUDA version in tests (#912)
- Update README and iterator docs (#889)
- Fix SSD example and tests (#908)
- Disable threading inside the OpenCV (#887)
- Fix lint error printing in Python 3. (#907)
- Fix compilation error in assert(size(shample_shape)). (#901)
- Fix CMake warning (#886)
- Restore performance in JoC RN50 inference (#962)
Improvements
- Change CPU to batch processing (#936)
- Add specializations of Operator class for all backends (#934)
- Replace the displacement flip with a dedicated operator. (#849)
- Replace current crop and slice with new version based on slice kernel (#930)
- Add multiple inputs and outputs in the python operator (#942)
- Add ThreadPool to Host Workspace (#935)
- Make test_detection_pipeline optionally use DALI extra (#922)
- Add the sequence reader example (#895)
- Box encoder gpu offsets (#939)
- Add cascading notify in thread pool (#933)
- Add optional offset computation to BoxEncoder (#921)
- Add sanity test for PyTorch SuperRes example (#633)
- Remove prebuild TensorFlow plugins from DALI (#920)
- New slice operator (#913)
- Remove unnecessary copies by using const ref or move (#655)
- view_as_tensor_gpu utility function & copy tensor (#658)
- Use SmallVector in TensorShape. (#915)
- Add GTC 2019 video and presentation to the documentation (#926)
- Optimize slice kernel. (#924)
- Update nvJPEG version (#919)
- Rework DeviceGuard to restore original context upon the exit (#882)
- Slice GPU batched kernel (#905)
- Add ability to use docker based build for insource-builds (#891)
- NewCrop: support for 4D inputs (#900)
- Upgrade PyTorch to 1.1.0 in QA tests config (#909)
- Device-usable TensorShape and core utils. (#903)
- Add SmallVector class. (#902)
- Add N-dimensional Slice CPU kernel (#893)
- DALI for aarch64-linux platform (#856)
- Make linter work with Python 3 (#904)
- VideoReader stride (#755)
- Device-side testing. (#897)
- Update docs of the Readers (#894)
- Add L1 test for split queues executor (#780)
- Generic N-dimensional GPU slice kernel (#877)
- Update info about operators supporting sequences (#885)
- Move error handling to DALI core. (#867)
- Add possibility to build debug dali using build.sh (#857)
- Add proper errors to the ExternalSource (#875)
- Use raw ImageNet data for RN50 convergence test (#636)
- Simplified README with links to NVIDIA docs
- Add as_tensor method with a provided shape to the Python API (#953)
Breaking API changes
- CPU operators have moved from per-sample processing (the pipeline processes one sample at a time, all the way through the pipeline) to batch processing (all samples are processed by the first operator before moving to the next operator). This may result in a small performance degradation for some use cases. However, in the long term it will enable optimizations that are currently unavailable, as well as operations that need to see the whole batch during processing (like random sample blending inside a batch).
- CropCastPermute is removed. CropMirrorNormalize should be used instead (with the default values for normalization).
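The difference between the two CPU execution models can be illustrated with a toy, framework-free sketch. The first two functions only list the order in which (operator, sample index) pairs are visited, and blend_within_batch is a hypothetical example of an operation that needs the whole batch at once. None of these names are DALI APIs.

```python
import random


def per_sample_order(num_samples, op_names):
    """Old CPU model: each sample traverses all operators before the next starts."""
    return [(op, i) for i in range(num_samples) for op in op_names]


def batched_order(num_samples, op_names):
    """New CPU model: each operator processes the whole batch before the next one."""
    return [(op, i) for op in op_names for i in range(num_samples)]


def blend_within_batch(batch, alpha=0.5, rng=random):
    """Toy batch-wide op: mix each sample with a random sample of the same batch.

    It needs every sample of the batch at once, which per-sample
    processing could not provide.
    """
    partners = [rng.randrange(len(batch)) for _ in batch]
    return [alpha * x + (1 - alpha) * batch[j] for x, j in zip(batch, partners)]
```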
Known issues:
- The new Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
-e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
- The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
- The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The plugin requires the gcc compiler version used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) to be present on the system.
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.11.0
Or for CUDA 10:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.11.0
Or use direct download links (CUDA 9.0):
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.11.0-781233-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.11.0-781233-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.11.0-781233-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.11.0-781233-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.11.0.tar.gz
Or use direct download links (CUDA 10.0):
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.11.0-781234-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.11.0-781234-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.11.0-781234-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.11.0-781234-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.11.0.tar.gz
FFmpeg source code:
DALI v0.10.0
Bug fixes
- Fix CropMirrorNormalize crop_pos_x/y argument for the CPU (#853)
- Update SSD L1 test (#863)
- Add stream to memset calls (#862)
- Replace bc calls with awk (#850)
- Fix pipeline serialization with make_continious inside (#848)
- add dot (#852)
- Remove unreliable tests that expected reallocation to give different pointer. (#851)
- Fix MXNet L3 and PyTorch L1 and L3 tests (#845)
- Fix tests for Ubuntu 18.04 and Python 3.7 (#797)
- Fix numerical issue in clamping cropped bounding boxes. (#846)
- Move RapidJSON to third_party (#835)
- Add fallback to so.1 for optical flow library loading (#822)
- Added more options to build.sh script (#828)
- Update SSD example to report global speed and use proper number of shards (#810)
- Fix one_config_only condition in test_template.sh (#823)
- Prevent manylinux3 image build from pruning other docker images (#795)
- Install OpenMPI for CUDA 10 when not present in the system (#821)
- Add dependencies silently required by opencv-python (#820)
- Fix test_detection_pipeline for python2 (#809)
- Do not install glib-2.0 in qa tests (#816)
- Reimplement GetSingleOrRepeatedArg without use of exceptions for normal flow.
- Fix no_dali run for SSD example (#803)
- Make test scripts verbose (#804)
- Updating OF docs & example (#799)
- Fix DALI version for non-release builds (#800)
- Improve error message when unable to set CPU affinity (#775)
- Move changing the value in callback before the barrier (#784)
- enabling tests (#789)
- Rename GetRequirements to Setup. (#778)
Improvements
- Add basic PyTorch DALI example, fix links to files in docs (#864)
- Move to CPU based pipeline in L3 RN50 TF test (#865)
- Add info about nightly and weekly DALI builds (#859)
- Generalized tensor list view (#791)
- Move doxygen doc generation to build docs phase (#860)
- QA tests: splitting plugin manager and tf plugin package tests (#830)
- Add options for OF, NVDEC and NVML support (#838)
- Add python tests for multi-input CropMirrorNormalize (#818)
- Fix unnecessary memory usage when reallocating (#847)
- Add collect_sources and collect_headers macros for CMake (#837)
- Add python function operator - DALI-571 (#732)
- Add performance threshold to L3 tests (#801)
- Add "dali_core" library. (#832)
- Upgrade to CMake 3.11 (#825)
- Add Boost Preprocessor to third_party (#826)
- Add location specifiers to span functions. (#824)
- Improve documentation of ExternalSource and RandomResizedCrop (#815)
- Better logging and gitignore update (#806)
- Align build.sh with docs (#792)
- Non-static kernels. (#786)
- Add ability to build nightly/weekly version of DALI (#770)
- Add new test case for cached_batch_copy (#783)
- Add support of separate prefetch queues in TF plugin (#761)
Breaking API changes
- None
Known issues:
- The new Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
-e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
- The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
Binary builds
Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.10.0
Or for CUDA 10:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.10.0
Or use direct download links (CUDA 9.0):
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.10.0-743882-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.10.0-743882-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.10.0-743882-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.10.0-743882-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.10.0.tar.gz
Or use direct download links (CUDA 10.0):
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.10.0-743881-cp27-cp27mu-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.10.0-743881-cp35-cp35m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.10.0-743881-cp36-cp36m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.10.0-743881-cp37-cp37m-manylinux1_x86_64.whl
- http://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.10.0.tar.gz
FFmpeg source code: