DeepSparse v1.5.0
New Features:
- ONNX evaluation pipeline for OpenPifPaf (#915)
- YOLOv8 segmentation pipelines and validation (#924)
- deepsparse.benchmark_sweep CLI to enable sweeps of benchmarks across different settings, such as cores and batch sizes (#860)
- Engine.generate_random_inputs() API (#966)
- Example data logging configurations for pipelines/server (#867)
- Expanded built-in functions for NLP and CV pipeline logging to enable better monitoring (#865) (#862)
- Product usage analytics tracking in DeepSparse Community edition (documentation)
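Conceptually, a benchmark sweep runs the same benchmark once per combination of settings. The sketch below assumes a full Cartesian product of core counts and batch sizes. The helper name build_sweep_grid and its dictionary keys are hypothetical and do not reflect the actual arguments or behavior of the deepsparse.benchmark_sweep CLI.

```python
from itertools import product
from typing import Dict, Iterable, List


def build_sweep_grid(core_counts: Iterable[int], batch_sizes: Iterable[int]) -> List[Dict[str, int]]:
    """Return one benchmark configuration per (cores, batch size) pair.

    Hypothetical helper: it only enumerates settings. It does not call
    deepsparse.benchmark_sweep or mirror that CLI's flags.
    """
    # Assumption: a sweep covers every combination of the given settings.
    return [
        {"num_cores": cores, "batch_size": batch}
        for cores, batch in product(core_counts, batch_sizes)
    ]


# Example: 3 core counts x 2 batch sizes -> 6 benchmark runs.
for config in build_sweep_grid([1, 2, 4], [1, 16]):
    print(config)
```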
Performance Improvements:
- Inference latency for unstructured sparse-quantized CNNs has been improved by up to 2x.
- Inference throughput and latency for dense CNNs have improved by up to 20%.
- Inference throughput and latency for dense transformers have improved by up to 30%.
- The following operators are now supported for performance:
  - Neg, Unsqueeze with non-constant inputs
  - MatMulInteger with two non-constant inputs
  - GEMM with constant weights and 4D or 5D inputs
Changes:
- Transformers and YOLOv5 integrations migrated from auto-install to installation from PyPI packages. Going forward, use pip install deepsparse[transformers] and pip install deepsparse[yolov5], respectively.
- DeepSparse now uses hwloc to determine CPU topology. This fixes a bug where DeepSparse could not run performantly inside a Kubernetes cluster with a static CPU manager policy.
- Multi-stream and elastic scheduler behavior has improved when users pass in a num_streams parameter that is smaller than the number of cores. Previously, DeepSparse would divide the system into num_streams chunks and fill each chunk until it ran out of threads. Now, each stream uses a number of threads equal to num_cores divided by num_streams, with the remainder distributed in a round-robin fashion.
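As a rough illustration of the new thread split, the sketch below computes per-stream thread counts from num_cores and num_streams. It assumes the leftover threads are handed out one at a time starting from the first stream; the release notes do not say which stream receives the remainder first, and the function name threads_per_stream is hypothetical rather than part of the DeepSparse API.

```python
from typing import List


def threads_per_stream(num_cores: int, num_streams: int) -> List[int]:
    """Hypothetical sketch: split num_cores threads across num_streams streams."""
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    # Every stream gets the integer quotient num_cores // num_streams.
    base, remainder = divmod(num_cores, num_streams)
    counts = [base] * num_streams
    # Round-robin the leftover threads, one extra per stream.
    # Assumption: distribution starts at stream 0.
    for i in range(remainder):
        counts[i % num_streams] += 1
    return counts


print(threads_per_stream(10, 4))
```

With 10 cores and 4 streams, each stream gets 10 // 4 = 2 threads, and the 2 leftover threads go to the first two streams, giving [3, 3, 2, 2].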
Resolved Issues:
- Performance bugs no longer occur in networks with a Clip operator whose min is not equal to zero.
- Crashes eliminated:
  - Pipeline conll eval using ignore_labels. (#903)
  - YOLOv8 pipelines handling models with dynamic inputs. (#967)
  - QA pipelines with sequence lengths equal to or less than 128. (#889)
  - Image classification pipelines handling PNG images. (#870)
  - ONNX shape overriding when a list was not passed in; the input is now automatically wrapped in a list. (#914)
- Assertion errors/failures removed:
  - Networks with both Convolution and GEMM operations.
  - YOLOv8 model compilation.
  - Slice and Unsqueeze operators with a negative axis.
  - OPT models involving a constant tensor that is broadcast in two different ways.
Known Issues:
- None