
DeepSparse v1.5.0

@jeanniefinks jeanniefinks released this 07 Jun 05:18
22208e5

New Features:

  • ONNX evaluation pipeline for OpenPifPaf (#915)
  • YOLOv8 segmentation pipelines and validation (#924)
  • deepsparse.benchmark_sweep CLI to enable sweeps of benchmarks across different settings such as cores and batch sizes (#860)
  • Engine.generate_random_inputs() API (#966)
  • Example data logging configurations for pipelines/server (#867)
  • Expanded built-in functions for NLP and CV pipeline logging to enable better monitoring (#865) (#862)
  • Product usage analytics tracking in DeepSparse Community edition (documentation)

Performance Improvements:

  • Inference latency for unstructured sparse-quantized CNNs has been improved by up to 2x.
  • Inference throughput and latency for dense CNNs have been improved by up to 20%.
  • Inference throughput and latency for dense transformers have been improved by up to 30%.
  • The following operators are now supported for performance:
    • Neg, Unsqueeze with non-constant inputs
    • MatMulInteger with two non-constant inputs
    • GEMM with constant weights and 4D or 5D inputs

Changes:

  • The Transformers and YOLOv5 integrations have migrated from automatic installation to installation from PyPI packages. Going forward, install them with pip install deepsparse[transformers] and pip install deepsparse[yolov5].
  • DeepSparse now uses hwloc to determine CPU topology. This fixes a bug where DeepSparse could not be used performantly inside of a Kubernetes cluster with a static CPU manager policy.
  • Multi-stream and elastic scheduler behaviors have been improved for the case where users pass in a num_streams parameter that is smaller than the number of cores. Previously, DeepSparse would divide the system into num_streams chunks and fill each chunk until it ran out of threads. Now, each stream uses a number of threads equal to num_cores divided by num_streams, with the remainder distributed in a round-robin fashion.
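
The new thread split described in the last item can be illustrated with a short sketch. This is not DeepSparse's scheduler code; the function name threads_per_stream is hypothetical, and it assumes that "round-robin" means the leftover threads are handed out one at a time starting from the first stream.

```python
from typing import List


def threads_per_stream(num_cores: int, num_streams: int) -> List[int]:
    """Illustrative only: split num_cores threads across num_streams streams.

    Each stream gets num_cores // num_streams threads. Leftover threads are
    handed out one per stream, starting from stream 0 (an assumption about
    what "round-robin" means here).
    """
    if num_cores <= 0 or num_streams <= 0:
        raise ValueError("num_cores and num_streams must be positive")
    base, remainder = divmod(num_cores, num_streams)
    # The first `remainder` streams each receive one extra thread.
    return [base + (1 if i < remainder else 0) for i in range(num_streams)]


if __name__ == "__main__":
    # 10 cores over 4 streams: base of 2 threads each, 2 leftover threads.
    print(threads_per_stream(10, 4))
```

Under these assumptions every thread is assigned, and stream sizes differ by at most one.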

Resolved Issues:

  • In networks with a Clip operator where min isn't equal to zero, performance bugs no longer occur.

  • Crashing eliminated:

    • Pipeline conll eval using ignore_labels. (#903)
    • YOLOv8 pipelines handling models with dynamic inputs. (#967)
    • QA pipelines with sequence lengths equal to or less than 128. (#889)
    • Image classification pipelines handling PNG images. (#870)
    • ONNX overriding of shapes if a list was not passed in; this now automatically wraps in a list. (#914)
  • Assertion errors/failures removed:

    • Networks with both Convolutions and GEMM operations.
    • YOLOv8 model compilation.
    • Slice and Unsqueeze operators with a negative axis.
    • OPT models involving a constant tensor that is broadcast in two different ways.
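
The shape-override fix listed under "Crashing eliminated" (#914) says that a value not passed as a list is now wrapped in a list automatically. A minimal sketch of that kind of normalization follows. The helper name normalize_input_shapes is hypothetical and is not part of DeepSparse, and the rule used to detect a single bare shape (a flat sequence of integers) is an assumption made for illustration.

```python
from typing import List, Optional, Sequence, Union

Shape = Sequence[int]


def normalize_input_shapes(
    shapes: Optional[Union[Shape, Sequence[Shape]]]
) -> Optional[List[List[int]]]:
    """Hypothetical helper: always return a list of shapes, or None.

    A single shape such as [1, 3, 224, 224] is wrapped into
    [[1, 3, 224, 224]]; a list of shapes is returned as a list of lists.
    """
    if shapes is None:
        return None
    items = list(shapes)
    # Assumption: a flat, non-empty sequence of ints is one bare shape.
    if items and all(isinstance(dim, int) for dim in items):
        return [items]
    return [list(shape) for shape in items]


if __name__ == "__main__":
    print(normalize_input_shapes([1, 3, 224, 224]))
    print(normalize_input_shapes([[1, 128], [1, 128]]))
```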

Known Issues:

  • None