Release 1.11.0 cherry pick round 1 (#10915)
* Update to flatbuffers v2.0.0 (#10866)

* Fix Reduced ops pipeline (#10861)

* Fix a couple of issues with the python package tools (#10858)

* Tweaks to the model utils
  * Add handling for a dim_value of -1 when replacing the entire input shape. This occurs in models exported from PaddlePaddle.
  * Make PyTorch helpers accessible in the package.
  * Make QDQ helpers accessible in the package.

* Fix wrong percentile values returned during calibration (#10847)

* Use numpy.percentile to get the lookup value.

* Use 1.0 as a float value rather than an integer.

* Add missing cdf parameter for `np.percentile`.

* Use 100. instead of 1.0

* Remove print.

* Update from @yufenglee
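
For context, numpy.percentile expects the percentile argument on a 0-100 scale, which is what the final fix above settles on. A minimal C++ sketch of the same interpolated-percentile lookup (an illustration, not the ORT calibration code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Interpolated percentile, mirroring numpy.percentile's default behavior.
// q is on a 0-100 scale: the 99.99th percentile is q = 99.99, not 0.9999.
double Percentile(std::vector<double> data, double q) {
  std::sort(data.begin(), data.end());
  const double rank = (q / 100.0) * static_cast<double>(data.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(rank);
  const std::size_t hi = std::min(lo + 1, data.size() - 1);
  const double frac = rank - static_cast<double>(lo);
  return data[lo] + frac * (data[hi] - data[lo]);
}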

* Add support for opset 16 to transpose optimizer. (#10841)

* Add support for opset 16 to transpose optimizer.

The only change required is adding GridSample to the set of layout-sensitive ops. The existing layout-transposition handling works with that, as its first input and first output are layout sensitive.

Update Optimize to be able to return an error message if it fails.

* Use separate build directories for full and mobile iOS packages. (#10835)

* Address performance issue with abseil flat_hash_table. (#10819)

When returned by value in a cross-DLL call, the hash table cannot find at
least some of its entries, even though it still contains everything
originally inserted. Reverting to std::unordered_set pending further
investigation.
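
A hypothetical illustration of the suspected failure mode (the commit leaves the root cause open): if the hash function is not identical on both sides of a DLL boundary, entries inserted in one module can be unfindable in another even though they are all still present.

#include <string>
#include <unordered_set>

// In dll_a.cpp, exported: builds the set with dll_a's hash instantiation.
std::unordered_set<std::string> MakeOpSet() {
  return {"Conv", "Gemm", "Relu"};
}

// In the executable: probes the set with its own hash instantiation.
bool HasGemm() {
  const auto ops = MakeOpSet();
  // ops.size() == 3 either way, but if the two modules hashed "Gemm" into
  // different buckets, this lookup would miss. std::hash must behave
  // deterministically within a program, which makes std::unordered_set the
  // safe interim choice.
  return ops.count("Gemm") > 0;
}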

* Mark end of version 11 C API. (#10803)

* Mark end of version 11 C API

* Add static_assert
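
A sketch of the general pattern (the struct, names, and offset below are hypothetical, not the actual OrtApi layout): pin the end of the released API surface so later edits cannot silently reorder the ABI.

#include <cstddef>

struct ApiTable {
  void (*CreateEnv)();
  void (*ReleaseEnv)();
  // end of version 11 API: do not insert or reorder above this line
  void (*SomeNewV12Function)();
};

// Fails to compile if anything shifts the last v11 member.
static_assert(offsetof(ApiTable, ReleaseEnv) == sizeof(void*),
              "the version 11 C API surface must not change");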

* avoid using LocalFree on FormatMessageW buffer (#10796)

* remove local free

* Remove local free from onnxruntime

* don't allocate

* Change to use constexpr to satisfy a CPU build warning
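
A sketch of the no-allocation pattern, assuming a fixed-size buffer is acceptable: let FormatMessageW write into caller-owned storage instead of requesting FORMAT_MESSAGE_ALLOCATE_BUFFER, so there is no system buffer to LocalFree.

#include <windows.h>
#include <iterator>
#include <string>

std::wstring FormatSystemError(DWORD error_code) {
  wchar_t buffer[1024]{};
  const DWORD len = ::FormatMessageW(
      FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
      nullptr, error_code, 0, buffer,
      static_cast<DWORD>(std::size(buffer)), nullptr);
  return std::wstring(buffer, len);  // len is 0 on failure, yielding ""
}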

* Integrate C-API tests into Pipelines for release packages (#10794)

* add c-api test for package

* fix bug for running c-api test for package

* refine run application script

* remove redundant code

* include CUDA test

* Remove testing CUDA EP temporarily

* fix bug

* Code refactor

* try to fix YAML bug

* try to fix YAML bug

* try to fix YAML bug

* fix bug for multiple directories in Pipelines

* fix bug

* add comments and fix bug

* Update c-api-noopenmp-packaging-pipelines.yml

* Remove failOnStandardError flag in Pipelines

* Detect runtime CUDA JIT and warn the user (#10781)

* Use cudaMalloc vs cudaDeviceSynchronize and show the total time
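
A rough sketch of the detection idea (threshold and message are assumptions): the first CUDA runtime call absorbs any PTX JIT compilation for the current GPU architecture, so timing a trivial cudaMalloc gives a usable signal.

#include <chrono>
#include <iostream>
#include <cuda_runtime.h>

void WarnIfCudaJitLikely() {
  const auto start = std::chrono::steady_clock::now();
  void* p = nullptr;
  cudaMalloc(&p, 4);  // first runtime call: context creation + module load
  const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
  cudaFree(p);
  if (ms > 1000) {  // hypothetical threshold
    std::cerr << "First CUDA call took " << ms << " ms; kernels may be "
                 "JIT-compiled for this GPU architecture.\n";
  }
}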

* Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765)

Add runtime optimization support to ONNX -> ORT format conversion script.
Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.

* Add multithreading test and put a lock on nvinfer1::createInferRuntime() for TRT EP (#10714)

* Add multithread unit test and put lock on library call

* update code

* remove debug code

* add comment

* add one session multi-threads inference

* Put lock for build engine all the time

* Update naming and comment

* remove unnecessary lock

* Revert "remove unnecessary lock"

This reverts commit 9c2317b.
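
As a minimal sketch of the locking described above (ORT's actual code differs): serialize nvinfer1::createInferRuntime behind a process-wide mutex.

#include <mutex>
#include <NvInfer.h>

nvinfer1::IRuntime* CreateRuntimeLocked(nvinfer1::ILogger& logger) {
  static std::mutex mu;  // shared by every caller in the process
  const std::lock_guard<std::mutex> lock(mu);
  return nvinfer1::createInferRuntime(logger);
}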

* Fix handling of nodes inserted by NHWC transformer. (#10904) (#10925)

* Revert "Upsample support NHWC (#10554)" (#10917)

This reverts commit bd08f11.

Co-authored-by: Yufeng Li <[email protected]>

* [python API] Change raise import error when `C:\Windows\System32\vcruntime140_1.dll` is not found to warning (#10927)

* remove throw if C:\\Windows\\System32\\vcruntime140_1.dll cannot be found

* Add comments and update warning message

* adding back accidentally removed line

Co-authored-by: gwang0000 <[email protected]>
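
The shipped change lives in the Python package's import logic; in Win32 terms, the probe-and-warn pattern looks roughly like this sketch:

#include <windows.h>
#include <iostream>

void WarnIfVcRuntimeMissing() {
  if (HMODULE h = ::LoadLibraryW(L"vcruntime140_1.dll")) {
    ::FreeLibrary(h);  // present, nothing to do
  } else {
    std::cerr << "vcruntime140_1.dll was not found; install the latest "
                 "Visual C++ redistributable if onnxruntime fails to import.\n";
  }
}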

* [js] Create npm packaging pipeline (#10886)

* create npm packaging pipeline

* fix indentations

* Update npm-packaging-pipeline.yml for Azure Pipelines

* Update npm-packaging-pipeline.yml for Azure Pipelines

* Update npm-packaging-pipeline.yml for Azure Pipelines

* react-native-ci as a template

* fix typos

* fix template paths

* add a dependency

* change a stage name

* set different artifact name for each package

* fix typo

* Update npm-packaging-pipeline.yml for Azure Pipelines

Set a build Id for node npm package as a parameter

* Update npm-packaging-pipeline.yml for Azure Pipelines

Set a build Id for node npm package as a parameter

* Update npm-packaging-pipeline.yml for Azure Pipelines

* Follow up update for python API checking if `vcruntime140_1.dll` is available (#10927) (#10933)

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Funtowicz Morgan <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Ryan Lai <[email protected]>
Co-authored-by: Ryan Hill <[email protected]>
Co-authored-by: Yi-Hong Lyu <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Guoyu Wang <[email protected]>
Co-authored-by: gwang0000 <[email protected]>
Co-authored-by: Sunghoon <[email protected]>
14 people authored Mar 18, 2022
1 parent e0cec5c commit b713855
Showing 56 changed files with 1,622 additions and 1,031 deletions.
17 changes: 16 additions & 1 deletion cmake/onnxruntime_python.cmake
@@ -380,6 +380,7 @@ file(GLOB onnxruntime_python_datasets_data CONFIGURE_DEPENDS
set(onnxruntime_mobile_util_srcs
${REPO_ROOT}/tools/python/util/check_onnx_model_mobile_usability.py
${REPO_ROOT}/tools/python/util/convert_onnx_models_to_ort.py
${REPO_ROOT}/tools/python/util/file_utils.py
${REPO_ROOT}/tools/python/util/logger.py
${REPO_ROOT}/tools/python/util/make_dynamic_shape_fixed.py
${REPO_ROOT}/tools/python/util/onnx_model_utils.py
@@ -397,6 +398,9 @@ file(GLOB onnxruntime_mobile_helpers_srcs CONFIGURE_DEPENDS
${REPO_ROOT}/tools/ci_build/github/android/nnapi_supported_ops.md
${REPO_ROOT}/tools/ci_build/github/apple/coreml_supported_ops.md
)
file(GLOB onnxruntime_qdq_helper_srcs CONFIGURE_DEPENDS
${REPO_ROOT}/tools/python/util/qdq_helpers/*.py
)

set(build_output_target onnxruntime_common)
if(NOT onnxruntime_ENABLE_STATIC_ANALYSIS)
@@ -408,6 +412,7 @@ add_custom_command(
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/datasets
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/mobile_helpers
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/qdq_helpers
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/ort_format_model
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/ort_format_model/ort_flatbuffers_py
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers
@@ -460,7 +465,17 @@ add_custom_command(
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_mobile_util_srcs}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/
COMMAND ${CMAKE_COMMAND} -E copy
# Append the /tools/python/util imports to the __init__.py that came from /onnxruntime/tools.
# We're aggregating scripts from two different locations, and only include selected functionality from
# /tools/python/util. Due to that we take the full __init__.py from /onnxruntime/tools and append
# the required content from /tools/python/util/__init__append.py.
COMMAND ${CMAKE_COMMAND} -E cat
${REPO_ROOT}/tools/python/util/__init__append.py >>
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/__init__.py
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_qdq_helper_srcs}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/qdq_helpers/
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_mobile_helpers_srcs}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/tools/mobile_helpers/
COMMAND ${CMAKE_COMMAND} -E copy
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h
@@ -54,6 +54,7 @@ static const char* const kOrtSessionOptionsDisableQuantQDQ = "session.disable_qu
// other factors like whether the model was created using Quantization Aware Training or Post Training Quantization.
// As such, it's best to test to determine if enabling this works well for your scenario.
// The default value is "0"
// Available since version 1.11.
static const char* const kOrtSessionOptionsEnableQuantQDQCleanup = "session.enable_quant_qdq_cleanup";

// Enable or disable gelu approximation in graph optimization. "0": disable; "1": enable. The default is "0".
@@ -80,25 +81,18 @@ static const char* const kOrtSessionOptionsConfigUseORTModelBytesDirectly = "ses

// This should only be specified when exporting an ORT format model for use on a different platform.
// If the ORT format model will be used on ARM platforms set to "1". For other platforms set to "0"
// Available since version 1.11.
static const char* const kOrtSessionOptionsQDQIsInt8Allowed = "session.qdqisint8allowed";

// Save information for replaying graph optimizations later instead of applying them directly.
//
// When an ONNX model is loaded, ORT can perform various optimizations on the graph.
// However, when an ORT format model is loaded, the logic to perform these optimizations may not be available because
// this scenario must be supported by minimal builds.
// When loading an ONNX model, ORT can optionally save the effects of some optimizations for later replay in an ORT
// format model. These are known as "runtime optimizations" - in an ORT format model, they happen at runtime.
//
// Note: This option is only applicable when loading an ONNX model and saving an ORT format model.
//
// Note: Runtime optimizations are only supported for certain optimizations at the extended level or higher.
// Unsupported optimizations at those levels are not applied at all, while optimizations at other levels are applied
// directly.
//
// "0": disabled, "1": enabled
// The default is "0".
static const char* const kOrtSessionOptionsConfigSaveRuntimeOptimizations = "optimization.save_runtime_optimizations";
// Specifies how minimal build graph optimizations are handled in a full build.
// These optimizations are at the extended level or higher.
// Possible values and their effects are:
// "save": Save runtime optimizations when saving an ORT format model.
// "apply": Only apply optimizations available in a minimal build.
// ""/<unspecified>: Apply optimizations available in a full build.
// Available since version 1.11.
static const char* const kOrtSessionOptionsConfigMinimalBuildOptimizations =
"optimization.minimal_build_optimizations";

// Note: The options specific to an EP should be specified prior to appending that EP to the session options object in
// order for them to take effect.
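
For illustration, a minimal sketch of opting in to the new behavior through the C++ API wrapper (assuming Ort::SessionOptions::AddConfigEntry):

#include <onnxruntime_cxx_api.h>

// "save": record runtime optimizations when writing an ORT format model.
void ConfigureMinimalBuildOptimizations(Ort::SessionOptions& so) {
  so.AddConfigEntry("optimization.minimal_build_optimizations", "save");
}
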
4 changes: 2 additions & 2 deletions onnxruntime/core/framework/config_options.cc
@@ -22,8 +22,8 @@ bool ConfigOptions::TryGetConfigEntry(const std::string& config_key, std::string
return found;
}

const std::string ConfigOptions::GetConfigOrDefault(const std::string& config_key,
const std::string& default_value) const noexcept {
std::string ConfigOptions::GetConfigOrDefault(const std::string& config_key,
const std::string& default_value) const noexcept {
return GetConfigEntry(config_key).value_or(default_value);
}

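The motivation for dropping const from the by-value return of GetConfigOrDefault: a const rvalue cannot bind to a move constructor or move-assignment operator, so moves silently degrade to copies. A small sketch:

#include <string>

const std::string CopyingGet() { return std::string(1000, 'x'); }
std::string MovableGet() { return std::string(1000, 'x'); }

void Caller() {
  std::string s;
  s = CopyingGet();  // copy assignment: the const rvalue can't bind to string&&
  s = MovableGet();  // move assignment
}
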
8 changes: 4 additions & 4 deletions onnxruntime/core/framework/config_options.h
@@ -12,9 +12,9 @@
namespace onnxruntime {

/**
* Configuration options that can be used by any struct by inheriting this class.
* Provides infrastructure to add/get config entries
*/
* Configuration options that can be used by any struct by inheriting this class.
* Provides infrastructure to add/get config entries
*/
struct ConfigOptions {
std::unordered_map<std::string, std::string> configurations;

@@ -29,7 +29,7 @@ struct ConfigOptions {

// Get the config string in this instance of ConfigOptions using the given config_key
// If there is no such config, the given default string will be returned
const std::string GetConfigOrDefault(const std::string& config_key, const std::string& default_value) const noexcept;
std::string GetConfigOrDefault(const std::string& config_key, const std::string& default_value) const noexcept;

// Add a config pair (config_key, config_value) to this instance of ConfigOptions
Status AddConfigEntry(const char* config_key, const char* config_value) noexcept;
16 changes: 8 additions & 8 deletions onnxruntime/core/framework/fallback_cpu_capability.cc
@@ -38,12 +38,12 @@ static bool IsSmallInitializer(const onnxruntime::GraphViewer& graph, const Node
}
} // namespace

InlinedHashSet<NodeIndex> GetCpuPreferredNodes(const onnxruntime::GraphViewer& graph,
const std::string& provider_type,
gsl::span<const KernelRegistry* const> kernel_registries,
gsl::span<const NodeIndex> tentative_nodes) {
std::unordered_set<NodeIndex> GetCpuPreferredNodes(const onnxruntime::GraphViewer& graph,
const std::string& provider_type,
gsl::span<const KernelRegistry* const> kernel_registries,
gsl::span<const NodeIndex> tentative_nodes) {
// automatic conversion from const std::vector&
gsl::span<const NodeIndex> ordered_nodes = graph.GetNodesInTopologicalOrder();
const auto& ordered_nodes = graph.GetNodesInTopologicalOrder();
InlinedVector<size_t> node_id_to_order_map(graph.MaxNodeIndex());
for (size_t id = 0, limit = ordered_nodes.size(); id < limit; ++id) {
const NodeIndex& node_id = ordered_nodes[id];
@@ -55,7 +55,7 @@ InlinedHashSet<NodeIndex> GetCpuPreferredNodes(const onnxruntime::GraphViewer& g
return node_id_to_order_map[n1] > node_id_to_order_map[n2];
};

std::priority_queue<NodeIndex, InlinedVector<NodeIndex>, decltype(greater_order_comp)> candidates(greater_order_comp);
std::priority_queue<NodeIndex, std::vector<NodeIndex>, decltype(greater_order_comp)> candidates(greater_order_comp);

InlinedHashSet<const NodeArg*> cpu_output_args;

@@ -95,10 +95,10 @@ InlinedHashSet<NodeIndex> GetCpuPreferredNodes(const onnxruntime::GraphViewer& g
}));
}

gsl::span<const NodeArg* const> graph_inputs = graph.GetInputs();
const auto& graph_inputs = graph.GetInputs();
InlinedHashSet<NodeIndex> visited;
visited.reserve(candidates.size());
InlinedHashSet<NodeIndex> cpu_nodes;
std::unordered_set<NodeIndex> cpu_nodes;
cpu_nodes.reserve(candidates.size());
// The algo below is trying to identify a subgraph that only depends on cpu tensors.
// Usually it is a subgraph doing shape calculation based on a GPU tensor, then reshaping it back.
8 changes: 4 additions & 4 deletions onnxruntime/core/framework/fallback_cpu_capability.h
@@ -18,9 +18,9 @@ namespace onnxruntime {
@param kernel_registries Kernel registries for the target EP
@param tentative_nodes Nodes that are tentatively placed on the target EP
*/
InlinedHashSet<NodeIndex> GetCpuPreferredNodes(const GraphViewer& graph,
const std::string& provider_type,
gsl::span<const KernelRegistry* const> kernel_registries,
gsl::span<const NodeIndex> tentative_nodes);
std::unordered_set<NodeIndex> GetCpuPreferredNodes(const GraphViewer& graph,
const std::string& provider_type,
gsl::span<const KernelRegistry* const> kernel_registries,
gsl::span<const NodeIndex> tentative_nodes);

} // namespace onnxruntime
13 changes: 4 additions & 9 deletions onnxruntime/core/framework/session_state.cc
@@ -988,23 +988,18 @@ Status SessionState::LoadFromOrtFormat(const fbs::SessionState& fbs_session_stat
// kernel hashes for model are in top level SessionState
const auto& compiled_kernel_hashes = GetCompiledKernelHashes();

const bool original_nodes_should_exist =
compiled_kernel_hashes.empty()
#if !defined(ORT_MINIMAL_BUILD) || defined(ORT_EXTENDED_MINIMAL_BUILD)
&& graph_.RuntimeOptimizationReplayCtx().num_replayed_optimizations == 0
#endif // !defined(ORT_MINIMAL_BUILD) || defined(ORT_EXTENDED_MINIMAL_BUILD)
;

// process the nodes that existed when the model was created
for (FbsSessionStateViewer::Index i = 0, end = fbs_session_state_viewer.GetNumNodeKernelInfos(); i < end; ++i) {
const auto node_kernel_info = fbs_session_state_viewer.GetNodeKernelInfo(i);

Node* const node = graph_.GetNode(node_kernel_info.node_index);
if (node == nullptr) {
// this is OK if we have compiled kernels/replayed runtime optimizations and the original node was replaced.
#if defined(ORT_MINIMAL_BUILD) && !defined(ORT_EXTENDED_MINIMAL_BUILD)
// this is OK if we have compiled kernels and the original node was replaced.
// if not the model is invalid.
ORT_RETURN_IF(original_nodes_should_exist,
ORT_RETURN_IF(compiled_kernel_hashes.empty(),
"Can't find node with index ", node_kernel_info.node_index, ". Invalid ORT format model.");
#endif // defined(ORT_MINIMAL_BUILD) && !defined(ORT_EXTENDED_MINIMAL_BUILD)
continue;
}

2 changes: 1 addition & 1 deletion onnxruntime/core/optimizer/graph_transformer_utils.cc
@@ -286,7 +286,7 @@ InlinedVector<std::unique_ptr<GraphTransformer>> GenerateTransformersForMinimalB
const IExecutionProvider& cpu_execution_provider,
const InlinedHashSet<std::string>& rules_and_transformers_to_disable) {
InlinedVector<std::unique_ptr<GraphTransformer>> transformers;
bool saving = std::holds_alternative<SatRuntimeOptimizationSaveContext>(apply_context);
const bool saving = std::holds_alternative<SatRuntimeOptimizationSaveContext>(apply_context);

switch (level) {
case TransformerLevel::Level1:
27 changes: 17 additions & 10 deletions onnxruntime/core/optimizer/transpose_optimizer/optimizer_api.h
@@ -428,7 +428,7 @@ class GraphRef {
} // namespace api

constexpr int64_t kMinSupportedOpset = 7;
constexpr int64_t kMaxSupportedOpset = 15;
constexpr int64_t kMaxSupportedOpset = 16;

enum class OptimizerMode {
OPTIMIZE_TRANSPOSE, // simple transpose optimization
@@ -441,6 +441,11 @@ enum class OptimizerMode {
/// <returns>const reference to an unordered set of op_types which are layout sensitive</returns>
const std::unordered_set<std::string_view>& GetLayoutSensitiveOps();

struct OptimizeResult {
std::optional<std::string> error_msg; // set if there was an error
bool graph_modified{false};
};

/// <summary>
/// Performs transpose optimization on a graph. Returns true if the graph was modified.
///
@@ -453,15 +458,17 @@ const std::unordered_set<std::string_view>& GetLayoutSensitiveOps();
/// <param name="graph">The graph to optimize (or a portion of a graph, see api::GraphRef docs)</param>
/// <param name="allow_extended_ops">Whether com.microsoft ops can be used for optimization</param>
/// <param name="provider_type">Execution provider if applicable.</param>
/// <param name="mode">Current mode. Optimizer can be called in the context of transpose optimizations or during layout transformations.</param>
/// <param name="layout_sensitive_ops">List of ops which are treated as layout sensitive by the ONNX standard as well as any runtime specific ops.
/// These ops should be provided when mode is set to OPTIMIZE_LAYOUT_TRANSFORM. If these ops are not provided, transpose optimizer may convert the
/// layout for these ops </param>
/// <returns>true if the graph was modified</returns>
bool Optimize(api::GraphRef& graph, bool allow_extended_ops,
const std::string& provider_type = "",
OptimizerMode mode = OptimizerMode::OPTIMIZE_TRANSPOSE,
const std::unordered_set<std::string_view>& layout_sensitive_ops = {});
/// <param name="mode">Current mode. Optimizer can be called in the context of transpose optimizations or during
/// layout transformations.</param>
/// <param name="layout_sensitive_ops">List of ops which are treated as layout sensitive by the ONNX standard
/// as well as any runtime specific ops. These ops should be provided when mode is set to OPTIMIZE_LAYOUT_TRANSFORM.
/// If these ops are not provided, transpose optimizer may convert the layout for these ops </param>
/// <returns>OptimizeResult. If error_msg is set the Optimize failed. If not set, graph_modified indicates whether
/// any changes were required during optimization.</returns>
OptimizeResult Optimize(api::GraphRef& graph, bool allow_extended_ops,
const std::string& provider_type = "",
OptimizerMode mode = OptimizerMode::OPTIMIZE_TRANSPOSE,
const std::unordered_set<std::string_view>& layout_sensitive_ops = {});

/* Layout Transformation Tools
* These methods help change the channel ordering of layout sensitive ops (like Conv). ONNX currently only supports
@@ -876,9 +876,14 @@ Status TransformLayoutForCompilingEP(Graph& graph, bool& modified, const IExecut
}

if (modified) {
onnx_layout_transformation::Optimize(*api_graph, /*allow_extended_ops*/ true, execution_provider.Type(),
onnx_layout_transformation::OptimizerMode::OPTIMIZE_LAYOUT_TRANSFORM,
layout_sensitive_ops);
OptimizeResult result =
onnx_layout_transformation::Optimize(*api_graph, /*allow_extended_ops*/ true, execution_provider.Type(),
onnx_layout_transformation::OptimizerMode::OPTIMIZE_LAYOUT_TRANSFORM,
layout_sensitive_ops);
if (result.error_msg) {
return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Optimization after layout transformation failed: ",
result.error_msg.value());
}
}

return Status::OK();
@@ -17,7 +17,14 @@ namespace onnxruntime {

Status TransposeOptimizer::ApplyImpl(Graph& graph, bool& modified, int graph_level, const logging::Logger& logger) const {
auto api_graph = MakeApiGraph(graph, cpu_allocator_, /*new_node_ep*/ nullptr);
if (onnx_layout_transformation::Optimize(*api_graph, /*allow_extended_ops*/ false)) {
OptimizeResult result = onnx_layout_transformation::Optimize(*api_graph, /*allow_extended_ops*/ false);
if (result.error_msg) {
// currently onnx_layout_transformation::Optimize only fails if we hit an unsupported opset.
// we don't want to fail loading the model just because we can't optimize Transpose ops, so just log a warning
LOGS(logger, WARNING) << "Transpose optimizer failed: " << result.error_msg.value();
}

if (result.graph_modified) {
modified = true;
}
