Commit
[PyOV][DOCS] Update inference documentation with shared memory flags (o…
Jan Iwaszkiewicz authored Jul 18, 2023
1 parent d21296b commit ec26537
Showing 2 changed files with 17 additions and 11 deletions.
18 changes: 10 additions & 8 deletions docs/OV_Runtime_UG/Python_API_inference.md
@@ -26,16 +26,17 @@ The ``CompiledModel`` class provides the ``__call__`` method that runs a single
:fragment: [direct_inference]


- Shared Memory on Inputs
- #######################
+ Shared Memory on Inputs and Outputs
+ ###################################

While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
- Specify the ``shared_memory`` flag to enable or disable this feature.
- The "Shared Memory" mode may be beneficial when inputs are large and copying
- data is considered an expensive operation. This feature creates shared ``Tensor``
+ Specify the ``share_inputs`` and ``share_outputs`` flags to enable or disable this feature.
+ The "Shared Memory" mode may be beneficial when inputs or outputs are large and copying data is considered an expensive operation.
+
+ This feature creates shared ``Tensor``
instances with the "zero-copy" approach, reducing overhead of setting inputs
- to minimum. Example usage:
+ to minimum. For outputs, this feature creates numpy views on data. Example usage:


.. doxygensnippet:: docs/snippets/ov_python_inference.py
@@ -45,13 +46,14 @@ to minimum. Example usage:

.. note::

"Shared Memory" is enabled by default in ``CompiledModel.__call__``.
"Shared Memory" on inputs is enabled by default in ``CompiledModel.__call__``.
For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
it is required to set the flag to ``True`` manually.
"Shared Memory" on outputs is disabled by default in all sequential inference methods (``CompiledModel.__call__`` and ``InferRequest.infer``). It is required to set the flag to ``True`` manually.

.. warning::

-   When data is being shared, all modifications may affect inputs of the inference!
+   When data is being shared, all modifications (including subsequent inference calls) may affect inputs and outputs of the inference!
Use this feature with caution, especially in multi-threaded/parallel code,
where data can be modified outside of the function's control flow.

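Side note (not part of the commit): the warning above can be made concrete with a minimal sketch. The model file, input name, and shapes below are illustrative assumptions; only the ``share_outputs`` flag comes from the documented API.

    from openvino.runtime import Core  # import style current at the time of this commit
    import numpy as np

    core = Core()
    compiled_model = core.compile_model("model.xml", "CPU")  # hypothetical model file

    data_0 = np.zeros((1, 10), dtype=np.float32)  # illustrative shape and dtype

    # With share_outputs=True the returned array is a numpy view over the
    # request's internal output buffer, not an independent copy.
    results = compiled_model({"input_0": data_0}, share_outputs=True)
    first = results[compiled_model.output(0)]
    snapshot = first.copy()  # copy the data if it must outlive later calls

    # A subsequent inference through the same CompiledModel may overwrite the
    # buffer that `first` still views, as the warning describes.
    _ = compiled_model({"input_0": data_0 + 1.0}, share_outputs=True)
    # `first` may now reflect the newer results; `snapshot` is unaffected.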
10 changes: 7 additions & 3 deletions docs/snippets/ov_python_inference.py
@@ -32,9 +32,13 @@
request = compiled_model.create_infer_request()

#! [shared_memory_inference]
- # Data can be shared
- _ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
- _ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+ # Data can be shared only on inputs
+ _ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+ _ = request.infer({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+ # Data can be shared only on outputs
+ _ = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
+ # Or both flags can be combined to achieve the desired behavior
+ _ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=False, share_outputs=True)
#! [shared_memory_inference]

time_in_sec = 2.0
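A possible follow-up to the snippet above (an assumption, not part of the commit): per the updated note, ``InferRequest.start_async`` does not share input memory by default, so the flag is passed explicitly. Variable names reuse those defined earlier in the snippet.

    # Sketch: asynchronous inference with shared inputs; request, data_0 and
    # data_1 are the objects created earlier in this snippet.
    request.start_async({"input_0": data_0, "input_1": data_1}, share_inputs=True)
    request.wait()  # block until the results are ready before reading outputs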
