[DML EP] Add BFC allocator #16634

PatriceVignola · 2023-07-08T05:07:38Z

This change adds the BFC allocator to the DML EP to reduce peak memory usage and allow bigger models (e.g. LLMs) to be loaded in memory. Note that we still need to keep the bucketized allocator, since the WinML API allows the caller to query D3D12 resources directly, which isn't compatible with the way BFC allocations work.

The reserved resource logic is similar to what we had in TF. We leverage the existing ORT allocator by creating a tagged pointer that ORT can increment arithmetically, as if it were sequentially allocated memory. When we need to access the memory, we decode the tagged pointer to get its allocation id and offset. This lets us retrieve the appropriate resource and access it at the right offset.
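To make the tagged pointer idea concrete, here is a minimal sketch of one way to pack an allocation id and an offset into a pointer-sized value. The bit split (24 bits of id, 40 bits of offset), the namespace, and every function name are assumptions made for illustration; they are not taken from this PR's code. The sketch also assumes allocation ids start at 1, so that a valid tagged pointer is never null.

```cpp
#include <cassert>
#include <cstdint>

namespace tagged_ptr_sketch {

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Hypothetical layout (an assumption, not the PR's actual layout):
// [ allocation id : 24 bits ][ byte offset : 40 bits ]
constexpr uint64_t kOffsetBits = 40;
constexpr uint64_t kAllocationIdBits = 24;
constexpr uint64_t kOffsetMask = (uint64_t{1} << kOffsetBits) - 1;
constexpr uint64_t kMaxAllocationId = (uint64_t{1} << kAllocationIdBits) - 1;

struct DecodedPointer {
    uint64_t allocationId;  // which backing resource the pointer refers to
    uint64_t offset;        // byte offset inside that resource
};

// Pack an allocation id and a byte offset into a fake "pointer".
// The result is never dereferenced; it is only a carrier for the two fields.
inline void* Encode(uint64_t allocationId, uint64_t offset) {
    assert(allocationId <= kMaxAllocationId);
    assert(offset <= kOffsetMask);
    const uint64_t bits = (allocationId << kOffsetBits) | offset;
    return reinterpret_cast<void*>(static_cast<uintptr_t>(bits));
}

// Recover the allocation id and offset from a tagged pointer.
inline DecodedPointer Decode(const void* ptr) {
    const uint64_t bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr));
    return DecodedPointer{bits >> kOffsetBits, bits & kOffsetMask};
}

// Stand-in for the arithmetic an arena performs when it carves chunks out of
// a region. Because the offset lives in the low bits, adding bytes only moves
// the offset as long as the result stays within the offset field.
inline void* AdvanceBytes(void* ptr, uint64_t bytes) {
    return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(ptr) +
                                   static_cast<uintptr_t>(bytes));
}

}  // namespace tagged_ptr_sketch
```

With this layout, encoding allocation 3 at offset 128 and advancing by 256 bytes decodes back to allocation 3 at offset 384, which is the property that lets an arena treat the value like ordinary sequential memory.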

To keep the legacy WinML APIs working, I added a way to detect whether custom ops have been registered at session creation time. I then added an ORT API that allows us to disable the BFC allocator when creating the execution provider, so that the bucketized allocator is used instead of the BFC one. This doesn't cause any regressions, since custom op users were already using the bucketized buffer allocator, and it's a niche scenario anyway.
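The fallback described above boils down to a single decision at session creation. The sketch below shows the shape of that decision only; `DmlProviderOptionsSketch`, `MakeProviderOptions`, and `SelectAllocator` are hypothetical names, and the real ORT API added by this PR is not shown in this thread.

```cpp
namespace allocator_choice_sketch {

enum class AllocatorKind { Bfc, Bucketized };

// Hypothetical stand-in for the options passed when creating the
// execution provider (an assumption, not the PR's actual type).
struct DmlProviderOptionsSketch {
    bool disableBfcAllocator = false;
};

// Decided once at session creation: sessions with custom ops may query
// D3D12 resources directly, so they opt out of the BFC allocator.
constexpr DmlProviderOptionsSketch MakeProviderOptions(bool hasCustomOps) {
    DmlProviderOptionsSketch options;
    options.disableBfcAllocator = hasCustomOps;
    return options;
}

// The execution provider picks its allocator from the options it was given.
constexpr AllocatorKind SelectAllocator(const DmlProviderOptionsSketch& options) {
    return options.disableBfcAllocator ? AllocatorKind::Bucketized
                                       : AllocatorKind::Bfc;
}

}  // namespace allocator_choice_sketch
```

Sessions without custom ops get the BFC allocator by default, while sessions with custom ops keep the allocator they were already using.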


winml/test/adapter/AdapterDmlEpTest.cpp


onnxruntime/core/framework/bfc_arena.cc

onnxruntime/core/providers/dml/dml_provider_factory.cc

onnxruntime/core/providers/dml/DmlExecutionProvider/src/BucketizedBufferAllocator.h

smk2007 · 2023-08-15T22:01:42Z

onnxruntime/core/providers/dml/DmlExecutionProvider/src/DmlBfcAllocator.h

+
+namespace Dml
+{
+    class DmlBfcAllocator : public onnxruntime::IAllocator


DmlBfcAllocator

nit: maybe name DmlResourceAllocator

onnxruntime/core/providers/dml/DmlExecutionProvider/src/ExecutionProvider.h

smk2007 · 2023-08-15T22:40:51Z

onnxruntime/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.cpp

-                m_dataInterface = static_cast<IUnknown*>(m_impl->MutableDataRaw());
+            m_tensorData = m_impl->MutableDataRaw();
+        }
+    }


What's going on here??

What happened to my Shadow Copy!??

I went through this whole logic and this is super outdated code. What this code is trying to do is decode what the actual pointer is (something that inherits IUnknown, or a more specific ID3D12Resource), but we ended up always returning ID3D12Resource objects, even for the external operators. Also, this notion of "layout" doesn't seem to make any sense here, since all GetShadowCopyIfRequired does is increase the ref count, and that isn't needed because there's no layout conversion done anywhere.

Maybe @jeffbloo can shed some light here, but from my analysis this all seems to be code that isn't needed anymore.

onnxruntime/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.h

onnxruntime/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.cpp

onnxruntime/core/providers/dml/DmlExecutionProvider/src/DmlCommandRecorder.cpp

onnxruntime/core/providers/dml/OperatorAuthorHelper/MLOperatorAuthorPrivate.h

onnxruntime/core/providers/dml/DmlExecutionProvider/src/DmlSubAllocator.h

onnxruntime/core/providers/dml/DmlExecutionProvider/src/IExecutionProvider.h

onnxruntime/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.h

onnxruntime/core/providers/dml/DmlExecutionProvider/src/ExecutionProvider.cpp


smk2007

fdwr · 2023-08-19T00:39:22Z

onnxruntime/core/framework/utils.cc

+#ifdef USE_DML
+  const bool bothValuesOnGPU = copy_info.source_device.Type() == OrtDevice::GPU && copy_info.target_device.Type() == OrtDevice::GPU;
+  const bool sourceIsDmlAlloc = copy_info.source_device.MemType() == OrtDevice::MemType::DEFAULT || copy_info.source_device.MemType() == OrtDevice::MemType::DML_EXTERNAL;
+  const bool targetIsInternalAlloc = copy_info.target_device.MemType() == OrtDevice::MemType::DEFAULT;


target_is_internal_alloc since this is ORT code 🐫🐍 rather than the DML EP 🐫🐪.

fdwr · 2023-08-19T00:45:12Z

onnxruntime/core/providers/dml/DmlExecutionProvider/src/DmlAllocationInfo.cpp

+
+namespace Dml
+{
+


[nit] extra blank line

fdwr · 2024-07-10T22:32:27Z

@PatriceVignola Happy 1 year anniversary on this open CR. Still relevant? 🤔


winml/test/adapter/AdapterDmlEpTest.cpp

+      ort_api
+    );
+    // Ensure resource is the same
+    WINML_EXPECT_EQUAL(d3d12_resource, d3d12_resource_from_allocation);


winml/test/adapter/AdapterDmlEpTest.cpp

+    auto unique_cpu_memory_info = UniqueOrtMemoryInfo(cpu_memory_info, ort_api->ReleaseMemoryInfo);
+    auto cpu_tensor = CreateTensorFromMemoryInfo(unique_cpu_memory_info.get());
+    THROW_IF_NOT_OK_MSG(winml_adapter_api->ValueGetDeviceId(cpu_tensor.get(), &device_id), ort_api);
+    WINML_EXPECT_EQUAL(0, device_id);


winml/test/adapter/AdapterDmlEpTest.cpp

+    THROW_IF_NOT_OK_MSG(
+      winml_adapter_api->SessionGetInputRequiredDeviceId(cpu_session.get(), "inputImage", &device_id), ort_api
+    );
+    WINML_EXPECT_EQUAL(0, device_id);


PatriceVignola added 30 commits January 17, 2023 20:00

WIP

f5a87a4

WIP

707c1c9

WIP

0619fa3

WIP

6b62b72

WIP

3f2910b

WIP

25bb52d

Remove sub allocator

92f51a3

WIP

c0cbcae

WIP

76328be

WIP

7bd0983

WIP

0c35fc2

WIP

43c47b9

WIP

d0eb5da

Add buffer region size alignment

3385d20

Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav…

4e36efd

…ignol/add-bfc-allocator-3

WIP

7e5622d

WIP

e6897c5

WIP

2064baa

WIP

b71a5ff

WIP

06caff8

WIP

e7667f1

WIP

a95d434

Fix

0729ea2

Fix

544637f

WIP

ea26855

WIP

61dce2e

WIP

b9b3fb8

WIP

3854807

WIP

93d931b

Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav…

96be36c

…ignol/add-bfc-allocator-3

Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

2e4eb2c

…origin/user/pavignol/add-bfc-allocator-4

github-advanced-security bot found potential problems Aug 11, 2023


Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

184940a

…origin/user/pavignol/add-bfc-allocator-4