[WebNN EP] Enable IO Bindings with MLTensor #21301
Conversation
Super! Great work @egalli!
Some first-pass comments; I can't wait to try it, and will provide more feedback later.
@egalli, I tested a simple merged model (with …). Since WebNN doesn't support … With a pre-allocated output ML buffer, it works.
@Honry, I have changed from …
It works, thanks!
@egalli, some final comments. :)
0c853c0 to e203298
Thank you @egalli, LGTM % two nits.
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline
/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 3 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline
/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 3 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
I am generally OK with this change. I have a question: does this …
MLTensor can represent data in any of the device types (CPU, GPU, or NPU). Moreover, …
I understand that MLTensor will be the type that the WebNN interface uses. My question is that I am not sure how MLTensor manages the location of its data. I have no idea how a user can set locations for a model's inputs and outputs when using MLTensor (it's not yet in the spec). If an MLTensor is created with an explicitly set location, at least for CPU usage, we should allow the user to use normal …
There is a PR explainer for MLTensor. As for the location, MLTensors are created from an MLContext. MLContexts have a device associated with them, either CPU, GPU, or NPU. Therefore, an MLTensor will inherit the context's location/device. Also, MLTensors are only valid when used in the context that created them (i.e. MLTensors are non-transferable between devices/MLContexts). Hopefully that helps.
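For illustration, a rough sketch of what that looks like, based on the MLTensor explainer. The method and descriptor names here follow the explainer and are assumptions; the exact shipped API may differ.

```js
// Sketch only: createTensor/writeTensor/readTensor follow the MLTensor explainer;
// descriptor fields and usage flags are assumptions.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });

// The tensor inherits the context's device and is only valid with `context`.
const tensor = await context.createTensor({
  dataType: 'float32',
  shape: [1, 3, 224, 224],
  writable: true, // allow context.writeTensor()
  readable: true, // allow context.readTensor()
});

// Upload data into the (possibly out-of-process) tensor...
context.writeTensor(tensor, new Float32Array(1 * 3 * 224 * 224));

// ...and read it back asynchronously.
const data = await context.readTensor(tensor);
```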
@fs-eire Thanks for helping us integrate MLTensor into ORT web. To expand on what @egalli said, we are still working on fully specifying how tensor data will move between devices, so at this stage it's unclear whether the location will move predictably or remain fixed on a given … Let me know if you'd like more adjustments or details.
@egalli @bbernhar Thank you for the explanation. I think it is totally fine to consider "MLTensor" as a virtual (logical) location in the context of …
I suppose this would be a common inference scenario and we should support it. Yulong, what's your guidance? I understand @egalli also has a plan to improve the performance of the ort.Tensor(CPU) input/output by reusing MLTensor. Enrico, did you happen to create an issue tracking this optimization?
I suppose we should support Tensor.download and Tensor.dispose. Could this be implemented in a follow-up CL?
@huningxin I have not created an issue, but I have a preliminary change set ready
The code currently uses the Allocator and DataTransfer classes in the C++ code. Is this a problem?
Is this different from the currently implemented solution? BTW, keep in mind that unlike ort.Tensor(CPU), MLTensor(CPU) is not easily accessible to JS/Wasm. In Chromium, MLTensor(CPU) is allocated by the TFLite backend outside of the render process. Therefore, IPC/Mojo calls are required to read from and write to it.
The basic idea of resource management in onnxruntime-web is:

- CPU: users should never need to worry about resource management for CPU tensors. Since no resource management is needed, there is also no lifecycle consideration for CPU tensors.
- GPU (and other non-CPU locations): if an instance of a non-CPU …
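As a concrete reference point, this is roughly how the existing 'gpu-buffer' location behaves in onnxruntime-web today; an 'ml-tensor' location would presumably follow the same lifecycle rules. The model, feed, and output names are placeholders.

```js
// Sketch of the existing non-CPU tensor lifecycle in onnxruntime-web,
// using the WebGPU EP and the 'gpu-buffer' location as the reference.
const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  preferredOutputLocation: 'gpu-buffer', // keep outputs on the device
});

const results = await session.run(feeds); // `feeds` assumed prepared elsewhere
const out = results.output;               // 'output' is a placeholder name

// Non-CPU tensors are not managed automatically like CPU tensors:
const cpuData = await out.getData(); // download to CPU when needed
out.dispose();                       // explicitly release the device resource
```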
it should be OK.
I think if the user requires the "location" to be on CPU, it means to use …
Considering that the WebNN WG wants to remove …
I don't expect we need to allow mapping … Does this answer your question/concern?
Is using …? I would prefer that using …
My understanding is that … Is there anything else blocking this PR?
No. I think it's all good.
/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-ortmodule-distributed,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 6 pipeline(s).
Yes, it can be allowed so long as …
@egalli That idea was proposed, but there's no consensus on it. More likely it will end up being relaxed to be more of a hint than a requirement, given CoreML's …
### Description

Enables using MLTensor to pass data between models.

### Motivation and Context

Using MLTensor instead of ArrayBuffers reduces the number of copies between the CPU and devices, as well as between the renderer and the GPU process in Chromium.
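As a hedged illustration of the intended usage: the 'ml-tensor' location string, the WebNN EP option names, and the model/feed/output names below are assumptions for the sketch, not the final API.

```js
// Sketch: chain two models while keeping the intermediate result in an MLTensor,
// avoiding CPU round-trips between the sessions.
const options = {
  executionProviders: [{ name: 'webnn', deviceType: 'gpu' }],
  preferredOutputLocation: 'ml-tensor', // keep outputs on the WebNN device
};
const encoder = await ort.InferenceSession.create('encoder.onnx', options);
const decoder = await ort.InferenceSession.create('decoder.onnx', options);

const encoded = (await encoder.run({ input: inputTensor })).hidden_state;

// The intermediate tensor stays device-resident and is fed directly to the
// second session; no ArrayBuffer copy through the renderer is needed.
const decoded = (await decoder.run({ hidden_state: encoded })).output;

const result = await decoded.getData(); // download the final result to the CPU
encoded.dispose();
decoded.dispose();
```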