cuda export supported #14478

Gasoonjia · 2025-09-22T19:49:47Z

Summary: This diff introduces the CUDA backend, which compiles the partitioned model graph to run on CUDA devices. It uses the AOTInductor compiler to generate optimized CUDA kernels for the model's operators, and the generated code does not depend on libtorch. The compiled model can be executed on CUDA devices using the ExecuTorch runtime.

Differential Revision: D82987410
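For readers new to delegation, the idea of a "partitioned model graph" can be illustrated with a minimal, hypothetical sketch in plain Python. Contiguous runs of operators that a backend claims are grouped into segments that would be handed to that backend's compiler (AOTInductor, per the summary above). Unclaimed operators stay on the default runtime path. The op names, `SUPPORTED_OPS`, and `partition` are illustrative assumptions. This is not ExecuTorch's partitioner API, and the real CUDA partitioner may decide which nodes to delegate differently.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical set of ops the backend can compile; illustrative only.
SUPPORTED_OPS = {"linear", "relu", "matmul", "add"}


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Group a linear op sequence into contiguous (target, ops) segments.

    Runs of supported ops are tagged "cuda" (to be compiled by the backend);
    everything else is tagged "portable" (left to the default runtime path).
    """
    segments = []
    for supported, run in groupby(ops, key=lambda op: op in SUPPORTED_OPS):
        target = "cuda" if supported else "portable"
        segments.append((target, list(run)))
    return segments


if __name__ == "__main__":
    graph = ["linear", "relu", "custom_op", "matmul", "add"]
    for target, ops in partition(graph):
        print(target, ops)
```

Grouping only contiguous runs keeps each delegated segment a single subgraph with well-defined inputs and outputs. A real partitioner operates on a graph rather than a flat list.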

pytorch-bot · 2025-09-22T19:49:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14478

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 5 Pending, 1 Unrelated Failure

As of commit 3779980 with merge base b3f3111:

NEW FAILURES - The following jobs have failed:

Build Presets / apple (llm) / build (gh)
The process '/opt/homebrew/bin/git' failed with exit code 128
Build Presets / apple (macos) / build (gh)
The process '/opt/homebrew/bin/git' failed with exit code 128
Build Presets / apple (profiling) / build (gh)
The process '/opt/homebrew/bin/git' failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 1e4a2a159073c843f07d99c666fa7bc546744b38ff7a587895707e5b71919f6d /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t 2cb96b8d14cd3742cf93ecffd6ebaa5743d3796916c6c5802cc449fb9a2bd0f0 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t 8e9b9c363d3ba289b2c017b94d47e38b2230bde42196e529c18f51e86f59ac39 /exec failed with exit code 1
pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t d33d4565ed8ce53df272904c9e71e721a0f83d3e8707649b1fab6715f3cf5c42 /exec failed with exit code 1

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / macos / macos-job (gh) (trunk failure)
exir/tests/test_quant_fusion_pass.py::TestQuantFusionPass::test_embedding_torchao

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-22T19:49:58Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

github-actions · 2025-09-22T19:50:37Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Summary: This diff introduces the CUDA backend, which compiles the partitioned model graph to run on CUDA devices. It uses the AOTInductor compiler to generate optimized CUDA kernels for the model's operators, and the generated code does not depend on libtorch. The compiled model can be executed on CUDA devices using the ExecuTorch runtime.
Reviewed By: larryliu0820
Differential Revision: D82987410

Summary: This diff introduces the CUDA backend, which compiles the partitioned model graph to run on CUDA devices. It uses the AOTInductor compiler to generate optimized CUDA kernels for the model's operators, and the generated code does not depend on libtorch. The compiled model can be executed on CUDA devices using the ExecuTorch runtime.
Reviewed By: angelayi, larryliu0820
Differential Revision: D82987410

facebook-github-bot · 2025-09-23T17:18:57Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

JacobSzwejbka · 2025-09-23T17:25:08Z

backends/cuda/cuda_backend.py

+        return PreprocessResult(
+            processed_bytes=b"",
+            debug_handle_map={},
+            data_store_output=named_data_store.get_named_data_store_output(),


Why are you putting this in the named_data_store since the .so is not actually shareable? Just legacy from when we were going to share with nativeRT?

Gasoonjia · 2025-09-24

We just want to make sure that in ET we are using the correct pipeline.
In the future, we need to find a way to load the .so directly from the .ptd file, which would benefit both ET loading efficiency and other partners like nativeRT.
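The pattern discussed in this thread can be sketched as follows. `preprocess` returns empty `processed_bytes` and instead registers the compiled `.so` blob under a key in a named data store, so it can be serialized separately (for example into a `.ptd` file). At load time the runtime looks the blob up by key. Assuming the loader needs a filesystem path (as `dlopen` does), the blob has to be written out to a file first, which is the extra step that loading directly from the `.ptd` would avoid. Everything below (`NamedBlobStore`, `PreprocessResultSketch`, `preprocess_sketch`, `materialize_so`, the key names) is a hypothetical stand-in using only the Python standard library. It is not ExecuTorch's `NamedDataStore` or `PreprocessResult` API.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NamedBlobStore:
    """Hypothetical stand-in for a named data store: key -> raw bytes."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def add_named_data(self, key: str, data: bytes) -> None:
        # Hypothetical policy: re-adding identical bytes is fine, conflicts are not.
        if key in self.blobs and self.blobs[key] != data:
            raise ValueError(f"conflicting data for key {key!r}")
        self.blobs[key] = data


@dataclass
class PreprocessResultSketch:
    """Hypothetical mirror of the fields seen in the review snippet."""
    processed_bytes: bytes
    named_data: Dict[str, bytes]


def preprocess_sketch(so_bytes: bytes, key: str) -> PreprocessResultSketch:
    # The delegate payload itself stays empty; the compiled library is stored
    # by name so it can be serialized outside the program (e.g. into a .ptd file).
    store = NamedBlobStore()
    store.add_named_data(key, so_bytes)
    return PreprocessResultSketch(processed_bytes=b"", named_data=dict(store.blobs))


def materialize_so(named_data: Dict[str, bytes], key: str, directory: str) -> str:
    # Assumption: the loader needs a file path, so the blob is copied to disk.
    # Loading directly from the .ptd file would skip this copy.
    path = os.path.join(directory, f"{key}.so")
    with open(path, "wb") as f:
        f.write(named_data[key])
    return path


if __name__ == "__main__":
    result = preprocess_sketch(b"placeholder .so bytes", key="model_so")
    with tempfile.TemporaryDirectory() as tmp:
        so_path = materialize_so(result.named_data, "model_so", tmp)
        print(result.processed_bytes, os.path.getsize(so_path))
```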

facebook-github-bot · 2025-09-23T22:22:39Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-23T23:28:17Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-23T23:33:07Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-24T04:54:20Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-24T17:55:42Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-24T18:20:03Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-25T02:50:48Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-25T02:51:24Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

facebook-github-bot · 2025-09-25T05:13:28Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987410.

Summary: Pull Request resolved: pytorch#14478

This diff introduces the CUDA backend, which compiles the partitioned model graph to run on CUDA devices. It uses the AOTInductor compiler to generate optimized CUDA kernels for the model's operators, and the generated code does not depend on libtorch. The compiled model can be executed on CUDA devices using the ExecuTorch runtime.
Differential Revision: D82987410
Reviewed By: angelayi, larryliu0820

Gasoonjia requested review from JacobSzwejbka and larryliu0820 as code owners September 22, 2025 19:49

meta-cla bot added the CLA Signed label Sep 22, 2025

facebook-github-bot added fb-exported meta-exported labels Sep 22, 2025

larryliu0820 approved these changes Sep 22, 2025


Gasoonjia force-pushed the export-D82987410 branch from e690b0a to 26299fe September 23, 2025 17:18

JacobSzwejbka reviewed Sep 23, 2025


Gasoonjia force-pushed the export-D82987410 branch from 26299fe to 2a9e51d September 23, 2025 22:22

Gasoonjia force-pushed the export-D82987410 branch from 2a9e51d to 16cf09f September 23, 2025 23:28

Gasoonjia force-pushed the export-D82987410 branch from 16cf09f to 2ff245a September 23, 2025 23:32

Gasoonjia force-pushed the export-D82987410 branch from 2ff245a to 420593e September 24, 2025 04:54

Gasoonjia force-pushed the export-D82987410 branch from 420593e to 2834b2c September 24, 2025 17:55

Gasoonjia force-pushed the export-D82987410 branch from 2834b2c to caae471 September 24, 2025 18:19

Gasoonjia force-pushed the export-D82987410 branch from caae471 to 243cae8 September 25, 2025 02:50

Gasoonjia force-pushed the export-D82987410 branch from 243cae8 to a3da839 September 25, 2025 02:51

Gasoonjia force-pushed the export-D82987410 branch from a3da839 to 3779980 September 25, 2025 05:13

Gasoonjia closed this Sep 25, 2025
