Add enable_profiling in runoptions #26846
Conversation
Pull request overview
This PR adds per-run profiling capability to ONNX Runtime by introducing enable_profiling and profile_file_prefix options to RunOptions. This allows users to enable profiling for individual inference runs independently of session-level profiling, providing more granular control over performance analysis.
Key changes:
- Added `enable_profiling` and `profile_file_prefix` fields to the RunOptions structure
- Modified execution providers to accept an `enable_profiling` parameter in the `GetProfiler()` method
- Enhanced timestamp formatting to include milliseconds for more precise profiling file naming
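For orientation, here is a minimal sketch of enabling profiling for a single run through the internal C++ RunOptions (the public C/C++ API surface is not part of this PR; `session`, `feed_names`, `feeds`, `output_names`, and `fetches` are assumed to be set up elsewhere):

```cpp
// Minimal sketch, assuming the onnxruntime::RunOptions fields added by this PR;
// session creation and input/output plumbing are elided.
onnxruntime::RunOptions run_options;
run_options.enable_profiling = true;                 // new field in this PR
run_options.profile_file_prefix = "my_run_profile";  // new field in this PR
// After the run completes, my_run_profile_<timestamp>.json is written.
ORT_RETURN_IF_ERROR(session.Run(run_options, feed_names, feeds, output_names, &fetches));
```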
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| include/onnxruntime/core/framework/run_options.h | Added enable_profiling flag and profile_file_prefix configuration |
| onnxruntime/python/onnxruntime_pybind_state.cc | Exposed new profiling options to Python API |
| onnxruntime/core/session/inference_session.cc | Implemented run-level profiler creation, initialization, and lifecycle management |
| include/onnxruntime/core/framework/execution_provider.h | Updated GetProfiler signature to accept enable_profiling parameter |
| onnxruntime/core/providers/cuda/cuda_execution_provider.h/cc | Updated GetProfiler implementation for CUDA provider |
| onnxruntime/core/providers/vitisai/vitisai_execution_provider.h/cc | Updated GetProfiler implementation for VitisAI provider |
| onnxruntime/core/providers/webgpu/webgpu_execution_provider.h/cc | Implemented session vs run profiler separation using thread_local storage |
| onnxruntime/core/providers/webgpu/webgpu_context.h/cc | Added profiler registration/unregistration and multi-profiler event collection |
| onnxruntime/core/providers/webgpu/webgpu_profiler.cc | Updated to register/unregister with context and handle event collection |
| onnxruntime/core/common/profiler.h/cc | Added overloaded Start and EndTimeAndRecordEvent methods accepting explicit timestamps |
| onnxruntime/core/framework/utils.h/cc | Propagated run_profiler parameter through execution graph functions |
| onnxruntime/core/framework/sequential_executor.h/cc | Added run_profiler support in SessionScope and KernelScope for dual profiling |
```diff
-      run_options.only_execute_path_to_fetches);
+      run_options.only_execute_path_to_fetches,
+      nullptr,
+      run_profiler);
```
We have a number of things being passed from RunOptions here. Can we modify the signature so that a reference to RunOptions is passed instead?
Then we could instantiate the profiler higher in the stack, inside ExecuteGraph.
I can see that RunOptions is already being passed in one of the overloads; that seems sensible.
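A rough sketch of what the suggested signature could look like (names simplified and assumed; the existing ExecuteGraph overloads live in onnxruntime/core/framework/utils.h):

```cpp
// Hypothetical sketch of the reviewer's suggestion: pass RunOptions through
// instead of individual fields, and create the run-level profiler inside
// ExecuteGraph rather than at the call site.
common::Status ExecuteGraph(const SessionState& session_state,
                            FeedsFetchesManager& feeds_fetches_manager,
                            gsl::span<const OrtValue> feeds,
                            std::vector<OrtValue>& fetches,
                            ExecutionMode execution_mode,
                            const RunOptions& run_options,  // replaces the individual flags
                            const logging::Logger& logger);
```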
```diff
       concurrency::ThreadPool::StopProfiling(session_state_.GetThreadPool())},
   });
 
+  std::initializer_list<std::pair<std::string, std::string>> event_args = {
```
```diff
-  if (session_state_.Profiler().IsEnabled()) {
-    session_start_ = session_state.Profiler().Start();
+  bool session_profiling_enabled = session_state_.Profiler().IsEnabled();
+  bool run_profiling_enabled = run_profiler_ && run_profiler_->IsEnabled();
```
I am still not convinced that we should allow both profilers to run in parallel.
Do you have a use case for that? What would be the purpose of collecting the same data twice?
If someone wants continuous profiling, would it not be the same thing as running it with RunOptions?
This depends on how we want to handle the case when both run-level and session-level profiling are enabled.
For example, when a user calls Session::Run with both run-level and session-level profiling enabled, there will be two profilers active: a local run_profiler and the session_profiler_ owned by InferenceSession. The current implementation guarantees that two JSON files are generated, and that the events recorded in the run-level profiling output are a strict subset of those in the session-level profiling output.
In this scenario, each operator execution generates two identical profiling events: one is recorded by the session-level profiler, and the other is recorded by the run-level profiler.
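As a sketch of how a single event can land in both outputs (the explicit-timestamp overloads come from this PR's changes to onnxruntime/core/common/profiler.h; the exact signatures and names here are assumptions):

```cpp
// One kernel execution, recorded into both profilers with shared timestamps,
// so the run-level events are an exact subset of the session-level events.
TimePoint start = session_profiler.Start();
// ... execute the kernel ...
TimePoint end = std::chrono::high_resolution_clock::now();
session_profiler.EndTimeAndRecordEvent(profiling::NODE_EVENT, op_name, start, end);
if (run_profiler != nullptr && run_profiler->IsEnabled()) {
  // Overload taking explicit timestamps: both JSON files show identical events.
  run_profiler->EndTimeAndRecordEvent(profiling::NODE_EVENT, op_name, start, end);
}
```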
```diff
                               event_args);
       events.emplace_back(std::move(event));
 
+  // Distribute the event to all WebGPU EP profilers.
```
Let me lay out the cases:

case 1: session-level: ON, run-level: OFF
session1 = InferenceSession(enable_profiling = true /*session-level*/)
thread1: session1.Run(enable_profiling = false /*run-level*/)
There is one Profiler instance and one WebGpuProfiler instance (owned by the Profiler), so the number of profilers is one. The session-level profiler is always used, i.e. the session_profiler_ of InferenceSession.

case 2: session-level: OFF, run-level: ON
session1 = InferenceSession(enable_profiling = false /*session-level*/)
session1.Run(enable_profiling = true /*run-level*/)
There is one Profiler instance and one WebGpuProfiler instance (owned by the Profiler), so the number of profilers is one. The run-level profiler is always used, i.e. the local variable profiler.

case 3: session-level: ON, run-level: ON
session1 = InferenceSession(enable_profiling = true /*session-level*/)
session1.Run(enable_profiling = true /*run-level*/)
There are two Profiler instances and two WebGpuProfiler instances (each owned by its Profiler), so the number of profilers is two. The run-level profiler is always used.

case 4: session-level: ON, run-level: ON in two threads, OFF in one
session1 = InferenceSession(enable_profiling = true /*session-level*/)
thread1: session1.Run(enable_profiling = true /*run-level*/)
thread2: session1.Run(enable_profiling = true /*run-level*/)
thread3: session1.Run(enable_profiling = false /*run-level*/)
There are three Profiler instances and three WebGpuProfiler instances (each owned by its Profiler). Because the WebGPU EP does not support concurrent runs yet, the number of profilers seen in CollectProfilingData is two: one is the session-level profiler and the other is one of the two run-level EP profilers (which one is used is determined by the current thread during Run).
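A simplified sketch of the multi-profiler collection this implies (the `pending_gpu_events_` member and `RecordEvent` method are assumptions; the real logic lives in webgpu_context.cc):

```cpp
// Each pending GPU timing record is handed to every profiler registered for
// the current run, so all active outputs receive the same GPU events.
void WebGpuContext::CollectProfilingData(std::vector<WebGpuProfiler*>& profilers) {
  for (const auto& event : pending_gpu_events_) {  // pending_gpu_events_ is assumed
    for (WebGpuProfiler* profiler : profilers) {
      profiler->RecordEvent(event);  // RecordEvent is assumed
    }
  }
  pending_gpu_events_.clear();
}
```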
```diff
     profilers.push_back(session_profiler_);
   }
 
+  if (run_options.enable_profiling && tls_run_profiler_) {
```
There are two types of profiling: CPU time, collected by the Profiler, and GPU time, collected by EP profilers (e.g., the WebGPU profiler). I have no better way to get the correct WebGPU profiler for the current running thread in the case mentioned above (case 4: session-level: ON, run-level: two threads ON, one thread OFF), so I stored it in TLS.
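Roughly, the TLS slot looks like this (a sketch; the actual declaration in webgpu_execution_provider.h may differ):

```cpp
// Sketch: a thread-local pointer to the current run's WebGPU EP profiler.
// Each thread that enables run-level profiling sets this during Run, so
// concurrent runs never observe each other's profiler.
thread_local WebGpuProfiler* tls_run_profiler_ = nullptr;
```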
Oh, I just thought of a possible solution. Do you think we could make Profiler a member of RunOptions, instead of a local variable in InferenceSession::Run?
Before:
```cpp
Status WebGpuExecutionProvider::OnRunEnd(bool /* sync_stream */, const onnxruntime::RunOptions& run_options) {
  ...
  if (run_options.enable_profiling && tls_run_profiler_) {
    if (tls_run_profiler_->Enabled()) {
      profilers.push_back(tls_run_profiler_);
    }
    tls_run_profiler_ = nullptr;
  }
  if (!profilers.empty()) {
    context_.CollectProfilingData(profilers);
  }
}
```
After:
```cpp
Status WebGpuExecutionProvider::OnRunEnd(bool /* sync_stream */, const onnxruntime::RunOptions& run_options) {
  ...
  if (run_options.enable_profiling && run_options.run_profiler) {
    if (run_options.run_profiler->Enabled()) {
      profilers.push_back(run_options.run_profiler);
    }
  }
  if (!profilers.empty()) {
    context_.CollectProfilingData(profilers);
  }
}
```
```diff
+  // The actual filename will be: <profile_file_prefix>_<timestamp>.json
+  // Only used when enable_profiling is true.
+  std::string profile_file_prefix = "onnxruntime_run_profile";
```
We need the C and C++ APIs, and Python comes after that.
And we need tests.
yuslepukhin left a comment:
🕐
Description
Support run-level profiling
This PR adds support for profiling individual Run executions, similar to session-level profiling. Developers can enable run-level profiling by setting `enable_profiling` and `profile_file_prefix` in RunOptions. Once the run completes, a JSON profiling file is saved using `profile_file_prefix` + timestamp.
Key Changes
- A local run_profiler is created in InferenceSession::Run and destroyed after the run completes. Using a dedicated profiler per run ensures that profiling data is isolated and prevents interleaving or corruption across runs.
- Overloaded Start and EndTimeAndRecordEvent functions have been added. These allow the caller to provide timestamps instead of relying on std::chrono::high_resolution_clock::now(), avoiding potential timing inaccuracies.
- Added tls_run_profiler_ to support run-level profiling with the WebGPU Execution Provider (EP). This ensures that when multiple threads enable run-level profiling, each thread logs only to its own WebGPU profiler, keeping thread-specific data isolated.
- The timestamp in the JSON filename now uses HH:MM:SS.mm instead of HH:MM:SS to prevent conflicts when profiling multiple consecutive runs.
Motivation and Context
Previously, profiling was only available at the session level. Sometimes a developer wants to profile a specific run, hence this PR.
Some details
When profiling is enabled via RunOptions, it should ideally collect two types of events:
1. CPU events, used to calculate the CPU execution time of each operator.
2. GPU events, used to measure GPU kernel execution time.
Unlike session-level profiling, we need to ensure that event collection is correct in multi-threaded scenarios.
For 1, this can be supported easily (sequential_executor.cc). We use a thread-local storage (TLS) variable, RunLevelState (defined in profiler.h), to maintain run-level profiling state for each thread; a sketch follows.
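The name RunLevelState comes from the PR description; the fields in this sketch are assumptions:

```cpp
// Per-thread run-level profiling state, consulted by the sequential executor
// when it records per-operator CPU events.
struct RunLevelState {
  profiling::Profiler* run_profiler = nullptr;  // profiler for the active Run, if any
  bool enabled = false;                         // run-level profiling flag for this thread
};
thread_local RunLevelState run_level_state;
```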
For 2, each Execution Provider (EP) has its own profiler implementation, and each EP must ensure correct behavior under run-level profiling. This PR ensures that the WebGPU profiler works correctly with run-level profiling.
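For reference, the EP-facing change described in the review table might look like this (a sketch based on the summary above; the exact signature and default body are assumptions):

```cpp
// execution_provider.h: GetProfiler now takes the run-level flag so an EP can
// return a fresh per-run profiler instead of always reusing the session one.
virtual std::unique_ptr<profiling::EpProfiler> GetProfiler(bool enable_profiling) {
  ORT_UNUSED_PARAMETER(enable_profiling);
  return nullptr;  // default: the EP exposes no profiler
}
```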
Test Cases
Test 1: three threads, run-level profiling on two of them
t1: sess1.Run({ enable_profiling: true })
t2: sess1.Run({ enable_profiling: false })
t3: sess1.Run({ enable_profiling: true })
Two profiling files are generated: one for t1 and one for t3.
Test 2: session-level and run-level profiling both enabled
sess1 = OrtSession({ enable_profiling: true })
sess1.Run({ enable_profiling: true })