Skip to content

Benjaming/use stack pids#51

Open
bgn64 wants to merge 2 commits into
mainfrom
benjaming/use-stack-pids
Open

Benjaming/use stack pids#51
bgn64 wants to merge 2 commits into
mainfrom
benjaming/use-stack-pids

Conversation

@bgn64

@bgn64 bgn64 commented May 5, 2026

Copy link
Copy Markdown
Contributor

Fix: Missing CPU samples in ETW traces with unresolved ProcessIDs

Problem

Profile Explorer shows significantly fewer CPU samples than WPA for the same ETW trace. For a Photoshop process in a WPR trace, WPA shows ~57K samples while Profile Explorer shows ~8.5K.

Two related bugs cause this:

  1. Thread-to-process misattribution in merged WPR traces: For traces with LogFileName == "[multiple files]", the first pass builds a ThreadToProcess mapping reflecting final thread ownership. The second
    pass reuses this mapping, causing samples from reused thread IDs to be silently attributed to the wrong process (e.g., Photoshop samples counted under PowerShell).
  2. Unresolved ProcessID samples dropped: TraceEvent reports ProcessID = -1 for ~40% of PerfInfoSample events in WPR traces lacking ThreadDCStart rundown events. These correspond to threads that existed
    before the trace started, where ETW's post-merge PID resolution has no ThreadStart event to work from. Profile Explorer skips all such samples. However, StackWalkStack events for these same threads
    reliably carry valid ProcessIDs — the kernel resolves them at capture time for stack walking purposes.

Fix

Commit 1: Always reopen trace between passes

Ensures the second pass builds its ThreadToProcess mapping incrementally as events arrive, reflecting temporal thread ownership rather than final state. This fixes the misattribution and correctly surfaces
PID = -1 for samples that can't be resolved from thread lifecycle events alone.

Commit 2: Deferred PID resolution from StackWalk events

When PerfInfoSample has PID = -1, defers sample creation until the immediately-following StackWalk event provides the correct ProcessID. A per-thread cache handles subsequent samples on the same thread
without deferral. Applied to all relevant event handlers: PerfInfoSample, PerfInfoPMCSample, StackWalkStack, StackWalkStackKeyKernel, StackWalkStackKeyUser, and BuildProcessSummary.

Testing

Verified against FileSave_PNG.generalprofile.etl using Profile Explorer's actual ProcessEvents and BuildProcessSummary code paths:

  • Before fix: 8,489 samples for PID 14464
  • After fix: 56,892 samples — exact match with WPA

bgn64 and others added 2 commits May 5, 2026 14:56
For merged WPR traces (LogFileName == '[multiple files]'), the first
pass built a ThreadToProcess mapping reflecting final thread ownership.
The second pass reused this mapping, causing samples from reused thread
IDs to be silently attributed to the wrong process.

Always reopen the trace between passes so the second pass builds its
ThreadToProcess mapping incrementally as events arrive, reflecting
temporal thread ownership rather than final state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceEvent reports ProcessID = -1 for ~40% of PerfInfoSample events in
WPR traces lacking ThreadDCStart rundown events. Previously these samples
were silently dropped, causing significant undercounting vs WPA.

StackWalk events immediately follow their PerfInfoSample in the event
stream and reliably carry valid ProcessIDs. Use deferred same-pass
resolution: when PerfInfoSample has PID=-1, defer sample creation until
the following StackWalk event provides the correct ProcessID. A per-thread
cache handles subsequent samples without deferral.

Applied to all event handlers: PerfInfoSample, PerfInfoPMCSample,
StackWalkStack, StackWalkStackKeyKernel, StackWalkStackKeyUser, and
BuildProcessSummary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 23:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes missing / misattributed CPU samples when loading merged WPR ETW traces by reopening the trace between passes and using StackWalk events to resolve ProcessID = -1 samples.

Changes:

  • Reopen the ETW trace for the main processing pass to rebuild ThreadToProcess mapping temporally (avoids thread-ID reuse misattribution in merged traces).
  • Defer PerfInfoSample sample creation when ProcessID = -1 and resolve it from immediately following StackWalk events (with a per-thread PID cache).
  • Propagate the resolved PID through StackWalk handlers (StackWalkStack / StackWalkStackKeyKernel / StackWalkStackKeyUser) when creating contexts/stacks.
Comments suppressed due to low confidence (2)

src/ProfileExplorerCore/Profile/ETW/ETWEventProcessor.cs:650

  • This handler can return early after calling RentTempContext/AddContext without calling profile.ReturnContext(contextId) (e.g., when the triggering timestamp doesn't match). If AddContext inserted the temp context into the context table, failing to ReturnContext leaves tempContext_ aliased to a stored context and subsequent RentTempContext calls can mutate already-stored contexts. Ensure ReturnContext is called on all exit paths (e.g., wrap the body after AddContext in try/finally).
      int sampleId = perThreadLastSampleMap.GetValueOrDefault(data.ThreadID);
      var triggeringEventTimestamp = TimeSpan.FromMilliseconds(data.EventTimeStampRelativeMSec);

      // Check if the last sample on the core did not trigger this stack collection
      if (sampleId == 0 || profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
        // Check if the last sample from the context did not trigger this stack collection
        if (!perContextLastSampleMap.TryGetValue(contextId, out sampleId) ||
            profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
          // We don't know what sample this stack belongs to so we won't collect it
          return;
        }

src/ProfileExplorerCore/Profile/ETW/ETWEventProcessor.cs:709

  • This handler can return early after calling RentTempContext/AddContext without calling profile.ReturnContext(contextId) (e.g., when the triggering timestamp doesn't match). If AddContext inserted the temp context into the context table, failing to ReturnContext leaves tempContext_ aliased to a stored context and subsequent RentTempContext calls can mutate already-stored contexts. Ensure ReturnContext is called on all exit paths (e.g., wrap the body after AddContext in try/finally).
      int sampleId = perThreadLastSampleMap.GetValueOrDefault(data.ThreadID);
      var triggeringEventTimestamp = TimeSpan.FromMilliseconds(data.EventTimeStampRelativeMSec);

      // Check if the last sample on the core did not trigger this stack collection
      if (sampleId == 0 || profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
        // Check if the last sample from the context did not trigger this stack collection
        if (!perContextLastSampleMap.TryGetValue(contextId, out sampleId) ||
            profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
          // We don't know what sample this stack belongs to so we won't collect it
          return;
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 238 to 240
// Go again over events and accumulate samples to build the process summary.
source_.Process();
profile.Dispose();
Comment on lines +913 to 920
int pmcProcessId = data.ProcessID;

if (pmcProcessId < 0) {
lastKnownThreadPid.TryGetValue(data.ThreadID, out pmcProcessId);
}

if (!IsAcceptedProcess(pmcProcessId)) {
return; // Ignore events from other processes.
Comment on lines +493 to +509
// Create any deferred sample whose PID was unknown at PerfInfoSample time.
if (pendingPidSamples.TryGetValue(data.ThreadID, out var pending)) {
pendingPidSamples.Remove(data.ThreadID);

if (stackProcessId >= 0 && IsAcceptedProcess(stackProcessId)) {
var pendingCtx = profile.RentTempContext(stackProcessId, pending.ThreadId, pending.Cpu);
int pendingCtxId = profile.AddContext(pendingCtx);
var pendingSample = new ProfileSample(pending.IP,
TimeSpan.FromMilliseconds(pending.TimestampMs),
TimeSpan.FromMilliseconds(pending.WeightMs),
pending.IsKernelCode, pendingCtxId);
int pendingSampleId = profile.AddSample(pendingSample);
profile.ReturnContext(pendingCtxId);
perThreadLastSampleMap[data.ThreadID] = pendingSampleId;
perContextLastSampleMap[pendingCtxId] = pendingSampleId;
}
}
// The thread ID -> process ID mapping is used internally.
if (data.ProcessID < 0) {
return;
int processId = data.ProcessID;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems unnecessary to make a copy of data.ProcessID here? since the new variable it looks like is used only one time, and then there are more calls to data.ProcessID farther below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants