Benjaming/use stack pids#51
Open
bgn64 wants to merge 2 commits into
Open
Conversation
For merged WPR traces (LogFileName == '[multiple files]'), the first pass built a ThreadToProcess mapping reflecting final thread ownership. The second pass reused this mapping, causing samples from reused thread IDs to be silently attributed to the wrong process. Always reopen the trace between passes so the second pass builds its ThreadToProcess mapping incrementally as events arrive, reflecting temporal thread ownership rather than final state. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceEvent reports ProcessID = -1 for ~40% of PerfInfoSample events in WPR traces lacking ThreadDCStart rundown events. Previously these samples were silently dropped, causing significant undercounting vs WPA. StackWalk events immediately follow their PerfInfoSample in the event stream and reliably carry valid ProcessIDs. Use deferred same-pass resolution: when PerfInfoSample has PID=-1, defer sample creation until the following StackWalk event provides the correct ProcessID. A per-thread cache handles subsequent samples without deferral. Applied to all event handlers: PerfInfoSample, PerfInfoPMCSample, StackWalkStack, StackWalkStackKeyKernel, StackWalkStackKeyUser, and BuildProcessSummary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes missing / misattributed CPU samples when loading merged WPR ETW traces by reopening the trace between passes and using StackWalk events to resolve ProcessID = -1 samples.
Changes:
- Reopen the ETW trace for the main processing pass to rebuild
ThreadToProcessmapping temporally (avoids thread-ID reuse misattribution in merged traces). - Defer
PerfInfoSamplesample creation whenProcessID = -1and resolve it from immediately following StackWalk events (with a per-thread PID cache). - Propagate the resolved PID through StackWalk handlers (StackWalkStack / StackWalkStackKeyKernel / StackWalkStackKeyUser) when creating contexts/stacks.
Comments suppressed due to low confidence (2)
src/ProfileExplorerCore/Profile/ETW/ETWEventProcessor.cs:650
- This handler can return early after calling RentTempContext/AddContext without calling profile.ReturnContext(contextId) (e.g., when the triggering timestamp doesn't match). If AddContext inserted the temp context into the context table, failing to ReturnContext leaves tempContext_ aliased to a stored context and subsequent RentTempContext calls can mutate already-stored contexts. Ensure ReturnContext is called on all exit paths (e.g., wrap the body after AddContext in try/finally).
int sampleId = perThreadLastSampleMap.GetValueOrDefault(data.ThreadID);
var triggeringEventTimestamp = TimeSpan.FromMilliseconds(data.EventTimeStampRelativeMSec);
// Check if the last sample on the core did not trigger this stack collection
if (sampleId == 0 || profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
// Check if the last sample from the context did not trigger this stack collection
if (!perContextLastSampleMap.TryGetValue(contextId, out sampleId) ||
profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
// We don't know what sample this stack belongs to so we won't collect it
return;
}
src/ProfileExplorerCore/Profile/ETW/ETWEventProcessor.cs:709
- This handler can return early after calling RentTempContext/AddContext without calling profile.ReturnContext(contextId) (e.g., when the triggering timestamp doesn't match). If AddContext inserted the temp context into the context table, failing to ReturnContext leaves tempContext_ aliased to a stored context and subsequent RentTempContext calls can mutate already-stored contexts. Ensure ReturnContext is called on all exit paths (e.g., wrap the body after AddContext in try/finally).
int sampleId = perThreadLastSampleMap.GetValueOrDefault(data.ThreadID);
var triggeringEventTimestamp = TimeSpan.FromMilliseconds(data.EventTimeStampRelativeMSec);
// Check if the last sample on the core did not trigger this stack collection
if (sampleId == 0 || profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
// Check if the last sample from the context did not trigger this stack collection
if (!perContextLastSampleMap.TryGetValue(contextId, out sampleId) ||
profile.Samples[sampleId - 1].Time != triggeringEventTimestamp) {
// We don't know what sample this stack belongs to so we won't collect it
return;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
238
to
240
| // Go again over events and accumulate samples to build the process summary. | ||
| source_.Process(); | ||
| profile.Dispose(); |
Comment on lines
+913
to
920
| int pmcProcessId = data.ProcessID; | ||
|
|
||
| if (pmcProcessId < 0) { | ||
| lastKnownThreadPid.TryGetValue(data.ThreadID, out pmcProcessId); | ||
| } | ||
|
|
||
| if (!IsAcceptedProcess(pmcProcessId)) { | ||
| return; // Ignore events from other processes. |
Comment on lines
+493
to
+509
| // Create any deferred sample whose PID was unknown at PerfInfoSample time. | ||
| if (pendingPidSamples.TryGetValue(data.ThreadID, out var pending)) { | ||
| pendingPidSamples.Remove(data.ThreadID); | ||
|
|
||
| if (stackProcessId >= 0 && IsAcceptedProcess(stackProcessId)) { | ||
| var pendingCtx = profile.RentTempContext(stackProcessId, pending.ThreadId, pending.Cpu); | ||
| int pendingCtxId = profile.AddContext(pendingCtx); | ||
| var pendingSample = new ProfileSample(pending.IP, | ||
| TimeSpan.FromMilliseconds(pending.TimestampMs), | ||
| TimeSpan.FromMilliseconds(pending.WeightMs), | ||
| pending.IsKernelCode, pendingCtxId); | ||
| int pendingSampleId = profile.AddSample(pendingSample); | ||
| profile.ReturnContext(pendingCtxId); | ||
| perThreadLastSampleMap[data.ThreadID] = pendingSampleId; | ||
| perContextLastSampleMap[pendingCtxId] = pendingSampleId; | ||
| } | ||
| } |
jhpohovey
approved these changes
May 8, 2026
| // The thread ID -> process ID mapping is used internally. | ||
| if (data.ProcessID < 0) { | ||
| return; | ||
| int processId = data.ProcessID; |
Member
There was a problem hiding this comment.
nit: seems unnecessary to make a copy of data.ProcessID here? since the new variable it looks like is used only one time, and then there are more calls to data.ProcessID farther below
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix: Missing CPU samples in ETW traces with unresolved ProcessIDs
Problem
Profile Explorer shows significantly fewer CPU samples than WPA for the same ETW trace. For a Photoshop process in a WPR trace, WPA shows ~57K samples while Profile Explorer shows ~8.5K.
Two related bugs cause this:
pass reuses this mapping, causing samples from reused thread IDs to be silently attributed to the wrong process (e.g., Photoshop samples counted under PowerShell).
before the trace started, where ETW's post-merge PID resolution has no ThreadStart event to work from. Profile Explorer skips all such samples. However, StackWalkStack events for these same threads
reliably carry valid ProcessIDs — the kernel resolves them at capture time for stack walking purposes.
Fix
Commit 1: Always reopen trace between passes
Ensures the second pass builds its ThreadToProcess mapping incrementally as events arrive, reflecting temporal thread ownership rather than final state. This fixes the misattribution and correctly surfaces
PID = -1 for samples that can't be resolved from thread lifecycle events alone.
Commit 2: Deferred PID resolution from StackWalk events
When PerfInfoSample has PID = -1, defers sample creation until the immediately-following StackWalk event provides the correct ProcessID. A per-thread cache handles subsequent samples on the same thread
without deferral. Applied to all relevant event handlers: PerfInfoSample, PerfInfoPMCSample, StackWalkStack, StackWalkStackKeyKernel, StackWalkStackKeyUser, and BuildProcessSummary.
Testing
Verified against FileSave_PNG.generalprofile.etl using Profile Explorer's actual ProcessEvents and BuildProcessSummary code paths: