Sve: *FirstFaulting* are mostly failing in gcstress runs #112463

Open
kunalspathak opened this issue Feb 12, 2025 · 9 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), arm-sve (Work related to arm64 SVE/SVE2 support), blocking-clean-ci-optional (Blocking optional rolling runs)


@kunalspathak
Member

When working on #112362, I see there are other gcstress failures for SVE tests.

Discussion: #112389 (comment)

Sample run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=948285&view=ms.vss-test-web.build-test-results-tab&runId=25190942&resultId=169023&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

@kunalspathak kunalspathak self-assigned this Feb 12, 2025
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label (An area label is needed to ensure this gets routed to the appropriate area owners) label Feb 12, 2025
@kunalspathak kunalspathak added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and arm-sve (Work related to arm64 SVE/SVE2 support) labels and removed the needs-area-label (An area label is needed to ensure this gets routed to the appropriate area owners) label Feb 12, 2025
@kunalspathak
Member Author

@dotnet/arm64-contrib

@dotnet-policy-service dotnet-policy-service bot added the untriaged (New issue has not been triaged by the area owner) label Feb 12, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@a74nh
Contributor

a74nh commented Feb 12, 2025

I wonder if this is a failure in the mechanism that tracks the first-faulting state?

@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Feb 12, 2025
@JulieLeeMSFT JulieLeeMSFT removed the untriaged (New issue has not been triaged by the area owner) label Feb 12, 2025
@kunalspathak
Member Author

It looks like we are stuck in some kind of deadlock in the GC code. I am not sure whether these tests exposed a problem in the GC itself or whether we are encoding wrong GC information that makes us end up in this state. @mrsharm - let's work offline to see what is going on.

Call stack:
 0:005> ~*kc

   0  Id: 2d00.3224 Suspend: 1 Teb: 00000091`d595c000 Unfrozen
 # Call Site
00 ntdll!NtWaitForSingleObject
01 KERNELBASE!WaitForSingleObjectEx
02 coreclr!GCEvent::Impl::Wait
03 coreclr!GCEvent::Wait
04 coreclr!WKS::gc_heap::wait_for_gc_done
05 coreclr!WKS::gc_heap::try_allocate_more_space
06 coreclr!WKS::gc_heap::allocate_more_space
07 coreclr!WKS::gc_heap::allocate
08 coreclr!WKS::GCHeap::Alloc
09 coreclr!Alloc
0a coreclr!Alloc
0b coreclr!AllocateObject
0c coreclr!AllocateObject
0d coreclr!EEException::CreateThrowable
0e coreclr!CreateCOMPlusExceptionObject
0f coreclr!ExceptionTracker::CreateThrowable
10 coreclr!ProcessCLRExceptionNew
11 coreclr!ProcessCLRException
12 ntdll!RtlpExecuteHandlerForUnwind
13 ntdll!RtlUnwindEx
14 ntdll!RtlUnwind
15 coreclr!ClrUnwindEx
16 coreclr!ProcessCLRExceptionNew
17 coreclr!ProcessCLRException
18 ntdll!RtlpExecuteHandlerForException
19 ntdll!RtlDispatchException
1a ntdll!KiUserExceptionDispatch
1b KERNELBASE!#DebugBreak
1c coreclr!WKS::FATAL_GC_ERROR
1d coreclr!WKS::gc_heap::verify_free_lists
1e coreclr!WKS::gc_heap::verify_heap
1f coreclr!WKS::gc_heap::garbage_collect
20 coreclr!WKS::GCHeap::GarbageCollectGeneration
21 coreclr!WKS::GCHeap::GarbageCollectTry
22 coreclr!WKS::GCHeap::StressHeap
23 coreclr!DoGcStress
24 coreclr!OnGcCoverageInterrupt
25 coreclr!IsGcMarker
26 coreclr!CLRVectoredExceptionHandlerShim
27 ntdll!RtlpCallVectoredHandlers
28 ntdll!RtlDispatchException
29 ntdll!KiUserExceptionDispatch
2a 0x0
2b 0x0
2c 0x0
2d 0x0
2e 0x0
2f coreclr!CallDescrWorkerInternal
30 coreclr!CallDescrWorkerWithHandler
31 coreclr!MethodDescCallSite::CallTargetWorker
32 coreclr!MethodDescCallSite::Call_RetArgSlot
33 coreclr!RunMainInternal
34 coreclr!RunMain::__l30::__Body::Run::__l5::__Body::Run
35 coreclr!`RunMain'::`30'::__Body::Run
36 coreclr!RunMain
37 coreclr!Assembly::ExecuteMainMethod
38 coreclr!CorHost2::ExecuteAssembly
39 coreclr!coreclr_execute_assembly
3a corerun
3b corerun!GetCurrentClrDetails
3c corerun!GetCurrentClrDetails
3d corerun!GetCurrentClrDetails
3e corerun!GetCurrentClrDetails
3f corerun!GetCurrentClrDetails
40 KERNEL32!BaseThreadInitThunk
41 ntdll!RtlUserThreadStart

   1  Id: 2d00.ed4 Suspend: 1 Teb: 00000091`d5962000 Unfrozen ".NET EventPipe"
 # Call Site
00 ntdll!NtWaitForMultipleObjects
01 KERNELBASE!WaitForMultipleObjectsEx
02 coreclr!ds_ipc_poll
03 coreclr!ds_ipc_stream_factory_get_next_available_stream
04 coreclr!server_thread
05 KERNEL32!BaseThreadInitThunk
06 ntdll!RtlUserThreadStart

   2  Id: 2d00.24e4 Suspend: 1 Teb: 00000091`d5964000 Unfrozen ".NET Debugger"
 # Call Site
00 ntdll!NtWaitForMultipleObjects
01 KERNELBASE!WaitForMultipleObjectsEx
02 coreclr!DebuggerRCThread::MainLoop
03 coreclr!DebuggerRCThread::ThreadProc
04 coreclr!DebuggerRCThread::ThreadProcStatic
05 KERNEL32!BaseThreadInitThunk
06 ntdll!RtlUserThreadStart

   3  Id: 2d00.eec Suspend: 1 Teb: 00000091`d5966000 Unfrozen ".NET Finalizer"
 # Call Site
00 ntdll!NtWaitForSingleObject
01 KERNELBASE!WaitForSingleObjectEx
02 coreclr!GCEvent::Impl::Wait
03 coreclr!GCEvent::Wait
04 coreclr!WKS::GCHeap::WaitUntilGCComplete
05 coreclr!Thread::RareDisablePreemptiveGC
06 coreclr!Thread::DisablePreemptiveGC
07 coreclr!Thread::PulseGCMode
08 coreclr!JIT_MonEnter_Helper
09 coreclr!JIT_MonReliableEnter_Portable
0a 0x0

   4  Id: 2d00.2954 Suspend: 1 Teb: 00000091`d5980000 Unfrozen
 # Call Site
00 ntdll!NtWaitForWorkViaWorkerFactory
01 ntdll!TppWorkerThread
02 KERNEL32!BaseThreadInitThunk
03 ntdll!RtlUserThreadStart

#  5  Id: 2d00.310c Suspend: 1 Teb: 00000091`d5982000 Unfrozen
 # Call Site
00 ntdll!DbgBreakPoint
01 ntdll!DbgUiRemoteBreakin
02 KERNEL32!BaseThreadInitThunk
03 ntdll!RtlUserThreadStart


@kunalspathak
Member Author

This seems to be related to #112203, #110350 and #105780.

@kunalspathak
Member Author

@mangod9

@kunalspathak kunalspathak added the blocking-clean-ci-optional (Blocking optional rolling runs) label Feb 13, 2025
@kunalspathak
Member Author

It turns out that we use Unsafe.CopyBlockUnaligned a lot in the tests, and when doing so we are not copying the right number of elements. That overwrites the GC data and causes these crashes. I am guessing #112264 is also a manifestation of this.
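
For illustration only, here is a rough, language-neutral model in Python of how a raw block copy with the wrong length can clobber whatever sits right after the destination. It is not the test code and does not use the .NET API: the heap layout, the `HEADER` marker, the 4-byte element size, and every function name below are assumptions made up for this sketch. The thread does not show which exact count was wrong in the tests.

```python
import struct

ELEM_SIZE = 4            # bytes per 32-bit element (assumed element type)
HEADER = 0xC0FFEE11      # hypothetical marker standing in for the next object's metadata


def make_heap(dest_elems):
    """Model a heap slice: a destination array immediately followed by a header word."""
    heap = bytearray(dest_elems * ELEM_SIZE + 4)
    struct.pack_into("<I", heap, dest_elems * ELEM_SIZE, HEADER)
    return heap


def header_of(heap, dest_elems):
    """Read back the header word that sits right after the destination array."""
    return struct.unpack_from("<I", heap, dest_elems * ELEM_SIZE)[0]


def copy_block(heap, src, byte_count):
    """Unchecked raw copy: trusts byte_count and knows nothing about array bounds."""
    for i in range(byte_count):
        heap[i] = src[i]


def checked_copy(heap, src, elem_count, dest_elems):
    """Derive the byte count from the element count and refuse to overrun the destination."""
    if elem_count > dest_elems:
        raise ValueError(f"{elem_count} elements do not fit in {dest_elems}")
    copy_block(heap, src, elem_count * ELEM_SIZE)


src = struct.pack("<8I", *range(1, 9))   # eight source elements: 1..8

bad = make_heap(4)
copy_block(bad, src, 5 * ELEM_SIZE)      # one element too many for a 4-element destination
print(hex(header_of(bad, 4)))            # the header word has been overwritten

good = make_heap(4)
checked_copy(good, src, 4, 4)            # exactly fills the destination
print(hex(header_of(good, 4)))           # the header word is intact
```

In this model the overrun is silent at the time of the copy and only shows up later, when something reads the damaged header. That would be consistent with the failure surfacing during heap verification under gcstress rather than at the copy itself.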

@mangod9
Member

mangod9 commented Feb 18, 2025

so these are jit tests using unsafe* ?

@kunalspathak
Member Author

> so these are jit tests using unsafe* ?

Yes.
