SB Scenario tests are intermittently failing to execute #4952

Closed
ellahathaway opened this issue Mar 17, 2025 · 21 comments
Labels
area-upstream-fix (Needs a change in a contributing repo), ops-monitor (Issues created/handled by the source build monitor role)

Comments

@ellahathaway
Member

[FAIL] Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyMSTestTemplate(language: VB)
System.InvalidOperationException : Failed to execute /__w/1/s/artifacts/obj/extracted-dotnet-sdk/dotnet new globaljson --sdk-version 10.0.100-preview.3.25167.1
    Exit code: 139
    The template "global.json file" was created successfully.

       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.DotNetSdkHelper.ExecuteCmd(String args, String workingDirectory, Action`1 additionalProcessConfigCallback, Int32 expectedExitCode, Int32 millisecondTimeout) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/DotNetSdkHelper.cs:line 38
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.DotNetSdkHelper.ExecuteNew(String projectType, String projectName, String projectDirectory, String language, String customArgs) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/DotNetSdkHelper.cs:line 113
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTest.Execute(DotNetSdkHelper dotNetHelper, String testRoot, String[] frameworks, String PreMadeSolution, Int32 retryCounter) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/SdkTemplateTest.cs:line 42
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyMSTestTemplate(DotNetLanguage language) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/SdkTemplateTests.cs:line 99
       at InvokeStub_SdkTemplateTests.VerifyMSTestTemplate(Object, Span`1)
       at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

https://dev.azure.com/dnceng/internal/_build/results?buildId=2665635&view=logs&j=609589e2-4f74-5576-cdb7-914bcaea778b&t=85a3ad82-e833-5143-3e25-ec1affd2c703&l=251
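
A note on the exit code in the log above: by the common POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 139 would correspond to signal 11. The sketch below decodes that, assuming the test harness reports the status this way; `describe_exit_code` is a hypothetical helper, not part of the scenario tests.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit status into a signal name when it is above 128.

    Assumes the 128 + N convention used by POSIX shells for signal deaths.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number is known on this platform.
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(139))
```

On Linux, signal 11 is SIGSEGV, which would be consistent with a native crash rather than a test assertion failure.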

@ellahathaway added the ops-monitor label Mar 17, 2025
@ellahathaway
Member Author

@ViktorHofer - have you seen this failure before? It's affecting PRs in sdk too.

@ViktorHofer
Member

Yes. The assumption is that this gets fixed with dotnet/runtime#113598

This got introduced with dotnet/sdk#47580 (comment)

@ViktorHofer
Member

It looks like dotnet/sdk@f981d32 didn't help. I still see the crash in jobs with that commit in HEAD.

@akoeplinger
Member

@ViktorHofer it makes sense that dotnet/runtime#113598 wouldn't have fixed this, since the VMR uses its own versions of the container images in https://github.com/dotnet/sdk/blob/f255b7461bef7db14620199c3aa00626752edb52/eng/pipelines/templates/variables/vmr-build.yml#L118-L121. We'd need to do the pinning there.

@ViktorHofer
Member

ViktorHofer commented Mar 18, 2025

Makes sense. This is failing in the source-build leg most of the time, which uses this image: https://github.com/dotnet/sdk/blob/f255b7461bef7db14620199c3aa00626752edb52/eng/pipelines/templates/variables/vmr-build.yml#L98-L101

Do we know the list of images that need to be pinned? cc @richlander @MichalStrehovsky

@akoeplinger
Member

@ViktorHofer hm, that's actually an interesting point: clang wasn't updated in the centos-stream9 image from what I can see, so that one doesn't seem related.

@ellahathaway
Member Author

@akoeplinger @ViktorHofer - There was a rebuild of CentOS on Mar 11th but this issue didn't start showing up until Mar 14th. Would this particular issue affect the centos-stream9 image given that timeline?

cc @mthalman

@akoeplinger
Member

yeah I think we'd have noticed earlier if it was introduced on the 11th.

my current bet is something in the runtime bump caused it, but I wasn't able to reproduce it so far so maybe it's specific to the Azure SKU we're using.

the only change that could possibly be related are the SIMD changes.

@MichaelSimons
Member

> the only change that could possibly be related are the SIMD changes.

Are those changes straightforward to revert? Would it be worthwhile to revert them in a VMR PR and run the SB Lite pipeline over them a few times? That pipeline has been failing more often than not.

@ViktorHofer
Member

I'm seeing a similar test failing with the same crash and exit code but on the Ubuntu job:

    [FAIL] Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyNUnitTemplate(language: CSharp)
    System.InvalidOperationException : Failed to execute /__w/1/vmr/artifacts/obj/extracted-dotnet-sdk/dotnet new globaljson --sdk-version 10.0.100-ci
    Exit code: 139
    The template "global.json file" was created successfully.
    
       at Microsoft.DotNet.ScenarioTests.Common.ExecuteHelper.ValidateExitCode(ValueTuple`3 result, Int32 expectedExitCode) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.Common/ExecuteHelper.cs:line 112
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.DotNetSdkHelper.ExecuteCmd(String args, String workingDirectory, Action`1 additionalProcessConfigCallback, Int32 expectedExitCode, Int32 millisecondTimeout) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/DotNetSdkHelper.cs:line 38
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.DotNetSdkHelper.ExecuteNew(String projectType, String projectName, String projectDirectory, String language, String customArgs) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/DotNetSdkHelper.cs:line 113
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTest.Execute(DotNetSdkHelper dotNetHelper, String testRoot, String[] frameworks, String PreMadeSolution, Int32 retryCounter) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/SdkTemplateTest.cs:line 42
       at Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyNUnitTemplate(DotNetLanguage language) in /_/src/scenario-tests/src/Microsoft.DotNet.ScenarioTests.SdkTemplateTests/SdkTemplateTests.cs:line 89
       at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args)
       at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
    [SKIP] Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyWorkloadCmd
    [SKIP] Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyAspireTemplate
    Finished Microsoft.DotNet.ScenarioTests.SdkTemplateTests, Version=42.42.42.42, Culture=neutral, PublicKeyToken=31bf3856ad364e35
    
    Tests run: 34, Errors: 0, Failures: 1, Skipped: 2. Time: 343.1768632s

from https://dev.azure.com/dnceng-public/public/_build/results?buildId=985840&view=logs&jobId=879e79ff-d5e4-564a-9322-bee9d0a651ca&j=879e79ff-d5e4-564a-9322-bee9d0a651ca&t=32006230-9129-59ef-d8ca-594d44d5476a

@jkotas
Member

jkotas commented Mar 19, 2025

From dotnet/sdk#47580 (comment): the suspect commit range is dotnet/runtime@5ff417f...c3d95b4

I think dotnet/runtime@035ee5c is the most likely to be related. @elinor-fung Could you please take a look?

@ellahathaway changed the title from "VerifyMSTestTemplate Test is Failing" to "SB Scenario tests are intermittently failing to execute" Mar 19, 2025
@ellahathaway
Member Author

Updated the title to better reflect the issue.

@elinor-fung
Member

Is there a simple way to enable/upload crash dumps from the CI?

I can't seem to get a local repro. I tried:

  1. With image: mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-24.04
  2. Downloaded assets and packages from one of the failing CI builds: https://dev.azure.com/dnceng-public/public/_build/results?buildId=985840&view=results
  3. Built the scenario-tests repo
  4. Ran Microsoft.DotNet.ScenarioTests.SdkTemplateTests.SdkTemplateTests.VerifyNUnitTemplate (the test that failed in that build) 100 times using the SDK and packages from (2)
  5. Ran dotnet new globaljson 100 times using the SDK from (2)
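
The repeat-and-count approach in steps 4 and 5 can be sketched roughly as follows. This is only an illustration, not the script that was actually used: `run_repeatedly` is a hypothetical helper, and the Python interpreter stands in for the real command (for example, the `dotnet new globaljson` invocation from step 5).

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs=100):
    """Run `cmd` `runs` times and tally how often each exit code occurs."""
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[result.returncode] += 1
    return codes


# Stand-in command that always exits with status 3; replace with the command under test.
demo_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_repeatedly(demo_cmd, runs=2))
```

When Python launches the process directly on POSIX, a child killed by a signal is reported with a negative return code (for example -11) rather than the shell-style 139, so an intermittent crash would show up under that key in the tally.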

@akoeplinger
Member

I was able to reproduce it once just now on a Standard_D4as_v4 VM in the helix-repro-vms DTL while doing the full source build and build.sh --test, but the container wasn't configured to capture core dumps... will try again, it's getting a bit late here :)

@akoeplinger
Member

@elinor-fung success! I uploaded the coredump and dotnet binaries to my OneDrive, let me know if you need anything else.

This is from a mcr.microsoft.com/dotnet-buildtools/prereqs:centos-stream9 instance with the vmr repo mapped to the container in /hostvm/dotnet

(gdb) bt
#0  0x0000000000000140 in ?? ()
#1  0x00007d84d713a3ff in HostInformation::ExternalAssemblyProbe (path=..., data=0x7d844fdfb0e0, size=0x7d844fdfb0d8) at /hostvm/dotnet/src/runtime/src/coreclr/vm/hostinformation.cpp:56
#2  0x00007d84d70eea45 in AssemblyProbeExtension::Probe (path=..., pathIsBundleRelative=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/vm/assemblyprobeextension.cpp:36
#3  0x00007d84d735e895 in BINDER_SPACE::AssemblyBinderCommon::BindByTpaList (pApplicationContext=pApplicationContext@entry=0x5dfaabd87200, pRequestedAssemblyName=pRequestedAssemblyName@entry=0x7d4388047d70, excludeAppPaths=false, pBindResult=pBindResult@entry=0x7d844fdfb6f8) at /hostvm/dotnet/src/runtime/src/coreclr/binder/assemblybindercommon.cpp:858
#4  0x00007d84d735e391 in BINDER_SPACE::AssemblyBinderCommon::BindLocked (pApplicationContext=pApplicationContext@entry=0x5dfaabd87200, pAssemblyName=pAssemblyName@entry=0x7d4388047d70, skipVersionCompatibilityCheck=false, excludeAppPaths=false, pBindResult=pBindResult@entry=0x7d844fdfb6f8) at /hostvm/dotnet/src/runtime/src/coreclr/binder/assemblybindercommon.cpp:510
#5  0x00007d84d735cedb in BINDER_SPACE::AssemblyBinderCommon::BindByName (pApplicationContext=pApplicationContext@entry=0x5dfaabd87200, pAssemblyName=pAssemblyName@entry=0x7d4388047d70, skipFailureCaching=false, skipVersionCompatibilityCheck=false, excludeAppPaths=false, pBindResult=pBindResult@entry=0x7d844fdfb6f8) at /hostvm/dotnet/src/runtime/src/coreclr/binder/assemblybindercommon.cpp:430
#6  0x00007d84d735ca55 in BINDER_SPACE::AssemblyBinderCommon::BindAssembly (pBinder=0x5dfaabd871e0, pAssemblyName=0x7d4388047d70, excludeAppPaths=<optimized out>, ppAssembly=0x7d844fdfb920) at /hostvm/dotnet/src/runtime/src/coreclr/binder/assemblybindercommon.cpp:211
#7  0x00007d84d73648c2 in DefaultAssemblyBinder::BindAssemblyByNameWorker (this=0x5dfaabd871e0, pAssemblyName=0x7d4388047d70, ppCoreCLRFoundAssembly=0x7d844fdfb920, excludeAppPaths=false) at /hostvm/dotnet/src/runtime/src/coreclr/binder/defaultassemblybinder.cpp:26
#8  DefaultAssemblyBinder::BindUsingAssemblyName (this=0x7d844fdfaea0, pAssemblyName=0x7d844fdfb0e0, ppAssembly=0x7d844fdfb9d8) at /hostvm/dotnet/src/runtime/src/coreclr/binder/defaultassemblybinder.cpp:51
#9  0x00007d84d6fd5f7c in RuntimeInvokeHostAssemblyResolver (pManagedAssemblyLoadContextToBindWithin=138009493443344, pAssemblyName=0x7d4388047d70, pDefaultBinder=0x5dfaabd871e0, pBinder=0x7d43880437b0, ppLoadedAssembly=ppLoadedAssembly@entry=0x7d844fdfbd68) at /hostvm/dotnet/src/runtime/src/coreclr/vm/appdomain.cpp:4224
#10 0x00007d84d73602ea in BINDER_SPACE::AssemblyBinderCommon::BindUsingHostAssemblyResolver (pManagedAssemblyLoadContextToBindWithin=138007229214368, pAssemblyName=0x7d844fdfb0e0, pDefaultBinder=0x7d844fdfb0d8, pBinder=0xaeb3168317437500, ppAssembly=0x7d844fdfbd90) at /hostvm/dotnet/src/runtime/src/coreclr/binder/assemblybindercommon.cpp:1159
#11 0x00007d84d7368614 in CustomAssemblyBinder::BindUsingAssemblyName (this=0x7d43880437b0, pAssemblyName=0x7d4388047d70, ppAssembly=0x7d844fdfbe48) at /hostvm/dotnet/src/runtime/src/coreclr/binder/customassemblybinder.cpp:75
#12 0x00007d84d6fdf858 in AssemblyBinder::BindAssemblyByName (this=0x7d43880437b0, pAssemblyNameData=<optimized out>, ppAssembly=0x7d844fdfbe48) at /hostvm/dotnet/src/runtime/src/coreclr/vm/assemblybinder.cpp:23
#13 0x00007d84d7112845 in AssemblySpec::Bind (this=0x7d844fdfc8e0, pAppDomain=<optimized out>, ppAssembly=0x7d844fdfc328) at /hostvm/dotnet/src/runtime/src/coreclr/vm/coreassemblyspec.cpp:66
#14 0x00007d84d6fd388c in AppDomain::BindAssemblySpec (this=0x5dfaabd84ff0, pSpec=0x7d844fdfc8e0, fThrowOnFileNotFound=1) at /hostvm/dotnet/src/runtime/src/coreclr/vm/appdomain.cpp:3172
#15 0x00007d84d7086cf7 in PEAssembly::LoadAssembly (this=0x7d438808bcb0, kAssemblyRef=587202563) at /hostvm/dotnet/src/runtime/src/coreclr/vm/peassembly.cpp:492
#16 0x00007d84d6febfdf in Module::LoadAssemblyImpl (this=0x7d845df16b18, kAssemblyRef=1340059872) at /hostvm/dotnet/src/runtime/src/coreclr/vm/ceeload.cpp:2418
#17 0x00007d84d6fdbbd0 in ModuleBase::LoadAssembly (this=0x7d845df16b18, kAssemblyRef=1340059872) at /hostvm/dotnet/src/runtime/src/coreclr/vm/ceeload.h:577
#18 Assembly::FindModuleByTypeRef (pModule=0x7d845df16b18, tkType=1340059872, loadFlag=Loader::Load, pfNoResolutionScope=0x7d844fdfcaf4) at /hostvm/dotnet/src/runtime/src/coreclr/vm/assembly.cpp:878
#19 0x00007d84d6ffdbb5 in ClassLoader::LoadTypeDefOrRefThrowing (pModule=<optimized out>, typeDefOrRef=<optimized out>, fNotFoundAction=ClassLoader::ThrowIfNotFound, fUninstantiated=ClassLoader::FailIfUninstDefOrRef, tokenNotToLoad=0, level=CLASS_LOADED) at /hostvm/dotnet/src/runtime/src/coreclr/vm/clsload.cpp:2075
#20 0x00007d84d70a582e in SigPointer::GetTypeHandleThrowing (this=<optimized out>, pModule=0x480c5af1e769dbfa, pTypeContext=<optimized out>, fLoadTypes=ClassLoader::LoadTypes, level=27, dropGenericArgumentLevel=0, pSubst=0x0, pZapSigContext=0x0, pMTInterfaceMapOwner=0x0, pRecursiveFieldGenericHandling=0x0) at /hostvm/dotnet/src/runtime/src/coreclr/vm/siginfo.cpp:1663
#21 0x00007d84d70567cc in CEEInfo::getArgClass (this=<optimized out>, sig=0x7d844fdfd270, args=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/vm/jitinterface.cpp:9740
#22 0x00007d84c7dac145 in Compiler::lvaInitTypeRef (this=0x5dfaabdd97c8) at /hostvm/dotnet/src/runtime/src/coreclr/jit/lclvars.cpp:246
#23 0x00007d84c7cd5498 in Compiler::compCompileHelper (this=this@entry=0x5dfaabdd97c8, classPtr=classPtr@entry=0x7d845df16b18, compHnd=0x7d844fdfd370, methodInfo=0x7d844fdfd1d0, methodCodePtr=methodCodePtr@entry=0x7d844fdfcfd8, methodCodeSize=methodCodeSize@entry=0x7d844fdfd18c, compileFlags=0x7d844fdfd000) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:6863
#24 0x00007d84c7cd4018 in Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*)::$_0::operator()(Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*)::__JITParam*) const (this=<optimized out>, __JITpParam=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:6314
#25 Compiler::compCompile (this=0x5dfaabdd97c8, classPtr=0x7d845df16b18, methodCodePtr=0x7d844fdfcfd8, methodCodeSize=0x7d844fdfd18c, compileFlags=0x7d844fdfd000) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:6333
#26 0x00007d84c7cd6157 in jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::{lambda(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::__JITParam*)#1}::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::__JITParam*) const (this=<optimized out>, __JITpParam=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:7777
#27 jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const (this=<optimized out>, __JITpParam=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:7801
#28 jitNativeCode (methodHnd=0x7d845df19b00, classPtr=0x7d845df16b18, compHnd=compHnd@entry=0x7d844fdfd370, methodInfo=methodInfo@entry=0x7d844fdfd1d0, methodCodePtr=methodCodePtr@entry=0x7d844fdfcfd8, methodCodeSize=methodCodeSize@entry=0x7d844fdfd18c, compileFlags=0x7d844fdfd000, inlineInfoPtr=0x0) at /hostvm/dotnet/src/runtime/src/coreclr/jit/compiler.cpp:7803
#29 0x00007d84c7cda6b4 in CILJit::compileMethod (this=<optimized out>, compHnd=0x7d844fdfd370, methodInfo=0x7d844fdfd1d0, flags=<optimized out>, entryAddress=0x7d844fdfd190, nativeSizeOfCode=0x7d844fdfd18c) at /hostvm/dotnet/src/runtime/src/coreclr/jit/ee_il_dll.cpp:301
#30 0x00007d84d705a866 in invokeCompileMethodHelper (jitMgr=jitMgr@entry=0x5dfaabcf6a90, comp=comp@entry=0x7d844fdfd370, info=info@entry=0x7d844fdfd1d0, jitFlags=..., nativeEntry=nativeEntry@entry=0x7d844fdfd190, nativeSizeOfCode=nativeSizeOfCode@entry=0x7d844fdfd18c) at /hostvm/dotnet/src/runtime/src/coreclr/vm/jitinterface.cpp:12519
#31 0x00007d84d705aa5a in invokeCompileMethod (jitMgr=jitMgr@entry=0x5dfaabcf6a90, comp=comp@entry=0x7d844fdfd370, info=info@entry=0x7d844fdfd1d0, jitFlags=..., nativeEntry=nativeEntry@entry=0x7d844fdfd190, nativeSizeOfCode=nativeSizeOfCode@entry=0x7d844fdfd18c) at /hostvm/dotnet/src/runtime/src/coreclr/vm/jitinterface.cpp:12572
#32 0x00007d84d705b5e7 in UnsafeJitFunction (config=config@entry=0x7d844fdfd8e0, ILHeader=ILHeader@entry=0x7d844fdfd640, pJitFlags=pJitFlags@entry=0x7d844fdfd560, pSizeOfCode=pSizeOfCode@entry=0x7d844fdfd67c) at /hostvm/dotnet/src/runtime/src/coreclr/vm/jitinterface.cpp:13016
#33 0x00007d84d709598a in MethodDesc::JitCompileCodeLocked (this=this@entry=0x7d845df19b00, pConfig=pConfig@entry=0x7d844fdfd8e0, pilHeader=pilHeader@entry=0x7d844fdfd640, pEntry=pEntry@entry=0x7d437805db80, pSizeOfCode=pSizeOfCode@entry=0x7d844fdfd67c) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:919
#34 0x00007d84d709526e in MethodDesc::JitCompileCodeLockedEventWrapper (this=this@entry=0x7d845df19b00, pConfig=pConfig@entry=0x7d844fdfd8e0, pEntry=pEntry@entry=0x7d437805db80) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:830
#35 0x00007d84d70949da in MethodDesc::JitCompileCode (this=this@entry=0x7d845df19b00, pConfig=pConfig@entry=0x7d844fdfd8e0) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:717
#36 0x00007d84d70944b2 in MethodDesc::PrepareILBasedCode (this=0x7d845df19b00, pConfig=0x7d844fdfd8e0) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:433
#37 0x00007d84d700521c in CodeVersionManager::PublishVersionableCodeIfNecessary (this=0x5dfaabd85a18, pMethodDesc=0x7d845df19b00, callerGCMode=CallerGCMode::Coop, doBackpatchRef=0x7d844fdfd9b8, doFullBackpatchRef=<optimized out>) at /hostvm/dotnet/src/runtime/src/coreclr/vm/codeversion.cpp:1738
#38 0x00007d84d7099937 in MethodDesc::DoPrestub (this=this@entry=0x7d845df19b00, pDispatchingMT=pDispatchingMT@entry=0x0, callerGCMode=callerGCMode@entry=CallerGCMode::Coop) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:2908
#39 0x00007d84d70991eb in PreStubWorker (pTransitionBlock=0x7d844fdfdbf8, pMD=0x7d845df19b00) at /hostvm/dotnet/src/runtime/src/coreclr/vm/prestub.cpp:2708
#40 0x00007d84d72a8fc4 in ThePreStub () at /hostvm/dotnet/src/runtime/src/coreclr/vm/amd64/theprestubamd64.S:17
#41 0x00007d845de60e69 in ?? ()
#42 0x0000000000000000 in ?? ()
(gdb)

@MichaelSimons added area-testing and removed untriaged labels Mar 20, 2025
@MichaelSimons moved this from Backlog to In Progress in .NET Source Build Mar 20, 2025
@MichaelSimons
Member

Expected to be fixed with dotnet/runtime#113738, keeping open until the change flows into the VMR.

@elinor-fung
Member

elinor-fung commented Mar 21, 2025

Thanks, @akoeplinger!

The host_runtime_contract::external_assembly_probe is set to 0x140 - so we incorrectly detect that there is a callback there that should be invoked.

(lldb) print s_hostContract
(host_runtime_contract *) $4 = 0x00005dfaabceb268
(lldb) print *s_hostContract
(host_runtime_contract) $5 = {
  size = 48
  context = 0x00005dfaabceb160
  get_runtime_property = 0x00007d84d76a3e60 (libhostpolicy.so`(anonymous namespace)::get_runtime_property(char const*, char*, unsigned long, void*) at hostpolicy_context.cpp:114)
  bundle_probe = 0x0000000000000000
  pinvoke_override = 0x0000000000000000
  external_assembly_probe = 0x0000000000000140
}
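
To illustrate why a stale value like this is mistaken for a callback, here is a minimal sketch that models the dumped struct and applies a plain non-null check. The field names and values are copied from the lldb output; modelling every field as a 64-bit integer and the `looks_like_callback` helper are assumptions for illustration, not the runtime's actual code.

```python
import ctypes


class HostRuntimeContract(ctypes.Structure):
    # Field names follow the lldb dump; each field is modelled as a 64-bit
    # integer (an assumption that matches the x64 dump, where size = 48).
    _fields_ = [
        ("size", ctypes.c_uint64),
        ("context", ctypes.c_uint64),
        ("get_runtime_property", ctypes.c_uint64),
        ("bundle_probe", ctypes.c_uint64),
        ("pinvoke_override", ctypes.c_uint64),
        ("external_assembly_probe", ctypes.c_uint64),
    ]


def looks_like_callback(value: int) -> bool:
    """Hypothetical stand-in for a plain null check on a function pointer."""
    return value != 0


contract = HostRuntimeContract(
    size=48,
    context=0x00005DFAABCEB160,
    get_runtime_property=0x00007D84D76A3E60,
    bundle_probe=0,
    pinvoke_override=0,
    external_assembly_probe=0x140,
)

# A null check cannot tell a real function address from leftover garbage.
print(looks_like_callback(contract.bundle_probe))
print(looks_like_callback(contract.external_assembly_probe))
```

The value 0x140 also matches the address of frame #0 in the gdb backtrace, which is what one would expect if the runtime jumped through this field.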

It looks like a race with shutdown (0x7 is ShutDown_Start | ShutDown_Finalize1 | ShutDown_Finalize2):

(lldb) print g_fEEShutDown
(DWORD) $6 = 7
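
As a quick check of that reading, the value can be decoded bit by bit. The flag names come from the comment above; the specific bit assigned to each name is an assumption (three distinct single-bit flags that OR to 0x7 must occupy bits 0x1, 0x2 and 0x4, but which is which is not shown in this thread).

```python
from enum import IntFlag


class ShutdownFlags(IntFlag):
    # Bit assignments are assumed for illustration; only the names and the
    # combined value 0x7 appear in the thread.
    ShutDown_Start = 0x1
    ShutDown_Finalize1 = 0x2
    ShutDown_Finalize2 = 0x4


def set_flags(value: int) -> list:
    """Return the names of the flags whose bits are set in `value`."""
    return [flag.name for flag in ShutdownFlags if flag & value]


print(set_flags(7))
```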

Aside from the thread with the crash (which looks to be running something in Microsoft.Build.NuGetSdkResolver), the only other thread is the event pipe server, presumably only because it hasn't looped back to detect that the shutdown state has been set.

Seems like we should either make all the HostInformation functions check for g_fEEShutDown or clear out s_hostContract as part of coreclr_shutdown (it gets set as part of coreclr_initialize).

@jkotas
Member

jkotas commented Mar 21, 2025

> Seems like we should either make all the HostInformation functions check for g_fEEShutDown or clear out s_hostContract as part of coreclr_shutdown (it gets set as part of coreclr_initialize).

I do not think that this is going to work reliably. The runtime is not stopped during shutdown. You have to account for the case where other threads are still running and accessing the global data structures.

The best way to fix shutdown issues like this one is to do less work during shutdown, don't worry about any cleanups, and just let stuff leak.

If it is not an option, you need to invent your own locking strategy for the global data structure. g_fEEShutDown won't save you since it is not a lock.
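
To make the "own locking strategy" option concrete, here is a minimal sketch of the general idea: the check and the use happen under the same lock that shutdown takes to clear the data, so a reader can never observe a half-torn-down value. `HostContractSlot` and its methods are hypothetical names; this is not the approach taken in any runtime PR, and the preferred fix per the comment above remains doing less work during shutdown.

```python
import threading


class HostContractSlot:
    """Hypothetical holder for a host callback, guarded by its own lock."""

    def __init__(self, probe):
        self._lock = threading.Lock()
        self._probe = probe

    def clear(self):
        # Shutdown path: waits for any in-flight call, then drops the callback.
        with self._lock:
            self._probe = None

    def try_probe(self, path):
        # Reader path: check and invoke under the same lock, so the callback
        # cannot be cleared between the check and the call.
        with self._lock:
            if self._probe is None:
                return None
            return self._probe(path)


slot = HostContractSlot(lambda path: len(path))
print(slot.try_probe("abc"))
slot.clear()
print(slot.try_probe("abc"))
```

A bare flag check, by contrast, leaves a window between reading the flag and using the data, which is why a shutdown flag alone does not provide this guarantee.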

@elinor-fung
Member

I have a PR out in runtime: dotnet/runtime#113776

Is there a good way to make that PR run against the source build pipeline that was hitting this issue?

@akoeplinger
Member

@elinor-fung the easiest option is to port the change to a PR in https://github.com/dotnet/dotnet and then trigger the dotnet-source-build pipeline a couple times on the PR.

@MichaelSimons
Member

Closing as the offending runtime fix was reverted in main with dotnet/runtime#113738 and in preview-3 with dotnet/runtime#113746.

@github-project-automation bot moved this from Blocked to Done in .NET Source Build Mar 24, 2025