SB Scenario tests are intermittently failing to execute #4952
Comments
@ViktorHofer - have you seen this failure before? It's affecting PRs in sdk too. |
Yes. The assumption is that this gets fixed with dotnet/runtime#113598. This got introduced with dotnet/sdk#47580 (comment). |
It looks like dotnet/sdk@f981d32 didn't help. I still see the crash in jobs with that commit in HEAD. |
@ViktorHofer it makes sense that dotnet/runtime#113598 wouldn't have fixed this, since the VMR uses its own versions of the container images in https://github.com/dotnet/sdk/blob/f255b7461bef7db14620199c3aa00626752edb52/eng/pipelines/templates/variables/vmr-build.yml#L118-L121. We'd need to do the pinning there. |
Makes sense. This is failing in the source-build leg most of the time, which uses this image: https://github.com/dotnet/sdk/blob/f255b7461bef7db14620199c3aa00626752edb52/eng/pipelines/templates/variables/vmr-build.yml#L98-L101 Do we know the list of images that need to be pinned? cc @richlander @MichalStrehovsky |
@ViktorHofer hm that's actually an interesting point, clang wasn't updated in the |
@akoeplinger @ViktorHofer - There was a rebuild of CentOS on Mar 11th but this issue didn't start showing up until Mar 14th. Would this particular issue affect the centos-stream9 image given that timeline? cc @mthalman |
yeah I think we'd have noticed earlier if it was introduced on the 11th. my current bet is something in the runtime bump caused it, but I wasn't able to reproduce it so far so maybe it's specific to the Azure SKU we're using. the only change that could possibly be related are the SIMD changes. |
Are those changes straightforward to revert? Would it be worthwhile to revert them in a VMR PR and run the SB Lite pipeline over them a few times? That pipeline has been failing more often than not. |
I'm seeing a similar test failing with the same crash and exit code but on the Ubuntu job:
|
From dotnet/sdk#47580 (comment): The suspect commit range is dotnet/runtime@5ff417f...c3d95b4. I think dotnet/runtime@035ee5c is the most likely to be related. @elinor-fung Could you please take a look? |
VerifyMSTestTemplate
Test is Failing
Updated the title to better reflect the issue. |
Is there a simple way to enable/upload crash dumps from the CI? I can't seem to get a local repro. I tried:
|
I was able to reproduce it once right now on a Standard_D4as_v4 VM in the helix-repro-vms DTL when doing the full source build and |
@elinor-fung success! I uploaded the coredump and dotnet binaries to my OneDrive, let me know if you need anything else. This is from a
|
Expected to be fixed with dotnet/runtime#113738, keeping open until the change flows into the VMR. |
Thanks, @akoeplinger! The
It looks like a race with shutdown (0x7 is ShutDown_Start | ShutDown_Finalize1 | ShutDown_Finalize2):
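As a reading aid for the 0x7 value above, here is a minimal sketch of how such a mask decomposes into the three named stages. The numeric values 0x1, 0x2 and 0x4 are an assumption (they are the only single-bit assignment consistent with the three flags OR-ing to 0x7), and `ShutdownStage` and `decode` are hypothetical names used for illustration, not the runtime's own definitions.

```python
from enum import IntFlag


class ShutdownStage(IntFlag):
    # Assumed values: one bit per stage, chosen so the three OR to 0x7.
    ShutDown_Start = 0x1
    ShutDown_Finalize1 = 0x2
    ShutDown_Finalize2 = 0x4


def decode(mask):
    """Return the names of the stage flags that are set in mask."""
    return [flag.name for flag in ShutdownStage if mask & flag]


if __name__ == "__main__":
    print(decode(0x7))
```

Under those assumed values, a mask of 0x7 means all three stages have been entered, which is why the crash is read as happening late in shutdown.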
Aside from the thread with the crash (which looks to be running something in Seems like we should either make all the |
I do not think that this is going to work reliably. The runtime is not stopped during shutdown. You have to account for the case where other threads are still running and accessing the global data structures. The best way to fix shutdown issues like this one is to do less work during shutdown, don't worry about any cleanups, and just let stuff leak. If that is not an option, you need to invent your own locking strategy for the global data structure. |
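The two options in the comment above can be sketched roughly as follows. This is an illustration only, not the runtime's code: `GlobalTable` and its methods are hypothetical names, and Python stands in for the native implementation. The table is guarded by its own lock, and shutdown only flips a flag instead of tearing the data down, so threads that are still running keep seeing valid entries.

```python
import threading


class GlobalTable:
    """Hypothetical process-wide table that other threads may still use during shutdown."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self._shutting_down = False

    def add(self, key, value):
        # Every access goes through the table's own lock.
        with self._lock:
            if self._shutting_down:
                return False  # refuse new work once shutdown has begun
            self._entries[key] = value
            return True

    def lookup(self, key):
        with self._lock:
            return self._entries.get(key)

    def begin_shutdown(self):
        # "Let stuff leak": mark the state, but do not free the entries,
        # because other threads may still be reading them.
        with self._lock:
            self._shutting_down = True
```

The design choice is that shutdown never invalidates data a concurrent reader might hold; the memory is simply reclaimed when the process exits.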
I have a PR out in runtime: dotnet/runtime#113776 Is there a good way to make that PR run against the source build pipeline that was hitting this issue? |
@elinor-fung the easiest option is to port the change to a PR in https://github.com/dotnet/dotnet and then trigger the |
Closing, as the offending runtime fix was reverted in main with dotnet/runtime#113738 and in preview-3 with dotnet/runtime#113746. |
https://dev.azure.com/dnceng/internal/_build/results?buildId=2665635&view=logs&j=609589e2-4f74-5576-cdb7-914bcaea778b&t=85a3ad82-e833-5143-3e25-ec1affd2c703&l=251