Broken C++ runtime on windows-2022 version 20240603.1.0 #10020
Comments
We worked around the problem by switching to the static MSVC runtime (bfgroup/b2@0075644), and we plan on staying with the static runtime to avoid hitting this problem again and again, as it's not the first time such botched updates have happened. |
Looks like the same as #10004, there are some workarounds in that issue. |
Glad it works for you. However, I have many DLLs and executables in the project. We can't link with the static runtime. If we did so, each process would end up with as many instances of the runtime as DLLs (+1 for the .exe), which does not work.
Thanks for pointing this out. This is a very recent one too. They opened it while I was chasing the reason for the problem (it took a long time to identify the mutex issue). |
In fact, unless you have total control over your users' machines, you should be grateful to GitHub for showing you the problem. The expression "Windows DLL hell" exists for a reason, and the only viable solution when shipping binaries for Windows is to either go static or ship this DLL yourself. As far as I am concerned, this is the last time I get burned by the MSVC runtime. |
There have definitely been some changes in the 'default construction' of a mutex. Extracts from various recent versions... 14.32.31326
14.39.35519
14.40.33807 - this seems to be the latest
in 14.32 and 14.39, the default behaviour (when no compiler directive is specified) is
Not providing the |
It's worse than that. I tried to install the same VC runtime as used during the build and the problem remains the same, see #10004 (comment) |
This is normal, if you build with this version of MSVC, you need the new runtime. |
Precisely, I explicitly install on the runner system the MSVC runtime with which I built the code. So, the same runtime is used for compilation and execution. But it does not work. Running the application still fails on locking the mutex. There is an inconsistency somewhere. Either the VCRedist package in the VS tree is not the same as the one used by the compiler, or there is a mess of already installed MSVC runtimes, one of which takes precedence because of a PATH setting. So, this is either an inconsistency in VS or an inconsistency in the GH runner. |
MSVC does not build with the runtime. MSVC produces code that expects to find its runtime. This would be the same with |
You misinterpreted what I meant. The compiler and the RTL work together, always. The compiler (well, the compilation environment at large) provides header files. These header files contain specific definitions, here the variants of the mutex constructor. The binary of the runtime must be compatible with these definitions. If a new version of the compiler (and compilation environment) introduces a new version of a constructor, the corresponding code must be in the RTL. Therefore, there is a new version of the RTL which comes (somehow) with the compiler.
When you install Visual Studio, the installed tree of files contains a VCRedist package, a package to install on target systems to make sure that they will be able to run applications which are compiled by this compiler. This is what I mean: when you build with a given version of Visual Studio, the headers which are used during the compilation must be compatible with the provided VCRedist package. This is why, when you package an application, you typically include this VCRedist in the package of your application and you install both at the same time. Thus, you can be confident that your application will work on the target system, even if it is a bit older (up to a point).
This is why, in my test, I explicitly installed the VCRedist that is found in the Visual Studio setup of the GH runner, before running the application (before compiling in fact). The expected result is that the application which is built is compatible with the RTL that we just installed. And this is what fails... Therefore, something is rotten in the state of GitHub. |
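One way to check which compiler and standard library headers a binary was actually built with is to have it report the predefined version macros. The following is a minimal sketch, not something taken from this thread: the function name `toolset_description` is hypothetical, and the `#if` guards assume the code may also be compiled with a non-Microsoft toolchain, where these macros are absent.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: describe the toolset that compiled this translation unit.
// _MSC_FULL_VER is predefined by the MSVC compiler; _MSVC_STL_UPDATE is defined
// by the Microsoft STL headers once any standard header has been included.
std::string toolset_description()
{
    std::ostringstream out;
#if defined(_MSC_FULL_VER)
    out << "MSVC compiler " << _MSC_FULL_VER;
#else
    out << "non-MSVC compiler";
#endif
#if defined(_MSVC_STL_UPDATE)
    out << ", Microsoft STL update " << _MSVC_STL_UPDATE;
#endif
    return out.str();
}
```

Printing this string at the start of a test program in the CI log makes it easier to compare the headers used at build time with the redistributable installed on the runner.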
Yes, indeed. I agree, they updated the compiler without updating the VCRedist package. However this is also a huge wake-up call for everyone who ships Windows binaries. In my case, it is a Node.js addon that is installed through |
@lelegard We are looking into the issue, we will get back to you after investigating this issue. |
… crash on `windows-2022` after MSVC update from 14.39.33519 to 14.40.33807 (#42123) ### Rationale for this change After the `windows-2022` GitHub runner image was updated last week, MATLAB began crashing when running the unit tests in `arrow/matlab/test/tfeather.m` on Windows. As part of the update, VS 2022 was updated from `17.9.34902.65` to `17.10.34928.147` and MSVC was updated from `14.39.33519` to `14.40.33807`. It looks like many other projects have run into this issue as well: 1. actions/runner-images#10004 2. actions/runner-images#10020 The suggested workaround for this crash is to supply the flag `_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR` when building. ### What changes are included in this PR? 1. Supply `_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR` flag when building Arrow C++. ### Are these changes tested? N/A. Existing tests used. ### Are there any user-facing changes? No. * GitHub Issue: #42015 Authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Sarah Gilmore <[email protected]>
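For reference, the workaround flag quoted above can also be set in the source instead of on the compiler command line. This is a minimal sketch under assumptions: the macro only has an effect with the Microsoft STL, it should be defined before the first standard library include and consistently across all translation units, and the function name `mutex_usable_with_workaround` is hypothetical.

```cpp
// Define the escape hatch before any standard library header is included.
// With MSVC the same effect is usually obtained by passing the macro with /D.
#ifndef _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
#define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
#endif

#include <mutex>

// Hypothetical helper: lock a mutex with static storage duration, which is the
// kind of object touched by the change in the mutex constructor.
bool mutex_usable_with_workaround()
{
    static std::mutex mtx;
    std::lock_guard<std::mutex> guard(mtx);
    return true;
}
```

On other toolchains the macro is simply ignored, so the same source still builds there.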
The deployment has completed, could you please try to rerun the workflow.
If you have any issues, please reach out to us. |
The problem started with GitHub runner version 2.317.0, image version 20240603.1.0. Said to be fixed now. See actions/runner-images#10020
Thanks for the update. It works "a bit better" but all JNI applications (Java Native Interface) are still crashing. So, no, the updated runner is not acceptable. Let me explain: In my project, I have C++ DLLs and C++ executables. There are also Python and Java bindings. All C++ applications now work correctly. Same for Python applications (the Python interpreter successfully loads my DLL when Python applications call my Python bindings). However, when a Java application calls the Java bindings, loading the DLL fails with this error:
Log here: https://github.com/tsduck/tsduck/actions/runs/9518023948/job/26237929549 That is exactly the same symptom as with the previous update: The initialization routines of the DLL loads stuff using non-thread-safe system functions and they use a Note that the crash occurs only with 64-bit applications. The 32-bit version works with Java (probably not the same mixture of VC runtime DLL's). I re-enabled the workaround I implemented earlier, defining So, please consider adding JNI test cases in your validation suites. |
This is not foolproof. It is the reason for the continuing Java crashes. JNI modules are built with the latest VC++ and need the latest runtime but Java Temurin contains its own older version of the vcruntime which is loaded by the JVM. When it loads a JNI module, it links the module with the vcruntime it already loaded. When the module attempts to create a mutex it calls the code in the older vcruntime and the JVM crashes. The workaround is to remove the vcruntime from the Temurin installation. |
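To see which copy of a runtime DLL a process has really loaded, a native module can ask Windows for the path of the already loaded module. The sketch below is an illustration only: the function name `loaded_module_path` is hypothetical, the module names to query (for example `msvcp140.dll` or `vcruntime140.dll`) are assumptions about what is relevant here, and on non-Windows platforms the function just returns an empty string.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: return the full path of a module already loaded in the
// current process, or an empty string if it is not loaded (or not on Windows).
std::string loaded_module_path(const char* module_name)
{
#ifdef _WIN32
    // GetModuleHandleA does not load anything; it only looks up loaded modules.
    HMODULE handle = GetModuleHandleA(module_name);
    if (handle == nullptr) {
        return std::string();
    }
    char buffer[MAX_PATH];
    // Returns the number of characters copied, without the terminating null.
    DWORD length = GetModuleFileNameA(handle, buffer, MAX_PATH);
    return std::string(buffer, length);
#else
    (void)module_name;
    return std::string();
#endif
}
```

Called from the initialization code of a JNI module, this would show whether the runtime came from the system directory or from the directory of the Java installation.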
Thanks for your confirmation, we are closing the issue as completed. |
@RaviAkshintala, so you say that you "close this issue as completed" while you "look into the issue". |
All we've confirmed is that the runner image is still broken. I am therefore in total agreement with @lelegard.
|
@RaviAkshintala, I don't understand. The confirmation indicated that the updated image is still unreliable and does not offer a stable, reliable build platform. It was clearly not a confirmation that everything is working once again. Why is this closed? |
@RaviAkshintala, you initially wrote:
Then I wrote this:
And, after my comment, you edited your previous comment and you removed the sentence "Actually we look into the issue". It is fortunate that the editing history of posts is available to demonstrate this. Let me say that this is extremely offensive and dishonest. As @MarkCallow and @mprather confirmed with me, the problem is NOT fixed. Not only did you close the issue without a complete fix, but you also erased the part of the discussion which shows this. |
I can confirm that the problem is not solved. Please reopen this issue and, please, solve it asap. At least by reverting to the old working runner. We are experiencing two weeks of broken runners and this is very unprofessional. |
Hi @lelegard, we apologise for the mistake and will look into it carefully. Thanks. |
If GitHub supported it, the right thing to do with this is mark it as a duplicate of #10004 so there aren't multiple threads of discussion going on. The description I gave earlier of the JNI failure is what remains of the original problem since deployment of 20240610.1.0. Actually #10055 was opened specifically re the JNI issue. That too, in my view, is a duplicate of #10004. |
@RaviAkshintala and all GitHub folks, Because characterizing the problem was only possible in a GitHub Actions runner context, I had to run many workflows on a copy of a big repo to come to the conclusion of the C++ Because of this problem, which was created by GitHub with a careless, insufficiently tested, upgrade, I burnt all my Actions credits:
This is the first time this has happened to me in 11 years of GitHub usage. GitHub cannot credit back the many hours of my time I lost on this issue (and many others' time as well). However, it would be fair for GitHub to restore my Actions credits. Again, this credit was lost because of a GitHub bug, not for my own usage or the usage of my project. So, please consider recrediting my Actions quota. |
To all, 7 days after complaining that investigating GitHub's problem burnt all my GH Actions credits and asking for a refill of the credits, I still got nothing. My GH Actions credit is still zero. All burnt to do what GH should have done: investigate the problem that they created. And the problem is still not fixed. The contempt and disregard of GH for its users seems to have no limit. |
@lelegard Did you file a customer support request? Those have always been resolved to my satisfaction. A comment in this thread won't get you the help you seek. |
Description
Something was broken in runner image windows-2022 version 20240603.1.0, maybe in the VC++ runtime.
When running test programs in a workflow, using a C++ mutex (std::mutex) immediately terminates the application with error 1.
As a consequence, all continuous integration pipelines / non-regression test suites for C++ applications are broken, unusable, as soon as the application uses a mutex.
This is a very serious issue which should be addressed with a high priority.
The problem appears after the upgrade of windows-2022 (aka "windows-latest")
Platforms affected
Runner images affected
Image version and build link
The problem is demonstrated in the following simple repo: https://github.com/lelegard/gh-runner-lock
The log with the sample failure: https://github.com/lelegard/gh-runner-lock/actions/runs/9431982526/job/25981214253
Is it regression?
Last worked on windows-2022 runner version 2.316.1, image version 20240514.3.0
Expected behavior
The C++ applications which are built as part of a workflow should not crash during the subsequent test phase.
Actual behavior
See repro section.
Repro steps
The problem is demonstrated in the following simple repo: https://github.com/lelegard/gh-runner-lock
The log with the sample failure: https://github.com/lelegard/gh-runner-lock/actions/runs/9431982526/job/25981214253
The C++ program is quite simple:
Of course, being so simple, this program works well everywhere, including on local Windows development systems.
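As an illustration, a program of this kind can be sketched as follows. This is a minimal reconstruction under assumptions, not the exact source from the gh-runner-lock repo: the function name `lock_once` and the messages are hypothetical.

```cpp
#include <iostream>
#include <mutex>

// Hypothetical helper: lock and unlock a default-constructed std::mutex once,
// printing a message around each step so a CI log shows where execution stops.
bool lock_once()
{
    std::mutex mtx;
    std::cout << "before lock" << std::endl;
    {
        // The guard locks the mutex here and releases it at the end of the scope.
        std::lock_guard<std::mutex> guard(mtx);
        std::cout << "mutex locked" << std::endl;
    }
    std::cout << "after unlock" << std::endl;
    return true;
}
```

Calling `lock_once()` from `main` gives a complete test program; `std::endl` flushes each message, so the last line printed indicates the last step that completed.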
When executed in a GitHub workflow, starting with Windows runner version 2.317.0, it fails in the lock step. The above-mentioned log contains this:
Background
The problem initially appears after that upgrade on the project TSDuck where all workflows suddenly failed on Windows platforms.
The project is quite big (~ 350,000 loc, C++). Everything works fine on local Windows development systems. Only the GitHub CI failed. Because of the size of the project and the absence of direct interaction with the GitHub runner, identifying the reason for the failure was quite hard. I spent hours on test repos and ran 52 versions of the CI workflow to understand the nature of the problem.