Option '-threads 8' causes unpredictable hanging due to lock issue #640

Open
mhechthz opened this issue Mar 10, 2024 · 10 comments

Comments

@mhechthz

I opened an issue with Python PuLP about this, but they told me it is related to CBC; see coin-or/pulp#737

PuLP invokes CBC with this command:

C:\Python312\Lib\site-packages\pulp\solverdir\cbc\win\64\cbc.exe C:\Users\Michael\AppData\Local\Temp\9f2f4340f4914ba58ec92a8628688845-pulp.mps -threads 8 -timeMode elapsed -branch -printingOptions all -solution C:\Users\Michael\AppData\Local\Temp\9f2f4340f4914ba58ec92a8628688845-pulp.sol

The mps file is produced from the code depicted at coin-or/pulp#737.

Sometimes I can run this only once, sometimes 10 times, but eventually it hangs with this output:

At line 2 NAME          MODEL
At line 3 ROWS
At line 509 COLUMNS
At line 314385 RHS
At line 314890 BOUNDS
At line 377666 ENDATA
Problem MODEL has 504 rows, 62775 columns and 125550 elements
Coin0008I MODEL read with 0 errors
threads was changed from 0 to 8
Option for timeMode changed from cpu to elapsed
Continuous objective value is 241 - 0.35 seconds
Cgl0004I processed model has 504 rows, 62775 columns (62775 integer (62775 of which binary)) and 125550 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 241
Cbc0038I Before mini branch and bound, 62775 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (2.15 seconds)
Cbc0038I After 2.15 seconds - Feasibility pump exiting with objective of 241 - took 0.12 seconds
Cbc0012I Integer solution of 241 found by feasibility pump after 0 iterations and 0 nodes (2.17 seconds)
Cbc0030I Thread 0 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 1 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 2 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 3 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 4 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 5 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 6 used 0 times,  waiting to start 0, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
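
In the output above, `Cbc0030I` lines appear for threads 0 to 6 only, although 8 threads were requested. Whether the missing line for thread 7 is related to the hang is not established here. The following is a minimal sketch for checking a saved log for this pattern. The helper name `missing_thread_reports` is hypothetical, and the regular expressions assume the log format shown in this issue.

```python
import re

# Matches per-thread summary lines such as
# "Cbc0030I Thread 3 used 0 times,  waiting to start 0, ..."
THREAD_LINE = re.compile(r"^Cbc0030I Thread (\d+) used")
# Matches "threads was changed from 0 to 8"
THREADS_SET = re.compile(r"^threads was changed from \d+ to (\d+)")


def missing_thread_reports(log_text):
    """Return the sorted thread indexes that never printed a Cbc0030I line.

    Hypothetical helper: it assumes the log format shown in this issue.
    Returns an empty list if the requested thread count is not in the log.
    """
    requested = None
    seen = set()
    for line in log_text.splitlines():
        line = line.strip()
        match = THREADS_SET.match(line)
        if match:
            requested = int(match.group(1))
            continue
        match = THREAD_LINE.match(line)
        if match:
            seen.add(int(match.group(1)))
    if requested is None:
        return []
    return [i for i in range(requested) if i not in seen]
```

Calling `missing_thread_reports` on the text of a captured log returns the thread indexes with no summary line, which makes it easier to compare hanging and non-hanging runs.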

@mhechthz
Author

I did some further tests. It seems that Python-MIP does not have this problem. MIP apparently uses cbc-c-windows-x86-64.dll rather than cbc.exe, but I'm not sure which version. See coin-or/pulp#737

@mhechthz
Author

mhechthz commented Mar 10, 2024

I replaced the cbc.exe with the one from here: https://github.com/FranksMathematics/CBC_ReleaseParallel_Win86

Now it works and is really fast.

Would it be a good idea to also provide a pre-compiled binary with parallel support among the binary downloads on coin-or? Most computers today have at least 4 cores with 8 threads, so parallelisation should be available.

@jhmgoossens
Contributor

Can you upload the mps file for testing?

@mhechthz
Author

mhechthz commented Mar 10, 2024

The file is quite large, so I added the code that generates it at coin-or/pulp#737 (several of the files generated this way hang, so I don't think the problem lies in the specific model being solved). I compressed the file and attached it here:
9f2f4340f4914ba58ec92a8628688845-pulp.zip

Nevertheless, it's just a toy example I had GPT-4 generate, because I wanted to check something.

@jhmgoossens
Contributor

Thanks. I'll try the mps with the master build of the Windows build of cbc.
Can somebody see if this also happens in the Linux builds?

@zroug

zroug commented Mar 14, 2024

I probably experienced this with version 2.10.8 (Debian 12 docker image). However, it happens very rarely and is incredibly hard to reproduce. I could not reproduce it with the example provided.

Probably unrelated, but once I got this error message:

ClpSimplexDual.cpp:3626: int ClpSimplexDual::dualColumn0(const CoinIndexedVector*, const CoinIndexedVector*, CoinIndexedVector*, double, double&, double&): Assertion `getStatus(iSequence + addSequence) != isFree && getStatus(iSequence + addSequence) != superBasic' failed.

The solver didn't crash after the error message, it just froze. But I also experienced hanging without an error message.

I'm using the Rust bindings, so I can't be 100% sure it's related to Coin CBC itself, but I do think so.

@svigerske
Member

> Thanks. I'll try the mps with the master build of the Windows build of cbc. Can somebody see if this also happens in the Linux builds?

With current master and my build, if I add -solve, it seems to run fine:

$ bin/cbc 9f2f4340f4914ba58ec92a8628688845-pulp.mps -threads 8 -timeMode elapsed -branch -printingOptions all -solve
Welcome to the CBC MILP Solver
Version: Devel (unstable)
Build Date: Mar 17 2024
command line - 9f2f4340f4914ba58ec92a8628688845-pulp.mps -threads 8 -timeMode elapsed -branch -printingOptions all -solve (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 509 COLUMNS
At line 314385 RHS
At line 314890 BOUNDS
At line 377666 ENDATA
Problem MODEL has 504 rows, 62775 columns and 125550 elements
Coin0008I MODEL read with 0 errors
threads was changed from 0 to 8
Option for timeMode changed from cpu to elapsed
Continuous objective value is 241 - 0.866795 seconds
Cgl0004I processed model has 504 rows, 62775 columns (62775 integer (62775 of which binary)) and 125550 elements
Coin3009W Conflict graph built in 4.893 seconds, density: 0.201% !!
Cgl0015I Clique Strengthening extended 0 cliques, 0 were dominated
Cbc0045I Nauty sparseSpace 751694 affine 62121 coefficient count 125401
Cbc0045I Nauty did not find any useful orbits in time 75.7339
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 241
Cbc0038I Before mini branch and bound, 62775 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (85.79 seconds)
Cbc0038I After 85.79 seconds - Feasibility pump exiting with objective of 241 - took 0.48 seconds
Cbc0012I Integer solution of 241 found by feasibility pump after 0 iterations and 0 nodes (85.87 seconds)
Cbc0030I Thread 0 used 0 times,  waiting to start 2.9836783, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 1 used 0 times,  waiting to start 2.6222253, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 2 used 0 times,  waiting to start 2.2428255, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 3 used 0 times,  waiting to start 1.8365674, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 4 used 0 times,  waiting to start 1.4166172, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 5 used 0 times,  waiting to start 0.96853828, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 6 used 0 times,  waiting to start 0.5693121, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Thread 7 used 0 times,  waiting to start 0.11956644, 0 cpu time, 0 locks, 0 locked, 0 waiting for locks
Cbc0030I Main thread 0 waiting for threads,  1 locks, 5.698204e-05 locked, 1.6689301e-06 waiting for locks
Cbc0001I Search completed - best objective 241, took 0 iterations and 0 nodes (89.41 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 241 to 241
Probing was tried 0 times and created 0 cuts (0 seconds)
Gomory was tried 0 times and created 0 cuts (0 seconds)
Knapsack was tried 0 times and created 0 cuts (0 seconds)
Clique was tried 0 times and created 0 cuts (0 seconds)
OddWheel was tried 0 times and created 0 cuts (0 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts (0 seconds)
FlowCover was tried 0 times and created 0 cuts (0 seconds)
TwoMirCuts was tried 0 times and created 0 cuts (0 seconds)
ZeroHalf was tried 0 times and created 0 cuts (0 seconds)

Result - Optimal solution found
Objective value:                241
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             88.7186
Time (Wallclock seconds):       89.9318
Total time (CPU seconds):       89.0334   (Wallclock seconds):       90.2602

It also runs fine when run 10 times in a loop.
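
Because the hang is intermittent, a repeat run with a timeout can help tell a hang from a slow solve. The following is a minimal sketch, not the script used for the runs above. The function name `run_until_hang` and the default timeout are assumptions, and the timeout should be chosen well above the normal solve time on the machine in question.

```python
import subprocess


def run_until_hang(cmd, runs=10, timeout=300.0):
    """Run cmd up to `runs` times.

    Returns the 1-based index of the first run that exceeds `timeout`
    seconds (treated here as a hang), or None if every run finished in time.
    The timeout value is an assumption, not a number from this issue.
    """
    for attempt in range(1, runs + 1):
        try:
            # subprocess.run kills the child process if the timeout expires.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return attempt
    return None
```

For example, `run_until_hang(["bin/cbc", "9f2f4340f4914ba58ec92a8628688845-pulp.mps", "-threads", "8", "-timeMode", "elapsed", "-branch", "-printingOptions", "all", "-solve"])` repeats the command shown above. The path `bin/cbc` depends on the local build.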

@jhmgoossens
Contributor

I also ran the latest stable Cbc 2.10.11 (x64) on Windows 100 times with the given command (plus -solve) and the MPS file, without any issues.

Given that this can't be reproduced with either current master or current stable (2.10.11), I propose we conclude that whatever the underlying root cause was, it is fixed now.
So shall we close this issue?

@mhechthz
Author

Where can I get the latest Cbc as a Windows binary, so I can check whether it works?

@jhmgoossens
Contributor

jhmgoossens commented Mar 20, 2024

Latest stable or latest master branch?

The latest stable build is available via Releases. Unfortunately, I believe the published Windows binaries are not built with multithreading support. So not good for you.

The latest master build (aka nightly / trunk / unstable) is available in the Artifacts of the latest successful build, though these Windows binaries are also not built with multithreading support, and in any case master builds are not 'stable'. So also not good for you.

To build for Windows with parallel / multithreading, see building CBC with Visual Studio. For my test with Cbc 2.10.11 (above), I built with multithreading support via Visual Studio using pthread-win32.
