-
Notifications
You must be signed in to change notification settings - Fork 92
Add concurrent lp solve and crossover at root node #602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/25.12
Are you sure you want to change the base?
Conversation
Fix sign bug when crushing dual
… concurrent-root-solve
… concurrent-root-solve
WalkthroughThis PR introduces thread-aware root relaxation solution propagation to the branch-and-bound solver, refactors LP solver APIs to remove redundant problem parameters, enhances presolve with range-row handling and dual residual computation, and integrates PDLP-based solving in the diversity manager with callbacks to communicate solutions between components. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
cpp/src/dual_simplex/phase2.cpp (1)
1931-1955: Fix tolerance comparison asymmetry on lines 1943 and 1955; remove commented-out code.Line 1955 uses
z[j] < fixed_toleranceinstead ofz[j] < -fixed_tolerance, creating an asymmetry with line 1943'sz[j] > fixed_tolerance. This allows small positive reduced costs to incorrectly trigger the NONBASIC_UPPER branch instead of NONBASIC_LOWER.The comparison operators should be symmetric for dual feasibility logic:
- Line 1943:
z[j] > fixed_tolerance(move positive reduced costs to lower bound)- Line 1955: should be
z[j] < -fixed_tolerance(move negative reduced costs to upper bound)Additionally, remove the commented-out code at lines 1931–1936 (old exact equality checks for
z[j] == 0), as it's been replaced by the tolerance-based logic.cpp/src/dual_simplex/presolve.cpp (1)
731-778: Potential mismatch in num_artificial_vars when equality_rows and range_rows differ
add_artifical_variablesnow computes:const i_t num_artificial_vars = equality_rows.size() - range_rows.size(); const i_t num_cols = n + num_artificial_vars; i_t nnz = problem.A.col_start[n] + num_artificial_vars; ... std::vector<bool> is_range_row(problem.num_rows, false); for (i_t i : range_rows) { is_range_row[i] = true; } for (i_t i : equality_rows) { if (is_range_row[i]) { continue; } // add artificial variable }This assumes a 1:1 relationship where every
range_rows[i]is also inequality_rows. If that invariant is ever broken (e.g., some range rows are not flagged as equality inequality_rows),num_artificial_varscan become negative, leading to invalidnum_cols/nnz, incorrect resizing ofcol_start, and assertions likeassert(j == num_cols)failing.A safer pattern is to derive
num_artificial_varsfrom the actual rows you’ll process:- const i_t num_artificial_vars = equality_rows.size() - range_rows.size(); + std::vector<bool> is_range_row(problem.num_rows, false); + for (i_t i : range_rows) { is_range_row[i] = true; } + + i_t num_artificial_vars = 0; + for (i_t i : equality_rows) { + if (!is_range_row[i]) { ++num_artificial_vars; } + }and then compute
num_cols/nnzfrom thisnum_artificial_vars. You can also add a defensiveassert(num_artificial_vars >= 0);if you want the invariant checked in debug builds.This keeps the matrix bookkeeping correct even if the relationship between
equality_rowsandrange_rowschanges in the future.cpp/src/mip/problem/problem_helpers.cuh (1)
19-23: Guard kernel launch for zero-constraint problems.The greater-to-less conversion logic and bound update look correct, but
kernel_convert_greater_to_less<<<problem.n_constraints, TPB>>>will receive a grid size of 0 whenproblem.n_constraints == 0, which is an invalid configuration in CUDA and will raise a runtime error. Add a cheap guard:template <typename i_t, typename f_t> static void convert_greater_to_less(detail::problem_t<i_t, f_t>& problem) { raft::common::nvtx::range scope("convert_greater_to_less"); auto* handle_ptr = problem.handle_ptr; + if (problem.n_constraints == 0) { + handle_ptr->sync_stream(); + return; + } + constexpr i_t TPB = 256; kernel_convert_greater_to_less<i_t, f_t> <<<problem.n_constraints, TPB, 0, handle_ptr->get_stream()>>>( raft::device_span<f_t>(problem.coefficients.data(), problem.coefficients.size()), raft::device_span<const i_t>(problem.offsets.data(), problem.offsets.size()), raft::device_span<f_t>(problem.constraint_lower_bounds.data(), problem.constraint_lower_bounds.size()), raft::device_span<f_t>(problem.constraint_upper_bounds.data(), problem.constraint_upper_bounds.size())); RAFT_CHECK_CUDA(handle_ptr->get_stream()); problem.compute_transpose_of_problem(); handle_ptr->sync_stream(); }This keeps the new functionality robust for empty-constraint LPs without impacting the common case.
Also applies to: 315-371
cpp/src/mip/diversity/diversity_manager.cu (1)
374-400:lp_optimal_solution_copyis never filled, leading to undefined LP solution usageIn this block,
lp_optimal_solution_copyis allocated andsolve_lp_with_methodis called, butlp_optimal_solution_copyis never populated fromlp_result. Later, underrelaxed_solution_mutex, you copylp_optimal_solution_copyintolp_optimal_solutionwhensimplex_solution_existsis false. That meanslp_optimal_solution(used to seed heuristics and later clamped within bounds) will contain uninitialized/garbage data instead of the PDLP solution.You should copy the primal solution from
lp_resultintolp_optimal_solution_copyimmediately after solving, before the mutex-protected block and before clamping. For example:- rmm::device_uvector<f_t> lp_optimal_solution_copy(lp_optimal_solution.size(), - problem_ptr->handle_ptr->get_stream()); - timer_t lp_timer(lp_time_limit); - auto lp_result = solve_lp_with_method<i_t, f_t>(*problem_ptr, pdlp_settings, lp_timer); - - CUOPT_LOG_INFO("LP enum: %s", lp_result.get_termination_status_string().c_str()); + rmm::device_uvector<f_t> lp_optimal_solution_copy(lp_optimal_solution.size(), + problem_ptr->handle_ptr->get_stream()); + timer_t lp_timer(lp_time_limit); + auto lp_result = solve_lp_with_method<i_t, f_t>(*problem_ptr, pdlp_settings, lp_timer); + + auto& d_primal_solution = lp_result.get_primal_solution(); + cuopt_assert(d_primal_solution.size() == lp_optimal_solution_copy.size(), + "Unexpected size mismatch in LP optimal solution"); + raft::copy(lp_optimal_solution_copy.data(), + d_primal_solution.data(), + lp_optimal_solution_copy.size(), + problem_ptr->handle_ptr->get_stream()); + + CUOPT_LOG_INFO("LP enum: %s", lp_result.get_termination_status_string().c_str());With this change, the later copy into
lp_optimal_solutionand the subsequentclamp_within_var_boundscall will operate on an actual LP solution rather than uninitialized memory.
🧹 Nitpick comments (12)
cpp/src/dual_simplex/phase2.cpp (2)
2204-2204: Reconsider unconditional macro definition.The
PRINT_VSTATUS_CHANGESmacro is unconditionally defined, which means the gated code at lines 2294-2300 is always included. This defeats the purpose of using a compile-time switch.Consider one of these alternatives:
- If this should always be enabled, remove the
#ifdefguards and the#define- If this should be optional, remove the
#defineand control it via build flags- Use a
constexpr boolflag (likeprint_normsat line 2268) for consistency with other debug features in this fileExample for option 3:
-#define PRINT_VSTATUS_CHANGES +constexpr bool print_vstatus_changes = false; // Set to true for debuggingThen change lines 2294-2300:
-#ifdef PRINT_VSTATUS_CHANGES +if constexpr (print_vstatus_changes) {
2268-2280: Consider defaulting verbose logging to disabled.The
print_normsflag is hardcoded totrue, meaning the norm computations and logging at lines 2270-2280 always execute. Whileconstexprensures no runtime overhead from the condition itself, thevector_norm_infcalls still iterate through entire vectors, which can impact performance in a production solver.Suggestion: Default to
falsefor production builds, especially since the PR description mentions this is for "concurrent root solve" which may be performance-critical:- constexpr bool print_norms = true; + constexpr bool print_norms = false; // Set to true for debuggingThis maintains the logging capability for development while avoiding unnecessary overhead in production.
cpp/tests/mip/miplib_test.cu (1)
63-70: Test coverage reduced to a single MIPLIB instanceCommenting out
neos5andswath1narrows this test to50v-10only. If the skipped instances are flaky or too slow, consider:
- Adding a brief comment/TODO explaining why they’re disabled, or
- Guarding them behind a config/label (e.g., long-running or nightly suite) instead of permanently commenting them out.
This keeps the reason discoverable and makes it easier to re-enable later.
cpp/src/dual_simplex/presolve.hpp (1)
150-157: crush_dual_solution return type change is consistent; consider using the residualExposing
crush_dual_solutionasf_tto return the dual residual norm matches the updated implementation and is a useful diagnostic. Some existing callers (e.g.,convert_user_lp_with_guess) currently ignore the return value; if the residual is intended to drive logic (e.g., root-relaxation quality checks), it may be worth plumbing it through at those call sites instead of discarding it.If you want to double‑check where this new return is actually consumed, you can grep for
crush_dual_solution(and confirm all intended call sites are updated.cpp/src/dual_simplex/branch_and_bound.hpp (1)
18-19: Root-relaxation and main-thread APIs look good; double-check concurrency semanticsThe new APIs:
set_root_relaxation_solution(...)correctly copies primal/dual/reduced-costs intoroot_relax_soln_and usesroot_relaxation_solution_set_.store(..., std::memory_order_release), which is appropriate for cross-thread publication if readers use an acquire load.set_main_thread/is_main_threadprovide a simple way to specialize behavior for the main B&B thread.Two concurrency nits to confirm:
- Ensure any reader of
root_relaxation_solution_set_usesmemory_order_acquire(or the default) so that updates toroot_relax_soln_androot_objective_are fully visible.- If a single
branch_and_bound_tinstance is touched by multiple threads, consider makingmain_thread_atomic as well (or guarantee that it’s only written/read from the owning thread before parallel work begins) to avoid a potential data race.If
branch_and_bound_tinstances are strictly thread-local, the current design is fine.Also applies to: 122-139, 143-145, 201-207
cpp/src/dual_simplex/presolve.cpp (1)
1308-1385: Range-row handling and dual residual return in crush_dual_solution look correctThe updated
crush_dual_solution:
- Marks range rows via
user_problem.range_rowsand uses that to decide the sign for the new slack reduced costs:
- For non-range equality rows,
z_j = -y_ias before.- For range rows,
A(i,j) = -1inconvert_range_rows, soA_j^T y = -y_iand the dual equationA^T y + z = cwithc_j = 0impliesz_j = y_i. The newif (is_range_row[i]) { z[j] = y[i]; } else { z[j] = -y[i]; }matches that derivation.- Computes
dual_residual = A^T y + z - cand returns its infinity norm asf_t, which is a useful quality measure.- Keeps the prior assertions and debug printing pattern, but relaxes the hard
assert(dual_res_inf < 1e-4)in favor of returning the residual.Two minor follow-ups you might consider:
- At call sites that care about root-relaxation quality (e.g., any new PDLP→B&B wiring), make explicit use of the returned residual to gate callbacks or logging.
- In
convert_user_lp_with_guess, you currently ignore the return value; that’s fine if you don’t need it, but you may want to at least log/debug it when investigating crossover/root issues.Overall, the range-row sign fix and residual return are numerically sound.
Also applies to: 1604-1610
cpp/src/dual_simplex/crossover.cpp (1)
281-327: Recomputing duals from the final basis after primal push is a solid fixThe new
compute_dual_solution_from_basis:
- Uses
basis_update_mpf_t::b_transpose_solvewith sparse vectors to solveBᵀ y = c_Band then reconstructszso thatAᵀ y + z = c, settingz_B = 0andz_N = c_N − Nᵀ y. This matches the standard dual reconstruction from a basis.- Initializes
ywithresize(m, 0.0)and then scatters the sparse solution, so all entries are well-defined.Calling it after a successful
primal_push:compute_dual_solution_from_basis(lp, ft, basic_list, nonbasic_list, solution.y, solution.z);ensures that
solution.yandsolution.zare consistent with the final basis used to recomputesolution.x. This makes the subsequent feasibility checks (primal_infeasibility,dual_infeasibility,dual_residual, etc.) meaningful and avoids evaluating primal/dual residuals against a stale dual.Given it’s only invoked after the potentially disruptive primal push, the extra solve cost is well justified.
Also applies to: 1201-1208
cpp/src/mip/diversity/diversity_manager.cuh (1)
73-76: Non-owning B&B pointer looks fine; ensure lifetime/thread-safety are documented.This raw
branch_and_bound_t<i_t, f_t>*is a non-owning hook; from this header alone the change is OK, but callers must guarantee the pointed-to B&B outlivesdiversity_manager_tand is only accessed from threads where the B&B instance is valid. Consider adding brief documentation at the declaration or in the .cu constructor explaining ownership and threading expectations.cpp/src/mip/problem/problem.cuh (1)
209-213: Callback addition is fine; clarify non-copying semantics and argument contract.The new
set_root_relaxation_solution_callbackmirrors the existingbranch_and_bound_callbackpattern and is initialized/reset to empty in the constructors, soproblem_tcopies intentionally drop callbacks. If that is the design (no callback propagation across copies), it looks consistent; just make sure this is documented somewhere together with the exact meaning/order of the six callback arguments so future call sites don’t get them wrong.cpp/src/mip/diversity/diversity_manager.cu (2)
37-41: Unusedbranch_and_bound_ptrmember indiversity_manager_t
branch_and_bound_ptris initialized tonullptrbut not used or wired to the actualbranch_and_boundinstance (which is stored oncontextelsewhere). If the intent is to access B&B from the diversity manager, consider initializing this member fromcontextor remove it to avoid confusion.
416-456: Root-relaxation callback wiring from PDLP into B&B is sound; consider cost of host allocationsThe new block that copies PDLP’s primal, dual, and reduced costs from device to host, syncs the stream, computes both solver and user objectives, and invokes
set_root_relaxation_solution_callbackgives B&B a rich root-relaxation view before exploring the tree. This matches the callback signature wired insolver.cuand is guarded by a null check, so it’s safe when no B&B instance is present.If you expect this path to be hit frequently in future (e.g., multiple root LPs), consider reusing host buffers instead of allocating
std::vectoreach time to avoid repeated heap allocations, but for a single root call this is acceptable.cpp/src/mip/solver.cu (1)
212-224: Root-relaxation callback binding is correct; consider resetting it after B&B finishesBinding
context.problem_ptr->set_root_relaxation_solution_callbacktobranch_and_bound_t::set_root_relaxation_solutionmatches the six-argument callback used indiversity_manager_t::run_solver, allowing the PDLP root relaxation to feed directly into B&B.However, this callback remains on
problem_ptrafter B&B completes andbranch_and_boundis destroyed. If amip_solver_tinstance or itsproblem_ptris reused, this would leave a dangling bound function. Consider clearing the callback (e.g., setting it tonullptr) once B&B is done (afterbranch_and_bound_status_future.get()), or in the solver’s teardown, to avoid potential use-after-free in future reuse scenarios.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
cpp/src/dual_simplex/branch_and_bound.cpp(2 hunks)cpp/src/dual_simplex/branch_and_bound.hpp(4 hunks)cpp/src/dual_simplex/crossover.cpp(1 hunks)cpp/src/dual_simplex/phase2.cpp(6 hunks)cpp/src/dual_simplex/presolve.cpp(8 hunks)cpp/src/dual_simplex/presolve.hpp(1 hunks)cpp/src/dual_simplex/right_looking_lu.cpp(1 hunks)cpp/src/dual_simplex/simplex_solver_settings.hpp(1 hunks)cpp/src/linear_programming/solve.cu(5 hunks)cpp/src/linear_programming/solve.cuh(0 hunks)cpp/src/mip/diversity/diversity_manager.cu(4 hunks)cpp/src/mip/diversity/diversity_manager.cuh(1 hunks)cpp/src/mip/diversity/lns/rins.cu(1 hunks)cpp/src/mip/diversity/lns/rins.cuh(1 hunks)cpp/src/mip/problem/problem.cu(2 hunks)cpp/src/mip/problem/problem.cuh(1 hunks)cpp/src/mip/problem/problem_helpers.cuh(2 hunks)cpp/src/mip/solver.cu(2 hunks)cpp/tests/mip/miplib_test.cu(1 hunks)
💤 Files with no reviewable changes (1)
- cpp/src/linear_programming/solve.cuh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/problem/problem.cuh
🧬 Code graph analysis (7)
cpp/src/dual_simplex/phase2.cpp (2)
cpp/src/dual_simplex/sparse_matrix.cpp (2)
vector_norm_inf(748-757)vector_norm_inf(748-748)cpp/src/dual_simplex/vector_math.hpp (2)
vector_norm_inf(17-26)vector_norm_inf(17-17)
cpp/src/mip/diversity/diversity_manager.cu (2)
cpp/src/dual_simplex/presolve.cpp (2)
convert_greater_to_less(532-570)convert_greater_to_less(532-536)cpp/src/linear_programming/solve.cu (2)
solve_lp_with_method(758-773)solve_lp_with_method(758-762)
cpp/src/dual_simplex/branch_and_bound.hpp (1)
cpp/src/mip/solver.cu (6)
solution(64-68)solution(64-64)solution(70-75)solution(70-72)solution(77-80)solution(77-77)
cpp/src/dual_simplex/crossover.cpp (1)
cpp/src/dual_simplex/phase2.cpp (3)
compute_dual_solution_from_basis(530-572)compute_dual_solution_from_basis(530-535)basic_list(2185-2185)
cpp/src/dual_simplex/branch_and_bound.cpp (4)
cpp/src/dual_simplex/solve.hpp (1)
solve_linear_program_advanced(43-48)cpp/src/dual_simplex/solve.cpp (3)
solve_linear_program_advanced(103-125)solve_linear_program_advanced(103-108)solve_linear_program_advanced(645-651)cpp/src/dual_simplex/presolve.cpp (6)
crush_primal_solution(1227-1262)crush_primal_solution(1227-1231)crush_primal_solution(1598-1602)crush_dual_solution(1309-1385)crush_dual_solution(1309-1315)crush_dual_solution(1604-1610)cpp/src/dual_simplex/crossover.cpp (3)
crossover(1041-1392)crossover(1041-1046)crossover(1396-1402)
cpp/src/mip/solver.cu (2)
cpp/src/linear_programming/solve.cu (2)
solve_lp_with_method(758-773)solve_lp_with_method(758-762)cpp/src/dual_simplex/branch_and_bound.cpp (2)
set_new_solution(252-313)set_new_solution(252-252)
cpp/src/linear_programming/solve.cu (2)
cpp/tests/linear_programming/pdlp_test.cu (3)
problem(252-252)problem(532-532)problem(618-618)cpp/tests/mip/elim_var_remap_test.cu (1)
problem(78-78)
🔇 Additional comments (13)
cpp/src/dual_simplex/simplex_solver_settings.hpp (1)
18-18: The review comment is incorrect—#include <vector>is pre-existing, not a new addition.The verification confirms that both the callback declarations using
std::vector(lines 139–142) and the#include <vector>header were already present in the codebase before this PR. The file has not been meaningfully modified. The review comment incorrectly identifies a pre-existing include as a new addition to the file.Likely an incorrect or invalid review comment.
cpp/src/dual_simplex/phase2.cpp (2)
1913-1913: LGTM: Helpful diagnostic logging added.The tolerance logging at function entry aids in debugging and understanding solver behavior.
1996-1996: LGTM: Appropriate log level for diagnostic message.Downgrading from
printftosettings.log.debugis correct for this type of diagnostic message, maintaining consistency with other debug logging in this function.cpp/src/mip/diversity/lns/rins.cuh (1)
20-26: Header include spacing is fineOnly a blank line was added between project and utility includes; no functional impact and consistent with typical style.
cpp/src/dual_simplex/right_looking_lu.cpp (1)
54-61: Corrected degree bounds match matrix dimensions and container sizesSwitching to
Cdegree[k] <= mandRdegree[k] <= nmatches the actual interpretation (column nnz ≤ rows, row nnz ≤ cols) and aligns withcol_countandrow_countallocations in both LU routines. This also protects against out-of-range indexing if the invariants were ever violated.cpp/src/mip/diversity/lns/rins.cu (1)
235-248: Disabling branch-and-bound logging inside RINS looks appropriateSetting
branch_and_bound_settings.log.log = false;for the RINS sub-MIP will significantly reduce log noise while still tagging any remaining messages with[RINS]vialog_prefix. This is a reasonable default for an internal heuristic; you can always flip it back on when debugging.cpp/src/mip/problem/problem.cu (1)
140-145: Constructor/callback wiring and solve dispatch look consistent.Initializing both
branch_and_bound_callbackandset_root_relaxation_solution_callbacktonullptrin the constructors and leaving them empty in the shallow-copy ctor keeps callbacks from leaking acrossproblem_tcopies, which matches the existing pattern. The change to callsolve_lp_with_method(problem, settings, lp_timer, is_batch_mode)is also consistent with the updatedsolve_lp_with_methodsignature and keeps all LP solving based on the internaldetail::problem_t. No issues from this file’s changes alone.Also applies to: 148-156, 861-862
cpp/src/dual_simplex/branch_and_bound.cpp (1)
1118-1136: Verify whetherroot_relax_soln_should be updated withcrossover_solutionafter successful crossover.The original review's observation is verified as correct: after the crossover call at lines 1139–1144, the output
crossover_solutionis never copied back intoroot_relax_soln_. Onlyroot_vstatus_is updated. The crushed PDLP solution remains inroot_relax_soln_, which is then used to computeroot_objective_at line 1175 and passed to downstream algorithms.The web search confirms that crossover is intended to convert a non-basic PDLP solution into a high-quality basic (vertex) solution for simplex warm-start. However, whether this codebase should propagate that basic solution back into
root_relax_soln_depends on the intended behavior: if crossover is meant only to refine basis information (root_vstatus_) while keeping the non-basic solution for dual simplex, the current code is correct; if crossover should supply an improved root state for the branch-and-bound tree, the code should propagate the solution.Clarify the intended use of the crossover solution in your root relaxation workflow. If crossover can supply a better-quality basic solution that should feed strong branching or node relaxation, apply the suggested fix; otherwise, document that only basis status is retained.
cpp/src/linear_programming/solve.cu (1)
653-667: All signatures properly updated and external callers correctly use the refactored API.Verification confirms:
- Declaration (solve.cuh:24), definition (solve.cu:758), and template instantiation (solve.cu:1016) all use
detail::problem_t<i_t, f_t>&- Internal call (solve.cu:861) passes
problemreference correctly- External calls (mip/solver.cu:136, diversity_manager.cu:377) dereference
problem_t<i_t, f_t>*pointers correctly permip_solver_context_tmember type (solver_context.cuh:40)- No lingering overloads with
op_problemparameter existThe refactor to centralize on
detail::problem_tis complete and consistent across the codebase.cpp/src/mip/diversity/diversity_manager.cu (2)
8-18: Header additions are consistent with new functionalityThe added headers (
cuda_profiler_api.h,diversity_manager.cuh,linear_programming/solve.cuh,utilities/scope_guard.hpp) match the new usage in this file (PDLP solving and scope guard). No issues from a build or dependency standpoint.
342-373: Constraint normalization and PDLP tolerance setup look reasonableConverting
>=rows to<=viaconvert_greater_to_less(*problem_ptr)before calling the LP solver is consistent with standardizing constraint senses and should be safe as long as this is only done once per presolved problem lifetime, which appears to be the case here. The relative tolerance derivation fromabsolute_tolerance / relative_tolerancewith a guard for zero divisor is also sane for scaling PDLP tolerances.cpp/src/mip/solver.cu (2)
128-147: Concurrent LP-only path now correctly uses the presolved problemThe LP-only branch now calls
solve_lp_with_method(*context.problem_ptr, settings, lp_timer), which matches the solver interface and ensures the concurrent LP is run on the presolved problem representation. The subsequent use ofopt_sol.get_primal_solution()to populatesolis consistent with this change.
205-210: Marking B&B as main thread improves threading semanticsCalling
branch_and_bound->set_main_thread(true)immediately after construction, before launching the asyncsolve, is a clean way to tag the owning thread in the B&B implementation. This should help avoid ambiguities in any main-thread-specific logic.
| #include <omp.h> | ||
| #include <algorithm> | ||
| #include <dual_simplex/branch_and_bound.hpp> | ||
| #include <dual_simplex/crossover.hpp> | ||
| #include <dual_simplex/initial_basis.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix duplicated NUMERICAL_ISSUES condition and unreachable “Unbounded” branch.
In the main-thread root path, the crossover status handling has a clear copy‑paste bug:
if (crossover_status == crossover_status_t::NUMERICAL_ISSUES) {
settings_.log.printf("MIP Infeasible\n");
return mip_status_t::INFEASIBLE;
}
if (crossover_status == crossover_status_t::NUMERICAL_ISSUES) {
settings_.log.printf("MIP Unbounded\n");
...
return mip_status_t::UNBOUNDED;
}The second if is unreachable and the intended status for the “MIP Unbounded” path is almost certainly something else (e.g., a distinct crossover enum for unboundedness or a feasibility variant). As written, you will never hit the “MIP Unbounded” path, and all NUMERICAL_ISSUES are treated as infeasible.
Please correct the second condition to the appropriate status (and/or use else if) so that infeasibility vs. unboundedness is distinguished as intended, and add a brief comment documenting the mapping between crossover_status_t and MIP status here.
Also applies to: 1061-1095, 1111-1169
🧰 Tools
🪛 Clang (14.0.6)
[error] 8-8: 'omp.h' file not found
(clang-diagnostic-error)
| root_relax_soln_.resize(original_lp_.num_rows, original_lp_.num_cols); | ||
|
|
||
| settings_.log.printf("Solving LP root relaxation\n"); | ||
| settings_.log.printf("Waiting for PDLP root relaxation\n"); | ||
|
|
||
| simplex_solver_settings_t lp_settings = settings_; | ||
| lp_settings.inside_mip = 1; | ||
| lp_status_t root_status = solve_linear_program_advanced( | ||
| original_lp_, stats_.start_time, lp_settings, root_relax_soln_, root_vstatus_, edge_norms_); | ||
| stats_.total_lp_iters = root_relax_soln_.iterations; | ||
| stats_.total_lp_solve_time = toc(stats_.start_time); | ||
| if (root_status == lp_status_t::INFEASIBLE) { | ||
| settings_.log.printf("MIP Infeasible\n"); | ||
| // FIXME: rarely dual simplex detects infeasible whereas it is feasible. | ||
| // to add a small safety net, check if there is a primal solution already. | ||
| // Uncomment this if the issue with cost266-UUE is resolved | ||
| // if (settings.heuristic_preemption_callback != nullptr) { | ||
| // settings.heuristic_preemption_callback(); | ||
| // } | ||
| return mip_status_t::INFEASIBLE; | ||
| } | ||
| if (root_status == lp_status_t::UNBOUNDED) { | ||
| settings_.log.printf("MIP Unbounded\n"); | ||
| if (settings_.heuristic_preemption_callback != nullptr) { | ||
| settings_.heuristic_preemption_callback(); | ||
| if (!is_main_thread()) { | ||
| lp_status_t root_status = solve_linear_program_advanced( | ||
| original_lp_, stats_.start_time, lp_settings, root_relax_soln_, root_vstatus_, edge_norms_); | ||
| stats_.total_lp_iters = root_relax_soln_.iterations; | ||
| stats_.total_lp_solve_time = toc(stats_.start_time); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Avoid unbounded busy-spin on root_relaxation_solution_set_ and clarify the callback contract.
The main-thread path now does:
settings_.log.printf("Waiting for PDLP root relaxation\n");
// ...
while (!root_relaxation_solution_set_.load(std::memory_order_acquire)) {
continue;
}This pure spin loop will:
- Burn a CPU core while PDLP is running, and
- Deadlock indefinitely if the root-relaxation callback is never invoked (e.g., error path, early exit, wiring issue), with no time-limit or cancellation check.
For robustness, consider at least:
- Adding a time-limit/interrupt check in the loop, and
- Using
std::this_thread::sleep_foror a condition variable/atomic wait to avoid a hot spin.
Also, make the contract explicit (in the header or nearby comment) that all PDLP paths must eventually call the setter that flips root_relaxation_solution_set_ (with memory_order_release) even on failure/time-limit, so B&B cannot hang forever waiting for a solution that never arrives.
To validate the wiring, please grep for all writes to root_relaxation_solution_set_ and ensure they are on the success and failure/time-limit paths of the PDLP root solve.
Also applies to: 1111-1136
🏁 Script executed:
#!/bin/bash
rg -n "root_relaxation_solution_set_" cppLength of output: 347
I need to see more context to verify the reviewer's concerns about the deadlock risk. Let me examine the spin loop and callback invocation sites.
---
🏁 Script executed:
```shell
#!/bin/bash
# Show the spin loop context (lines 1110-1120)
sed -n '1110,1120p' cpp/src/dual_simplex/branch_and_bound.cpp
Length of output: 520
🏁 Script executed:
#!/bin/bash
# Show the callback setter context (lines 135-145)
sed -n '135,145p' cpp/src/dual_simplex/branch_and_bound.hppLength of output: 449
🏁 Script executed:
#!/bin/bash
# Search for PDLP solver invocation to understand when callback is called
rg -n "pdlp|PDLP" cpp/src/dual_simplex/branch_and_bound.cpp -A 2 -B 2 | head -100Length of output: 214
🏁 Script executed:
#!/bin/bash
# Look for error handling or time-limit paths that might involve root_relaxation
rg -n "time.limit|error|timeout|fail" cpp/src/dual_simplex/branch_and_bound.cpp -C 2 | head -80Length of output: 3311
🏁 Script executed:
#!/bin/bash
# Check if there's a destructor or cleanup code
rg -n "~|destructor" cpp/src/dual_simplex/branch_and_bound.hppLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check for any thread join or wait logic in main thread
rg -n "join|detach|thread" cpp/src/dual_simplex/branch_and_bound.cpp -C 2 | head -100Length of output: 4639
🏁 Script executed:
#!/bin/bash
# Look at the section mentioned in "Also applies to: 1111-1136"
sed -n '1111,1136p' cpp/src/dual_simplex/branch_and_bound.cppLength of output: 1206
🏁 Script executed:
#!/bin/bash
# Search for where the callback/lambda capturing root_relaxation_solution_set_ is registered
rg -n "root_relaxation_solution_set_|set_root_relax" cpp/src/dual_simplex/branch_and_bound.cpp -B 5 -A 5 | head -150Length of output: 592
I need to find where the callback method is defined and all sites that invoke it.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find the method containing the setter at line 137
sed -n '120,150p' cpp/src/dual_simplex/branch_and_bound.hpp
Length of output: 1431
🏁 Script executed:
#!/bin/bash
# Search for all invocations of the root relaxation setter callback
rg -n "set_root_relax|set_root_relax_soln" cpp/src/dual_simplex/ -B 3 -A 1Length of output: 491
🏁 Script executed:
#!/bin/bash
# Search for PDLP configuration, callback setup, or lambda capture
rg -n "pdlp_solver|pdlp_settings|AddCallback|callback|set_root" cpp/src/dual_simplex/branch_and_bound.cpp -B 2 -A 2 | head -150Length of output: 3680
🏁 Script executed:
#!/bin/bash
# Check if there's error handling in the non-main thread path (lines 1067-1110)
sed -n '1067,1110p' cpp/src/dual_simplex/branch_and_bound.cppLength of output: 1969
🏁 Script executed:
#!/bin/bash
# Search for where PDLP solver is invoked and where callbacks might be registered
rg -n "solve_linear_program_advanced|pdlp" cpp/src/dual_simplex/ --type=hpp --type=cpp -B 5 -A 5 | head -200Length of output: 27
I need to find where set_root_relaxation_solution() is actually invoked. The earlier search didn't find call sites, which is concerning.
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for call sites of set_root_relaxation_solution
rg -n "set_root_relaxation_solution" cpp/src/dual_simplex/ -A 1 -B 1
Length of output: 312
🏁 Script executed:
#!/bin/bash
# Correct syntax and search for PDLP solver setup
rg -n "solve_linear_program_advanced" cpp/src/dual_simplex/Length of output: 727
🏁 Script executed:
#!/bin/bash
# Look for callback mechanism or PDLP configuration
rg -n "callback|Callback" cpp/src/dual_simplex/branch_and_bound.hpp | head -30Length of output: 46
🏁 Script executed:
#!/bin/bash
# Check main thread detection and thread spawning
sed -n '1030,1080p' cpp/src/dual_simplex/branch_and_bound.cppLength of output: 1993
🏁 Script executed:
#!/bin/bash
# Search for where PDLP might be set up or where branch_and_bound is used as callback
rg -n "branch_and_bound" cpp/src/dual_simplex/ --type cpp -lLength of output: 114
🏁 Script executed:
#!/bin/bash
# Look for files that might setup PDLP with callback
find cpp/src -name "*.cpp" -o -name "*.h" | xargs grep -l "set_root_relaxation_solution" 2>/dev/nullLength of output: 0
The setter is never called within the B&B code itself—it must be invoked from external PDLP worker. Let me trace the callback mechanism.
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine solve_linear_program_advanced definition
sed -n '103,180p' cpp/src/dual_simplex/solve.cpp
Length of output: 3592
🏁 Script executed:
#!/bin/bash
# Search for branch_and_bound usage in solve.cpp
rg -n "branch_and_bound|set_root_relaxation" cpp/src/dual_simplex/solve.cpp -B 3 -A 3Length of output: 949
🏁 Script executed:
#!/bin/bash
# Look for PDLP or callback context setup
rg -n "pdlp|PDLP|context|Context|callback" cpp/src/dual_simplex/solve.cpp | head -50Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if branch_and_bound is passed to solve functions or if there's threading
rg -n "thread|Thread|PDLP|pdlp|worker" cpp/src/dual_simplex/branch_and_bound.cpp | head -40Length of output: 1930
🏁 Script executed:
#!/bin/bash
# Look for any MIP root solve entry point that might interface with external PDLP
rg -n "root.*relax|mip_status_t.*solve|solve.*mip" cpp/src/dual_simplex/branch_and_bound.cpp | head -30Length of output: 2162
Based on my verification, the reviewer's concern is confirmed as a legitimate major issue. Here's what I found:
Key findings:
set_root_relaxation_solution()(the setter) is never called from within B&B code—it must be invoked by external PDLP worker thread- The main thread path (line 1112) enters a pure spin loop with no timeout, no sleep, and no condition variable
- If the external PDLP never calls the setter (due to error, timeout, or wiring failure), the main thread deadlocks indefinitely
- The non-main thread path (line 1068) directly solves the LP but doesn't set the flag on error paths before returning
- Comment at line 1111 references "diversity manager" suggesting external integration, confirming the setter depends on external code
Avoid unbounded busy-spin on root_relaxation_solution_set_ and guarantee callback contract.
The main-thread path (line 1112) enters a pure spin loop with no safeguards:
while (!root_relaxation_solution_set_.load(std::memory_order_acquire)) {
continue;
}This will deadlock indefinitely if the external PDLP worker never calls set_root_relaxation_solution() (e.g., error path, timeout, or missing callback wiring), since there is no time-limit check or cancellation mechanism.
Required fixes:
- Add a time-limit check in the spin loop (or use
std::condition_variableorstd::atomic::waitfor better efficiency) - Document the callback contract: ensure all PDLP exit paths (success, error, time-limit) invoke the setter with
memory_order_release - Add a safety net (e.g., break after timeout) to prevent indefinite hangs
Also verify that all external PDLP error handlers guarantee invoking the setter before returning control to B&B.
🤖 Prompt for AI Agents
In cpp/src/dual_simplex/branch_and_bound.cpp around lines 1061-1072 and the spin
at ~1112, the main-thread busy-wait on root_relaxation_solution_set_ can
deadlock if the external PDLP never calls set_root_relaxation_solution();
replace the unbounded spin with a safe wait (prefer std::condition_variable or
std::atomic::wait with a timeout) and add a timeout/failure path that breaks the
wait, logs the timeout, and returns an error instead of spinning forever; also
ensure every PDLP exit path (success and all error/timeout handlers) calls
set_root_relaxation_solution() with memory_order_release (or signals the
condition_variable) so the contract is always fulfilled, and add unit/inline
comments documenting the requirement.
This PR implements concurrent root solve for MIP
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Refactor