Add Probing Implications and Improve Variable Order #580
base: release/25.12
Conversation
Walkthrough

Reorders probing/presolve sequencing and raises the default probing time; refactors the probing cache to use per-thread modification/substitution queues and dual-probe bound storage; adds variable-substitution, bound-update, and CSR-sorting helpers to problem_t; adds a thread-safe work-unit queue and small presolve signature/guard changes.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–75 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- cpp/src/mip/diversity/diversity_config.hpp (1 hunks)
- cpp/src/mip/diversity/diversity_manager.cu (1 hunks)
- cpp/src/mip/local_search/rounding/constraint_prop.cu (1 hunks)
- cpp/src/mip/presolve/multi_probe.cu (1 hunks)
- cpp/src/mip/presolve/probing_cache.cu (5 hunks)
- cpp/src/mip/presolve/trivial_presolve.cu (1 hunks)
- cpp/src/mip/presolve/trivial_presolve.cuh (3 hunks)
- cpp/src/mip/presolve/trivial_presolve_helpers.cuh (1 hunks)
- cpp/src/mip/problem/problem.cu (4 hunks)
- cpp/src/mip/problem/problem.cuh (1 hunks)
- cpp/src/mip/problem/problem_helpers.cuh (3 hunks)
- cpp/src/mip/utilities/work_unit_ordered_queue.cuh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
- cpp/src/mip/problem/problem.cu
- cpp/src/mip/problem/problem_helpers.cuh
🧬 Code graph analysis (5)
cpp/src/mip/local_search/rounding/constraint_prop.cu (1)
cpp/src/utilities/copy_helpers.hpp (4)
get_lower(107-110), get_lower(107-107), get_upper(113-116), get_upper(113-113)
cpp/src/mip/presolve/multi_probe.cu (2)
cpp/src/linear_programming/pdlp.cu (2)
compute_stats(843-867), compute_stats(843-846)
cpp/src/mip/presolve/bounds_presolve.cu (2)
calculate_activity(103-113), calculate_activity(103-103)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
device_copy(247-253), device_copy(247-248), device_copy(264-270), device_copy(264-266), device_copy(281-287), device_copy(281-282), device_copy(296-313), device_copy(296-296), make_span(332-337), make_span(332-334), make_span(340-345), make_span(340-342), make_span(348-351), make_span(348-348), make_span(354-357), make_span(354-354)
cpp/src/mip/diversity/diversity_manager.cu (2)
cpp/src/mip/presolve/probing_cache.cu (2)
compute_probing_cache(708-803), compute_probing_cache(708-710)
cpp/src/mip/presolve/lb_probing_cache.cu (2)
compute_probing_cache(320-406), compute_probing_cache(320-322)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
make_span(332-337), make_span(332-334), make_span(340-345), make_span(340-342), make_span(348-351), make_span(348-348), make_span(354-357), make_span(354-354), host_copy(128-135), host_copy(128-128), host_copy(146-161), host_copy(146-146), host_copy(172-175), host_copy(172-172), host_copy(185-191), host_copy(185-185), host_copy(202-208), host_copy(202-202), host_copy(219-222), host_copy(219-219), host_copy(233-236), host_copy(233-233)
cpp/src/mip/problem/problem.cu (2)
view(917-954), view(917-917)
cpp/src/mip/presolve/trivial_presolve.cu (2)
trivial_presolve(24-24), trivial_presolve(28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: checks / check-style
```cpp
  cusparseCreate(&handle);
  cusparseSetStream(handle, stream);

  i_t nnz = values.size();
  i_t m   = rows;
  i_t n   = cols;

  cusparseMatDescr_t matA;
  cusparseCreateMatDescr(&matA);
  cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO);
  cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL);

  size_t pBufferSizeInBytes = 0;
  check_cusparse_status(cusparseXcsrsort_bufferSizeExt(
    handle, m, n, nnz, offsets.data(), indices.data(), &pBufferSizeInBytes));
  rmm::device_uvector<uint8_t> pBuffer(pBufferSizeInBytes, stream);
  cuopt_assert(((intptr_t)pBuffer.data() % 128) == 0,
               "CUSPARSE buffer size is not aligned to 128 bytes");
  rmm::device_uvector<i_t> P(nnz, stream);
  thrust::sequence(handle_ptr->get_thrust_policy(), P.begin(), P.end());

  check_cusparse_status(cusparseXcsrsort(
    handle, m, n, nnz, matA, offsets.data(), indices.data(), P.data(), pBuffer.data()));

  // apply the permutation to the values
  rmm::device_uvector<f_t> values_sorted(nnz, stream);
  thrust::gather(
    handle_ptr->get_thrust_policy(), P.begin(), P.end(), values.begin(), values_sorted.begin());
  thrust::copy(
    handle_ptr->get_thrust_policy(), values_sorted.begin(), values_sorted.end(), values.begin());

  cusparseDestroyMatDescr(matA);
  cusparseDestroy(handle);

  check_csr_representation(values, offsets, indices, handle_ptr, cols, rows);
```
Always check cuSPARSE return codes
cusparseCreate, cusparseSetStream, cusparseCreateMatDescr, and the various cusparseSet* / cusparseDestroy* calls all return a status. Right now we ignore those, so if any allocation fails (OOM, bad handle reuse, etc.) we keep running with an invalid handle/descriptor and the later sort call will UB or misbehave before the guarded cusparseXcsrsort. Please route every cuSPARSE invocation through check_cusparse_status so we fail fast and cleanly.
```diff
-  cusparseCreate(&handle);
-  cusparseSetStream(handle, stream);
+  check_cusparse_status(cusparseCreate(&handle));
+  check_cusparse_status(cusparseSetStream(handle, stream));
 ...
-  cusparseCreateMatDescr(&matA);
-  cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO);
-  cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL);
+  check_cusparse_status(cusparseCreateMatDescr(&matA));
+  check_cusparse_status(cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO));
+  check_cusparse_status(cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL));
 ...
-  cusparseDestroyMatDescr(matA);
-  cusparseDestroy(handle);
+  check_cusparse_status(cusparseDestroyMatDescr(matA));
+  check_cusparse_status(cusparseDestroy(handle));
```

🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem_helpers.cuh around lines 342-376 the code calls
several cuSPARSE APIs (cusparseCreate, cusparseSetStream,
cusparseCreateMatDescr, cusparseSetMatIndexBase, cusparseSetMatType,
cusparseDestroyMatDescr, cusparseDestroy and any other cusparseSet*/Destroy*
calls) without checking their return status; update the code to route every
cuSPARSE invocation through check_cusparse_status (e.g.
check_cusparse_status(cusparseCreate(&handle)); etc.) so failures are detected
immediately and handled cleanly before proceeding to calls like
cusparseXcsrsort. Ensure all existing cusparse calls in this block and their
corresponding destroy calls are wrapped with check_cusparse_status.
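As a minimal sketch of the fail-fast pattern this review asks for, here is a host-only analogue; `status_t`, `check_status`, and `fake_create` are hypothetical stand-ins for `cusparseStatus_t` and the codebase's `check_cusparse_status`, not the real API:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical status enum standing in for cusparseStatus_t.
enum status_t { STATUS_SUCCESS = 0, STATUS_ALLOC_FAILED = 1 };

// Fail fast on any non-success status so later calls never run with an
// invalid handle or descriptor.
inline void check_status(status_t s, const char* file, int line)
{
  if (s != STATUS_SUCCESS) {
    std::fprintf(stderr, "status %d at %s:%d\n", static_cast<int>(s), file, line);
    std::abort();
  }
}
#define CHECK_STATUS(call) check_status((call), __FILE__, __LINE__)

// Stand-in for a cuSPARSE setup call that can fail.
inline status_t fake_create(bool fail)
{
  return fail ? STATUS_ALLOC_FAILED : STATUS_SUCCESS;
}
```

The point of the wrapper is that every library call, including the `Set*` and `Destroy*` ones, goes through it, so an invalid handle is caught at the call that produced it rather than at the later sort.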
```cpp
  cuopt_assert(var_indices.size() == coefficient_values.size(), "size mismatch");
  auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());
  auto d_var_to_substitude_indices =
    device_copy(var_to_substitude_indices, handle_ptr->get_stream());
  auto d_offset_values      = device_copy(offset_values, handle_ptr->get_stream());
  auto d_coefficient_values = device_copy(coefficient_values, handle_ptr->get_stream());
  fixing_helpers.reduction_in_rhs.resize(n_constraints, handle_ptr->get_stream());
  fixing_helpers.variable_fix_mask.resize(n_variables, handle_ptr->get_stream());
  thrust::fill(handle_ptr->get_thrust_policy(),
               fixing_helpers.reduction_in_rhs.begin(),
               fixing_helpers.reduction_in_rhs.end(),
               0);
  thrust::fill(handle_ptr->get_thrust_policy(),
               fixing_helpers.variable_fix_mask.begin(),
               fixing_helpers.variable_fix_mask.end(),
               -1);

  rmm::device_scalar<f_t> objective_offset(0., handle_ptr->get_stream());
  thrust::for_each(handle_ptr->get_thrust_policy(),
                   thrust::make_counting_iterator(0),
                   thrust::make_counting_iterator(0) + d_var_indices.size(),
                   [variable_fix_mask = make_span(fixing_helpers.variable_fix_mask),
                    var_indices = make_span(d_var_indices),
                    n_variables = n_variables,
                    substitute_coefficient = make_span(d_coefficient_values),
                    substitute_offset = make_span(d_offset_values),
                    var_to_substitude_indices = make_span(d_var_to_substitude_indices),
                    objective_coefficients = make_span(objective_coefficients),
                    objective_offset = objective_offset.data()] __device__(i_t idx) {
                     i_t var_idx               = var_indices[idx];
                     i_t substituting_var_idx  = var_to_substitude_indices[idx];
                     variable_fix_mask[var_idx] = idx;
                     f_t objective_offset_difference =
                       objective_coefficients[var_idx] * substitute_offset[idx];
                     atomicAdd(objective_offset, objective_offset_difference);
                     objective_coefficients[var_idx] +=
                       objective_coefficients[var_idx] * substitute_coefficient[idx];
                   });
  f_t objective_offset_value = objective_offset.value(handle_ptr->get_stream());
  presolve_data.objective_offset += objective_offset_value;
  const i_t num_segments = n_constraints;
  f_t initial_value{0.};

  auto input_transform_it = thrust::make_transform_iterator(
    thrust::make_counting_iterator(0),
    [coefficients = make_span(coefficients),
     variables = make_span(variables),
     variable_fix_mask = make_span(fixing_helpers.variable_fix_mask),
     substitute_coefficient = make_span(d_coefficient_values),
```
Objective mass is lost during substitution
When we substitute var_indices[idx] = offset + coeff * var_to_substitude, the objective term c_i * var_indices[idx] must become c_i * offset (constant) plus c_i * coeff * var_to_substitude. The current code adds c_i * offset but never transfers c_i * coeff to objective_coefficients[var_to_substitude_indices[idx]]; instead, it briefly adds it to the eliminated variable and later zeroes that coefficient when removing duplicates. As a result, the contribution c_i * coeff * var_to_substitude disappears entirely, yielding an incorrect objective after substitution. Please accumulate the scaled coefficient onto the surviving variable (and then zero the eliminated one) so that the objective remains equivalent. (en.wikipedia.org)
🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1112-1160, the objective
contribution from substituting var_indices[idx] = offset + coeff *
var_to_substitude is lost because the code adds the scaled coefficient back onto
the eliminated variable instead of the surviving variable. Replace the current
in-kernel update that does objective_coefficients[var_idx] +=
objective_coefficients[var_idx] * substitute_coefficient[idx]; with an atomic
accumulation of the scaled coefficient onto the surviving variable
(atomicAdd(&objective_coefficients[substituting_var_idx], scaled_value)) and
then ensure the eliminated variable's objective coefficient is set to zero
(atomicExch or assignment after kernel) so duplicates removal doesn't drop the
contribution; use the appropriate device atomic for f_t and keep
objective_offset atomicAdd as-is.
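To make the objective-equivalence argument concrete, here is a small host-side sketch (plain C++, no CUDA); `objective_t`, `substitute_x`, and the eval helpers are illustrative names, not code from the PR. Substituting x = offset + coeff * y means c_x * x = c_x * offset (a constant) plus (c_x * coeff) * y, so the scaled coefficient must land on the surviving variable and the eliminated one must end up at zero:

```cpp
#include <cassert>
#include <cmath>

// Objective c_x * x + c_y * y + constant, with x about to be eliminated.
struct objective_t {
  double c_x, c_y, constant;
};

// Apply the substitution x = offset + coeff * y to the objective.
inline objective_t substitute_x(objective_t obj, double offset, double coeff)
{
  obj.constant += obj.c_x * offset;  // constant term from the offset
  obj.c_y      += obj.c_x * coeff;   // transfer scaled mass to the survivor
  obj.c_x       = 0.0;               // eliminated variable carries nothing
  return obj;
}

// Objective value before substitution, with x derived from y.
inline double eval_before(objective_t o, double offset, double coeff, double y)
{
  double x = offset + coeff * y;
  return o.c_x * x + o.c_y * y + o.constant;
}

// Objective value after substitution (c_x is zero by construction).
inline double eval_after(objective_t o, double y)
{
  return o.c_y * y + o.constant;
}
```

Both evaluations agree for every y, which is exactly the invariant the review says the original kernel violated by leaving the scaled coefficient on the eliminated variable.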
I handled your review, can you check again?
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Extract the substitute_variables method to check the current implementation
# Focus on the objective coefficient update logic

# Show the substitute_variables method implementation
sed -n '/^void problem_t<i_t, f_t>::substitute_variables/,/^}/p' cpp/src/mip/problem/problem.cu | head -100
```
Length of output: 5888
🏁 Script executed:
```shell
#!/bin/bash
# Check if the eliminated variable's objective coefficient is zeroed out
# Look for the part where duplicates are removed and objective coefficients are zeroed
sed -n '/substitute_variables/,/^}/p' cpp/src/mip/problem/problem.cu | grep -A 5 -B 5 "objective_coefficients.*= 0"
```
Length of output: 633
@akifcorduk, great progress! The fix is mostly correct now. Line 1147 properly transfers the scaled coefficient to the surviving variable (substituting_var_idx). However, there are a few remaining issues:
Issues to address:

- Remove the debug printf on line 1109:

```cpp
printf("dummy_substituted_variable %d\n", dummy_substituted_variable);
```

- Zero out the eliminated variable's objective coefficient: after the first thrust::for_each kernel (around line 1150), you should explicitly zero out objective_coefficients[var_idx] for all eliminated variables. Currently, the coefficient is only zeroed during duplicate removal, but if the eliminated variable doesn't create duplicates, its coefficient remains non-zero. Add this after line 1150:

```cpp
// Zero out objective coefficients for eliminated variables
thrust::for_each(handle_ptr->get_thrust_policy(),
                 d_var_indices.begin(),
                 d_var_indices.end(),
                 [objective_coefficients = make_span(objective_coefficients)] __device__(i_t var_idx) {
                   objective_coefficients[var_idx] = 0.;
                 });
```

- Thread safety consideration: line 1147 uses += which is not thread-safe if multiple variables substitute to the same target. Consider using atomicAdd instead:

```cpp
atomicAdd(&objective_coefficients[substituting_var_idx],
          objective_coefficients[var_idx] * substitute_coefficient[idx]);
```

The core logic is now correct—you're successfully preserving the objective's mathematical equivalence after substitution! 🎉
🧬 Code Graph Analysis Results
[cpp/src/mip/problem/problem.cu] lines 100-150
```cpp
template <typename i_t, typename f_t>
i_t problem_t<i_t, f_t>::get_n_binary_variables()
{
  n_binary_vars = thrust::count_if(handle_ptr->get_thrust_policy(),
                                   is_binary_variable.begin(),
                                   is_binary_variable.end(),
                                   cuda::std::identity{});
  return n_binary_vars;
}
```
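The counting pattern above has a direct host-side analogue; `std::count_if` stands in for `thrust::count_if`, and `count_binary_variables` is an illustrative name rather than code from the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side analogue of get_n_binary_variables(): count the set flags in a
// per-variable binary mask. The identity-style predicate mirrors
// cuda::std::identity{} in the original device code.
inline int count_binary_variables(const std::vector<bool>& is_binary_variable)
{
  return static_cast<int>(std::count_if(is_binary_variable.begin(),
                                        is_binary_variable.end(),
                                        [](bool b) { return b; }));
}
```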
Actionable comments posted: 1
🧹 Nitpick comments (2)
cpp/src/mip/diversity/diversity_manager.cu (2)
18-18: Consider making fj_only_run a runtime configuration. The compile-time constant limits flexibility. If different problem instances might benefit from skipping the probing cache during presolve, consider making this a runtime configuration parameter in diversity_config.

188-191: Add early exit when insufficient time remains for probing. The concern is valid. Although compute_probing_cache internally guards with timer.check_time_limit() and exits immediately, it still incurs function-call and parallel-setup overhead. When time_for_probing_cache is zero or negative (which can occur if time_limit is small or nearly exhausted), the function would exit immediately on its first check without performing any useful work. Following the established codebase pattern (seen throughout diverse_solver.hpp), add a guard at the call site to skip the function entirely when the time budget is insufficient:

```cpp
if (time_for_probing_cache > 0.0) {
  timer_t probing_timer{time_for_probing_cache};
  compute_probing_cache(ls.constraint_prop.bounds_update, *problem_ptr, probing_timer);
}
```

This avoids unnecessary overhead and aligns with how time-limited operations are handled elsewhere in the codebase.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- cpp/src/mip/diversity/diversity_config.hpp (1 hunks)
- cpp/src/mip/diversity/diversity_manager.cu (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cpp/src/mip/diversity/diversity_config.hpp
🧰 Additional context used
🧬 Code graph analysis (1)
cpp/src/mip/diversity/diversity_manager.cu (2)
cpp/src/mip/presolve/probing_cache.cu (2)
compute_probing_cache(698-793), compute_probing_cache(698-700)
cpp/src/mip/presolve/lb_probing_cache.cu (2)
compute_probing_cache(310-396), compute_probing_cache(310-312)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: conda-cpp-build / 12.9.1, 3.10, amd64, rockylinux8
- GitHub Check: conda-cpp-build / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: conda-cpp-build / 12.9.1, 3.10, arm64, rockylinux8
- GitHub Check: conda-cpp-build / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (1)
cpp/src/mip/diversity/diversity_manager.cu (1)
191-193: The double invocation of trivial_presolve is intentional, not redundant. The two calls serve different purposes:

- Inside compute_probing_cache (probing_cache.cu:790): trivial_presolve(problem, true) remaps cache IDs
- After compute_probing_cache returns (diversity_manager.cu:193): trivial_presolve(*problem_ptr) performs cleanup without remapping (remap_cache_ids defaults to false)

The different parameter values indicate distinct responsibilities—the first call handles cache ID remapping as part of probing, while the second performs final presolve operations. No changes needed.

Note: The review references line 793, but the actual call is at line 790 in probing_cache.cu.
```cpp
    if (!problem_ptr->empty && !check_bounds_sanity(*problem_ptr)) { return false; }
  }
  if (!fj_only_run) {
    // before probing cache or LP, run FJ to generate initial primal feasible solution
```
Fix misleading comment.
The comment states "run FJ" but the code executes probing cache computation, not FJ (Feasibility Jump).
Apply this diff to correct the comment:
```diff
-  // before probing cache or LP, run FJ to generate initial primal feasible solution
+  // Run probing cache before trivial presolve to discover variable implications
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  // Run probing cache before trivial presolve to discover variable implications
```
🤖 Prompt for AI Agents
In cpp/src/mip/diversity/diversity_manager.cu around line 185, the comment incorrectly says "run FJ" while the code actually performs probing cache computation; update the comment text so it accurately states that probing cache computation runs at this point (before trivial presolve or LP), so the comment reflects the implemented action.
Actionable comments posted: 1
♻️ Duplicate comments (2)
cpp/src/mip/problem/problem.cu (2)
1120-1139: Fix lost objective mass and race during substitution updates. We're still dropping objective contribution when substituting: multiple substitutions targeting the same survivor race on objective_coefficients[substituting_var_idx], and the eliminated variable's coefficient stays non-zero, so its term survives in the objective. This is the same bug flagged earlier—the data race plus the lingering coefficient wipe out the intended c_i * coeff * x_sub. Please atomically accumulate onto the surviving variable and zero the eliminated one.

```diff
-  f_t objective_offset_difference =
-    objective_coefficients[var_idx] * substitute_offset[idx];
-  atomicAdd(objective_offset, objective_offset_difference);
-  objective_coefficients[substituting_var_idx] +=
-    objective_coefficients[var_idx] * substitute_coefficient[idx];
+  f_t coeff = objective_coefficients[var_idx];
+  f_t objective_offset_difference = coeff * substitute_offset[idx];
+  atomicAdd(objective_offset, objective_offset_difference);
+  atomicAdd(objective_coefficients.data() + substituting_var_idx,
+            coeff * substitute_coefficient[idx]);
+  objective_coefficients[var_idx] = 0.;
```

1226-1233: Do not zero the surviving variable's objective coefficient. This line zeros objective_coefficients[next_var_idx], i.e. the coefficient you just transferred mass onto. The surviving variable loses all objective weight, reintroducing the bug from the earlier review. Drop this assignment—the eliminated variable is already nulled out by the fix above.

```diff
-  variables[offset_begin + 1]    = dummy_substituted_variable;
-  coefficients[offset_begin + 1] = 0.;
-  objective_coefficients[next_var_idx] = 0.;
+  variables[offset_begin + 1]    = dummy_substituted_variable;
+  coefficients[offset_begin + 1] = 0.;
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- cpp/src/mip/presolve/probing_cache.cu (5 hunks)
- cpp/src/mip/problem/problem.cu (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/problem/problem.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
device_copy(237-243), device_copy(237-238), device_copy(254-260), device_copy(254-256), device_copy(271-277), device_copy(271-272), device_copy(286-303), device_copy(286-286), make_span(322-327), make_span(322-324), make_span(330-335), make_span(330-332), make_span(338-341), make_span(338-338), make_span(344-347), make_span(344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
make_span(322-327)make_span(322-324)make_span(330-335)make_span(330-332)make_span(338-341)make_span(338-338)make_span(344-347)make_span(344-344)host_copy(118-125)host_copy(118-118)host_copy(136-151)host_copy(136-136)host_copy(162-165)host_copy(162-162)host_copy(175-181)host_copy(175-175)host_copy(192-198)host_copy(192-192)host_copy(209-212)host_copy(209-209)host_copy(223-226)host_copy(223-223)cpp/src/mip/problem/problem.cu (2)
view(907-944)view(907-907)cpp/src/mip/presolve/trivial_presolve.cu (2)
trivial_presolve(14-14)trivial_presolve(18-18)
Actionable comments posted: 2
♻️ Duplicate comments (1)
cpp/src/mip/presolve/probing_cache.cu (1)
686-693: Critical: std::find on device memory causes undefined behavior.

count_per_variable is a device_uvector, so its iterators point to GPU memory. Calling std::find on device iterators from host code is undefined behavior and will either crash or return garbage. Copy count_per_variable to host first, then find the cutoff (note the comparison must also use the host copy's end iterator, not count_per_variable.end()):

```diff
   auto h_priority_indices = host_copy(priority_indices);
+  auto h_counts           = host_copy(count_per_variable);
   problem.handle_ptr->sync_stream();
-  // Find the index of the first 0 element in h_priority_indices
-  auto first_zero_it = std::find(count_per_variable.begin(), count_per_variable.end(), 0);
-  size_t first_zero_index = (first_zero_it != count_per_variable.end())
-                              ? std::distance(count_per_variable.begin(), first_zero_it)
-                              : count_per_variable.size();
+  // Find the index of the first 0 element in the host-side counts
+  auto first_zero_it = std::find(h_counts.begin(), h_counts.end(), 0);
+  size_t first_zero_index = (first_zero_it != h_counts.end())
+                              ? std::distance(h_counts.begin(), first_zero_it)
+                              : h_counts.size();
   h_priority_indices.erase(h_priority_indices.begin() + first_zero_index, h_priority_indices.end());
```

Note: This is the same issue flagged in the past review comment for lines 686-694, indicating it was not fully addressed.
Based on past review comment
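The host-copy-then-find pattern in the suggested fix can be illustrated with plain std::vector standing in for the host-copied buffers; `truncate_at_first_zero` is a hypothetical helper, not code from the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Truncate the priority list at the position of the first zero-count entry,
// operating entirely on host-side containers (the analogue of host_copy
// followed by std::find / std::distance in the suggested fix).
inline void truncate_at_first_zero(std::vector<int>& priority_indices,
                                   const std::vector<int>& counts)
{
  auto first_zero = std::find(counts.begin(), counts.end(), 0);
  std::size_t cut = (first_zero != counts.end())
                      ? static_cast<std::size_t>(std::distance(counts.begin(), first_zero))
                      : counts.size();
  priority_indices.erase(priority_indices.begin() + cut, priority_indices.end());
}
```

Running the same calls on device iterators would dereference GPU pointers on the host, which is exactly the undefined behavior the review flags.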
🧹 Nitpick comments (6)
cpp/src/mip/presolve/probing_cache.cu (6)
499-500: Expensive assertions in hot path. These bound-validation assertions execute once per variable per probe in a multi-threaded loop. In debug builds this adds significant overhead; in release builds they vanish, hiding potential bugs. Consider replacing with conditional error handling that logs and gracefully degrades:

```diff
-  cuopt_assert(h_var_bounds[i].x <= lower_bound, "lower bound violation");
-  cuopt_assert(h_var_bounds[i].y >= upper_bound, "upper bound violation");
+  if (h_var_bounds[i].x > lower_bound || h_var_bounds[i].y < upper_bound) {
+    CUOPT_LOG_WARNING("Bound violation for var %d: inferred [%f,%f] vs current [%f,%f]",
+                      i, lower_bound, upper_bound, h_var_bounds[i].x, h_var_bounds[i].y);
+    continue;  // Skip this update
+  }
```

568-597: First-substitution-wins policy may be suboptimal. When multiple substitutions target the same variable, the code arbitrarily keeps the first one (line 578). If later substitutions have better coefficients or fewer dependencies, this heuristic might miss opportunities. Consider documenting this policy explicitly or evaluating substitutions by a quality metric (e.g., smallest coefficient magnitude, fewest cascading dependencies).

720-722: Modification/substitution vectors accumulate across steps. The per-thread modification_vector_pool and substitution_vector_pool are never cleared after apply_modification_queue_to_problem (line 771) or between steps. While modifications are applied incrementally, the vectors continue to grow, holding stale entries. Clear the modification pool after applying, or document if accumulation across steps is intentional for replay/debugging:

```diff
   {
     // TODO when we have determinism, check current threads work/time counter and filter queue
     // items that are smaller or equal to that
     apply_modification_queue_to_problem(modification_vector_pool, problem);
+    // Clear applied modifications
+    for (auto& vec : modification_vector_pool) { vec.clear(); }
```

736-736: Hardcoded step size may not suit all problem scales. The step size is fixed at min(2048, priority_indices.size()). For very large problems (millions of variables) this creates many sequential steps, reducing parallelism. For tiny problems it's a no-op but adds loop overhead. Consider scaling step size by available parallelism or making it configurable:

```diff
-  const size_t step_size = min((size_t)2048, priority_indices.size());
+  const size_t step_size = std::min(priority_indices.size(),
+                                    std::max((size_t)256, priority_indices.size() / (max_threads * 4)));
```

728-728: Unused compute_stats flag. compute_stats is set to true for each worker, but the stats are never collected, aggregated, or logged. This suggests incomplete instrumentation. Do you want me to generate code to collect and report probing statistics (e.g., infeasibility counts, bound improvements per probe, wall-clock time per variable)?

704-705: Commented-out call to old priority computation. The old compute_prioritized_integer_indices path is commented out. If it's deprecated, remove it; if it's an alternative heuristic, document when to use which. Remove the commented line or convert it to a configuration option if both heuristics have merit.
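The adaptive step-size suggestion in the 736-736 nitpick can be sketched as a plain function; `adaptive_step_size` and `max_threads` are hypothetical names, and the 256 floor simply mirrors the value in the suggested diff:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Clamp the per-step work-unit count between a fixed floor (256) and an
// even split of the remaining variables across the worker threads, never
// exceeding the number of variables left to probe.
inline std::size_t adaptive_step_size(std::size_t n_priority_indices,
                                      std::size_t max_threads)
{
  return std::min(n_priority_indices,
                  std::max(static_cast<std::size_t>(256),
                           n_priority_indices / (max_threads * 4)));
}
```

Small inputs fall back to processing everything in one step, while large inputs get a step size proportional to the work per thread.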
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- cpp/src/mip/presolve/probing_cache.cu (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
make_span(322-327), make_span(322-324), make_span(330-335), make_span(330-332), make_span(338-341), make_span(338-338), make_span(344-347), make_span(344-344), host_copy(118-125), host_copy(118-118), host_copy(136-151), host_copy(136-136), host_copy(162-165), host_copy(162-162), host_copy(175-181), host_copy(175-175), host_copy(192-198), host_copy(192-192), host_copy(209-212), host_copy(209-209), host_copy(223-226), host_copy(223-223)
cpp/src/mip/problem/problem.cu (2)
view(907-944), view(907-907)
cpp/src/mip/presolve/trivial_presolve.cu (2)
trivial_presolve(14-14), trivial_presolve(18-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (5)
cpp/src/mip/presolve/probing_cache.cu (5)
9-9: LGTM!The new includes support the refactored probing workflow.
trivial_presolve.cuhis invoked at line 790, andwork_unit_ordered_queue.cuhaligns with the new per-thread work-unit processing described in the PR objectives.Also applies to: 13-13
534-566: LGTM! The modification queue correctly merges per-thread updates by taking the intersection of bounds (max of lower, min of upper), ensuring conservative tightening when multiple threads propose conflicting modifications for the same variable.
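The intersection merge described in the comment above can be sketched on the host; `bounds_t` and `merge_bounds` are illustrative names, not code from the PR:

```cpp
#include <algorithm>
#include <cassert>

// Variable bounds proposed by one thread.
struct bounds_t {
  double lb, ub;
};

// Merge two proposals for the same variable by intersecting the intervals:
// max of the lower bounds, min of the upper bounds. This keeps every
// tightening from every thread, which is the conservative choice.
inline bounds_t merge_bounds(bounds_t a, bounds_t b)
{
  return {std::max(a.lb, b.lb), std::min(a.ub, b.ub)};
}
```

A merged lower bound above the merged upper bound would signal that the combined tightenings are infeasible, which a caller could check separately.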
381-382: ****The codebase does not contain a
problem.is_binary_variablefield, and the current implementation is semantically correct. Theis_binarycheck at line 382 appropriately distinguishes between variables with exact [0,1] bounds versus any variable with bound width of 1. This distinction is intentional and properly explained by the comment—the variable substitution logic at line 513 requires specifically binary-constrained variables [0,1], not just any unit-interval variable. The comment already clarifies this nuance.Likely an incorrect or invalid review comment.
771-771: The review comment is incorrect and based on a false premise.

Modifications in `apply_modification_queue_to_problem` only update variable bounds via `problem.update_variable_bounds()`; they do not eliminate variables or change variable indices. The tuple structure `<time, var_idx, lb, ub>` confirms modifications only tighten bounds.

Since modifications cannot eliminate variables, substitutions applied later (line 785) safely reference the indices collected during the loop. The indices remain valid. The asymmetry between modifications (in-loop) and substitutions (after-loop) poses no risk of stale references. `trivial_presolve` with `remap_cache_ids=true` handles any subsequent variable elimination and remapping after substitutions are complete.

Likely an incorrect or invalid review comment.
646-670: The review comment is incorrect. The code properly handles CUB's runtime buffer sizing.

CUB's temp_storage_bytes requirement depends on the operator, data type, segment layout, number of segments, and device configuration, with no closed-form formula. The two reductions operate on fundamentally different segment structures:

- Reduction 1: `n_constraints` segments using `problem.offsets`
- Reduction 2: `n_variables` segments using `problem.reverse_offsets`

Since segment count and layout differ, the buffer requirements are independent and can differ. The resize at line 657 is necessary, not redundant. The code correctly implements CUB's pattern: call with `d_temp_storage == nullptr` to query the requirement, allocate, then call again with the allocated storage.
Likely an incorrect or invalid review comment.
cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
               multi_probe_presolve.infeas_constraints_count_1 == 0,
             "At least one of the probes must be feasible");
Unsafe assumption: both probes can be infeasible simultaneously.
The assertion requires at least one probe to be feasible, but if the variable's domain is genuinely infeasible (e.g., due to conflicting constraints discovered during probing), both probes can fail. This assertion will crash in debug builds or be silently ignored in release builds, masking a real infeasibility.
Replace the assertion with proper infeasibility propagation:
- cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
- multi_probe_presolve.infeas_constraints_count_1 == 0,
- "At least one of the probes must be feasible");
+ if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+ multi_probe_presolve.infeas_constraints_count_1 > 0) {
+ CUOPT_LOG_WARNING("Both probes infeasible for var %d - variable domain is infeasible", var_idx);
+ // Signal global infeasibility or handle as needed
+ return;
+ }🤖 Prompt for AI Agents
In cpp/src/mip/presolve/probing_cache.cu around lines 438 to 440, the code
currently asserts that at least one probe is feasible which is unsafe because
both probes can legitimately be infeasible; replace the assertion with proper
infeasibility propagation by detecting the case where both
infeas_constraints_count_0 and infeas_constraints_count_1 are non-zero, mark the
variable/branch as globally infeasible (propagate infeasibility to the
presolver/solver state or return an error code), and otherwise continue handling
the single-feasible-probe path; ensure you log or record the infeasibility cause
and avoid using an assertion so production builds don’t crash or silently ignore
the condition.
Actionable comments posted: 1
♻️ Duplicate comments (3)
cpp/src/mip/presolve/probing_cache.cu (2)
439-441: Replace assertion with proper infeasibility propagation.

The assertion requires at least one probe to be feasible, but both probes can legitimately fail when the variable's domain is genuinely infeasible due to conflicting constraints discovered during probing. This assertion will crash in debug builds or be silently ignored in release builds, masking real infeasibility.
Replace the assertion with proper infeasibility detection and propagation:
- cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
-              multi_probe_presolve.infeas_constraints_count_1 == 0,
-              "At least one of the probes must be feasible");
+ if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+     multi_probe_presolve.infeas_constraints_count_1 > 0) {
+   CUOPT_LOG_WARNING("Both probes infeasible for var %d - variable domain is infeasible", var_idx);
+   // Signal global infeasibility or propagate to caller
+   return;
+ }

Based on past review comments.
658-671: Remove redundant segmented reduce call.

Lines 647-656 and 662-671 call `cub::DeviceSegmentedReduce::Reduce` with identical parameters, both writing to `count_per_variable`. The second call simply overwrites the result of the first, making lines 647-656 redundant.

Remove the redundant first reduce call:

  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
                                     temp_storage_bytes,
                                     input_transform_it_2,
                                     count_per_variable.data(),
                                     problem.n_variables,
                                     problem.reverse_offsets.data(),
                                     problem.reverse_offsets.data() + 1,
                                     cuda::std::plus<>{},
                                     0,
                                     problem.handle_ptr->get_stream());
-
- temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
- d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
-
- // Run reduction
- cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
-                                    temp_storage_bytes,
-                                    input_transform_it_2,
-                                    count_per_variable.data(),
-                                    problem.n_variables,
-                                    problem.reverse_offsets.data(),
-                                    problem.reverse_offsets.data() + 1,
-                                    cuda::std::plus<>{},
-                                    0,
-                                    problem.handle_ptr->get_stream());

cpp/src/mip/problem/problem.cu (1)
1119-1138: Zero out eliminated variables' objective coefficients.

The kernel correctly transfers the scaled objective coefficient to the surviving variable using `atomicAdd` (lines 1136-1137). However, the eliminated variable's objective coefficient (`objective_coefficients[var_idx]`) is never explicitly zeroed unless it creates duplicates (see line 1232). If an eliminated variable doesn't produce duplicates in any constraint, its coefficient remains non-zero and will incorrectly affect the objective value.

Add this kernel after line 1138 to zero out all eliminated variables' objective coefficients:

+ thrust::for_each(handle_ptr->get_thrust_policy(),
+                  d_var_indices.begin(),
+                  d_var_indices.end(),
+                  [objective_coefficients = make_span(objective_coefficients)] __device__(i_t var_idx) {
+                    objective_coefficients[var_idx] = 0.;
+                  });

Based on past review comments.
🧹 Nitpick comments (1)
cpp/src/mip/problem/problem.cu (1)
1856-1857: Clarify assertion messages for bound tightening.

The assertion logic is correct (verifying that new bounds tighten existing bounds), but the error messages could be clearer about what constitutes a violation.

Consider updating the messages to be more descriptive:

- cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
- cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+ cuopt_assert(variable_bounds[var_idx].x <= lb_values[i],
+              "New lower bound must be >= existing lower bound (bound tightening)");
+ cuopt_assert(variable_bounds[var_idx].y >= ub_values[i],
+              "New upper bound must be <= existing upper bound (bound tightening)");
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cpp/src/mip/presolve/probing_cache.cu (5 hunks)
- cpp/src/mip/problem/problem.cu (4 hunks)
- python/libcuopt/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
- cpp/src/mip/problem/problem.cu
- cpp/src/mip/presolve/probing_cache.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
device_copy (237-243), device_copy (237-238), device_copy (254-260), device_copy (254-256), device_copy (271-277), device_copy (271-272), device_copy (286-303), device_copy (286-286), make_span (322-327), make_span (322-324), make_span (330-335), make_span (330-332), make_span (338-341), make_span (338-338), make_span (344-347), make_span (344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
make_span (322-327), make_span (322-324), make_span (330-335), make_span (330-332), make_span (338-341), make_span (338-338), make_span (344-347), make_span (344-344), host_copy (118-125), host_copy (118-118), host_copy (136-151), host_copy (136-136), host_copy (162-165), host_copy (162-162), host_copy (175-181), host_copy (175-175), host_copy (192-198), host_copy (192-192), host_copy (209-212), host_copy (209-209), host_copy (223-226), host_copy (223-223)

cpp/src/mip/problem/problem.cu (2)

view (907-944), view (907-907)

cpp/src/mip/presolve/trivial_presolve.cu (2)

trivial_presolve (14-14), trivial_presolve (18-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
🔇 Additional comments (6)
python/libcuopt/pyproject.toml (1)
57-57: Verify that the updated package size threshold was validated during testing.

The `max_allowed_size_compressed` threshold increased by 10MB (1.7%) to accommodate the new features (probing implications, variable substitution, work-queue utilities). This appears proportionate given the substantial backend additions described in the PR.

Ensure the package was built and its compressed size verified to confirm it fits within 605M during your local testing.
cpp/src/mip/problem/problem.cu (2)
1701-1701: Clarify the commented-out mutex lock.

The commented-out `std::lock_guard` raises questions about thread safety in `get_host_user_problem`. If concurrent access is possible and thread safety is required, the mutex should be uncommented. If the mutex is unnecessary, the comment should be removed to avoid confusion.

Could you clarify whether:
- Thread safety is needed for this method?
- If yes, why is the mutex commented out?
- If no, can the comment be removed?
1328-1332: LGTM - Clean CSR sorting wrapper.

The function provides a clean interface to sort constraint rows by variable indices using cuSPARSE.
cpp/src/mip/presolve/probing_cache.cu (3)
535-567: LGTM - Correct bound tightening logic.

The function correctly merges bound modifications by taking the maximum lower bound and minimum upper bound for each variable, which is commutative and ensures determinism regardless of processing order.
569-598: LGTM - Proper substitution deduplication.

The function correctly handles potential duplicate substitutions by taking only the first substitution for each variable (line 579), preventing conflicting substitutions. The determinism depends on the queue order being deterministic, which appears to be ensured by timestamp ordering.
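The first-wins deduplication described above can be sketched as follows. This is a host-side sketch; `Substitution`, `dedup_first_wins`, and the field names are illustrative assumptions, not the cuOpt data structures:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical mirror of the first-wins rule: when several threads propose a
// substitution for the same variable, only the earliest (queue-ordered) entry
// is kept, so a variable is never substituted twice with conflicting terms.
struct Substitution {
  int substituted_var;  // variable being eliminated
  int kept_var;         // variable it is expressed in terms of
  double coeff;
  double offset;
};

std::vector<Substitution> dedup_first_wins(const std::vector<Substitution>& queue)
{
  std::unordered_set<int> seen;
  std::vector<Substitution> result;
  for (const auto& s : queue) {
    if (seen.count(s.substituted_var)) continue;  // later duplicates are dropped
    seen.insert(s.substituted_var);
    result.push_back(s);
  }
  return result;
}
```

As long as the input queue order is deterministic (e.g., timestamp-ordered), the surviving substitution per variable is deterministic as well.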
689-697: LGTM - Correctly implemented priority computation and early exit.

The code correctly:

- Uses `thrust::lower_bound` with a `thrust::greater` comparator to find the first zero in the descending priority sequence (lines 689-697)
- Ensures a non-zero early-exit threshold using `std::max(2, ...)` to prevent premature termination on small problems (line 783)

Both improvements address issues from past reviews.
Also applies to: 782-784
Actionable comments posted: 5
♻️ Duplicate comments (2)
cpp/src/mip/presolve/probing_cache.cu (2)
439-441: Both probes can be infeasible; replace assert with handling

The assert can trip on genuinely infeasible domains. Handle and propagate infeasibility instead of asserting. This was flagged previously.

- cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
-              multi_probe_presolve.infeas_constraints_count_1 == 0,
-              "At least one of the probes must be feasible");
+ if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+     multi_probe_presolve.infeas_constraints_count_1 > 0) {
+   CUOPT_LOG_WARNING("Both probes infeasible for var %d; domain infeasible", var_idx);
+   // Option 1: enqueue global infeasibility signal and return
+   return;
+ }
538-571: Bound-merge can produce inverted intervals; guard before applying

Merging via max(lb), min(ub) may yield lb > ub. Validate and either drop the conflicting update or clamp with tolerance before calling `update_variable_bounds`.

  for (const auto& [var_idx, modifications] : var_bounds_modifications) {
-   var_indices.push_back(var_idx);
-   lb_values.push_back(modifications.first);
-   ub_values.push_back(modifications.second);
+   auto lb = modifications.first;
+   auto ub = modifications.second;
+   if (lb <= ub) {
+     var_indices.push_back(var_idx);
+     lb_values.push_back(lb);
+     ub_values.push_back(ub);
+   } else {
+     CUOPT_LOG_WARNING("Skipping conflicting bounds for var %d: [%f, %f]", var_idx, lb, ub);
+   }
  }
🧹 Nitpick comments (2)
cpp/src/mip/problem/problem.cu (1)
1088-1101: Simplify template and call-site for are_exclusive
`are_exclusive` doesn't use `f_t`. Drop the extra template parameter and update the call to avoid confusion.

-template <typename i_t, typename f_t>
-bool are_exclusive(const std::vector<i_t>& var_indices,
-                   const std::vector<i_t>& var_to_substitude_indices)
+template <typename i_t>
+bool are_exclusive(const std::vector<i_t>& var_indices,
+                   const std::vector<i_t>& var_to_substitude_indices)

And:

- cuopt_assert((are_exclusive<i_t, f_t>(var_indices, var_to_substitude_indices)),
+ cuopt_assert((are_exclusive<i_t>(var_indices, var_to_substitude_indices)),

cpp/src/mip/presolve/probing_cache.cu (1)
692-701: Trim priority list by zero counts using host data for simplicity

Current device-side lower_bound is more complex than needed and mixes device iterators with host erase. Copy counts to host and trim by the first zero position to avoid policy/iterator pitfalls.

- // Find the index of the first 0 element in count_per_variable
- auto first_zero_it = thrust::lower_bound(problem.handle_ptr->get_thrust_policy(),
-                                          count_per_variable.begin(),
-                                          count_per_variable.end(),
-                                          0,
-                                          thrust::greater<i_t>());
- size_t first_zero_index = (first_zero_it != count_per_variable.end())
-                             ? thrust::distance(count_per_variable.begin(), first_zero_it)
-                             : count_per_variable.size();
+ auto h_counts = host_copy(count_per_variable);
+ auto first_zero_it = std::find(h_counts.begin(), h_counts.end(), 0);
+ size_t first_zero_index = (first_zero_it != h_counts.end())
+                             ? std::distance(h_counts.begin(), first_zero_it)
+                             : h_counts.size();
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cpp/src/mip/presolve/probing_cache.cu(5 hunks)cpp/src/mip/problem/problem.cu(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/problem/problem.cucpp/src/mip/presolve/probing_cache.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
device_copy (237-243), device_copy (237-238), device_copy (254-260), device_copy (254-256), device_copy (271-277), device_copy (271-272), device_copy (286-303), device_copy (286-286), make_span (322-327), make_span (322-324), make_span (330-335), make_span (330-332), make_span (338-341), make_span (338-338), make_span (344-347), make_span (344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
make_span (322-327), make_span (322-324), make_span (330-335), make_span (330-332), make_span (338-341), make_span (338-338), make_span (344-347), make_span (344-344), host_copy (118-125), host_copy (118-118), host_copy (136-151), host_copy (136-136), host_copy (162-165), host_copy (162-162), host_copy (175-181), host_copy (175-175), host_copy (192-198), host_copy (192-192), host_copy (209-212), host_copy (209-209), host_copy (223-226), host_copy (223-223)

cpp/src/mip/problem/problem.cu (4)

integer_equal (749-752), integer_equal (749-749), view (907-944), view (907-907)

cpp/src/mip/presolve/trivial_presolve.cu (2)

trivial_presolve (14-14), trivial_presolve (18-18)
🔇 Additional comments (2)
cpp/src/mip/problem/problem.cu (1)
1135-1156: Objective mass transfer: LGTM

Atomic accumulation onto the surviving variable and the objective offset fix the earlier loss-of-mass bug under parallel substitution.

If double precision may run on pre-sm_60 GPUs, confirm `atomicAdd(double*, ...)` availability or use a fallback (e.g., RAFT/CUB atomics).

cpp/src/mip/presolve/probing_cache.cu (1)
792-798: Good integration: apply substitutions then remap via trivial presolve

Applying substitutions before `trivial_presolve(problem, remap_cache_ids)` aligns with the new workflow and should maximize reductions.
bool is_binary = ub == 1 && lb == 0;
for (i_t i = 0; i < 2; ++i) {
  auto& probe_val = i == 0 ? probe_vals.first : probe_vals.second;
  // if binary, probe both values
Binary check should use tolerance or range-1 test
bool is_binary = (ub == 1 && lb == 0); misses near-binary cases and range-1 binaries. Use the established tolerance logic.
- bool is_binary = ub == 1 && lb == 0;
+ bool is_binary = problem.integer_equal(ub - lb, 1.) &&
+ problem.integer_equal(lb, 0.);Or, if available, consult is_binary_variable[var_idx] from the problem view.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/src/mip/presolve/probing_cache.cu around lines 383 to 386, the current
binary detection uses strict equality bool is_binary = (ub == 1 && lb == 0);
which misses near-binary or range-1 cases; change it to use the established
tolerance/range-1 logic (e.g., fabs(ub - 1.0) <= tol && fabs(lb - 0.0) <= tol or
(ub - lb) <= 1 + tol) or, if available in this context, consult the problem view
flag is_binary_variable[var_idx]; update the condition to use that
tolerant/range-1 test or the is_binary_variable lookup so near-binary variables
are treated as binary.
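The tolerant check suggested in the review can be sketched as a small standalone helper. This is an illustrative assumption: `is_binary_with_tol` and the default `tol` are not part of the cuOpt API, which would use its own `integer_equal` tolerance machinery:

```cpp
#include <cassert>
#include <cmath>

// Sketch of a tolerance-based binary test: a variable is treated as binary
// when its bounds are within `tol` of 0 and 1, instead of requiring exact
// equality (which misses near-binary bounds produced by earlier tightening).
bool is_binary_with_tol(double lb, double ub, double tol = 1e-6)
{
  return std::fabs(lb) <= tol && std::fabs(ub - 1.0) <= tol;
}
```

With exact comparison, bounds like [1e-9, 1 - 1e-9] would not be probed on both values; the tolerant form treats them as binary.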
cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
                                   temp_storage_bytes,
                                   input_transform_it_2,
                                   count_per_variable.data(),
                                   problem.n_variables,
                                   problem.reverse_offsets.data(),
                                   problem.reverse_offsets.data() + 1,
                                   cuda::std::plus<>{},
                                   0,
                                   problem.handle_ptr->get_stream());
temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
// Run reduction
cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
                                   temp_storage_bytes,
                                   input_transform_it_2,
                                   count_per_variable.data(),
                                   problem.n_variables,
                                   problem.reverse_offsets.data(),
                                   problem.reverse_offsets.data() + 1,
                                   cuda::std::plus<>{},
                                   0,
                                   problem.handle_ptr->get_stream());
CUB segmented-reduce invoked twice without a fresh size query
The second reduction for count_per_variable is executed twice and reuses temp_storage_bytes computed for a different reduce. This is redundant and can be unsafe if temp storage is insufficient.
- cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
- temp_storage_bytes,
- input_transform_it_2,
- count_per_variable.data(),
- problem.n_variables,
- problem.reverse_offsets.data(),
- problem.reverse_offsets.data() + 1,
- cuda::std::plus<>{},
- 0,
- problem.handle_ptr->get_stream());
-
- temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
- d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
-
- // Run reduction
- cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
+ // Query temp storage size for this reduce
+ d_temp_storage = nullptr;
+ temp_storage_bytes = 0;
+ cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
+ temp_storage_bytes,
+ input_transform_it_2,
+ count_per_variable.data(),
+ problem.n_variables,
+ problem.reverse_offsets.data(),
+ problem.reverse_offsets.data() + 1,
+ cuda::std::plus<>{},
+ 0,
+ problem.handle_ptr->get_stream());
+ temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
+ d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
+ // Run reduction once
+ cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
temp_storage_bytes,
input_transform_it_2,
count_per_variable.data(),
problem.n_variables,
problem.reverse_offsets.data(),
problem.reverse_offsets.data() + 1,
cuda::std::plus<>{},
0,
problem.handle_ptr->get_stream());🤖 Prompt for AI Agents
cpp/src/mip/presolve/probing_cache.cu lines 650-674: the code calls
cub::DeviceSegmentedReduce::Reduce twice with the same temp_storage_bytes but
without performing a proper size-query for the actual reduction, which is
redundant and unsafe if the buffer sizes differ; change the first call to be a
size-query (pass d_temp_storage = nullptr) to populate temp_storage_bytes, then
resize temp_storage to that size and perform a single actual Reduce call (with
d_temp_storage set to raw pointer); remove the duplicated Reduce invocation and
ensure the parameters used in the size-query match the later actual call.
const i_t dummy_substituted_variable = var_indices[0];
cuopt_assert(var_indices.size() == var_to_substitude_indices.size(), "size mismatch");
cuopt_assert(var_indices.size() == offset_values.size(), "size mismatch");
cuopt_assert(var_indices.size() == coefficient_values.size(), "size mismatch");
auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());
Guard against empty substitution set
dummy_substituted_variable = var_indices[0] dereferences an empty vector if called with no substitutions. Add a fast return/guard.
void problem_t<i_t, f_t>::substitute_variables(const std::vector<i_t>& var_indices,
const std::vector<i_t>& var_to_substitude_indices,
const std::vector<f_t>& offset_values,
const std::vector<f_t>& coefficient_values)
{
+ if (var_indices.empty()) { return; }🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1115 to 1119, the code
unconditionally reads var_indices[0] which will crash when the substitution set
is empty; add an early guard that checks if var_indices.empty() and immediately
return (or no-op) before accessing elements or running subsequent assertions and
device_copy; keep the rest of the logic unchanged so the function safely handles
an empty substitution set without dereferencing the vector.
[variables              = make_span(variables),
 coefficients           = make_span(coefficients),
 offsets                = make_span(offsets),
 objective_coefficients = make_span(objective_coefficients),
 dummy_substituted_variable] __device__(i_t cstr_idx) {
  i_t offset_begin        = offsets[cstr_idx];
  i_t offset_end          = offsets[cstr_idx + 1];
  i_t duplicate_start_idx = -1;
  while (offset_begin < offset_end - 1) {
    i_t var_idx      = variables[offset_begin];
    i_t next_var_idx = variables[offset_begin + 1];
    if (var_idx == next_var_idx) {
      if (duplicate_start_idx == -1) { duplicate_start_idx = offset_begin; }
      coefficients[duplicate_start_idx] += coefficients[offset_begin + 1];
      variables[duplicate_start_idx] = variables[offset_begin + 1];
      // mark those for elimination
      variables[offset_begin + 1]          = dummy_substituted_variable;
      coefficients[offset_begin + 1]       = 0.;
      objective_coefficients[next_var_idx] = 0.;
    } else {
      duplicate_start_idx = -1;
    }
    offset_begin++;
  }
});
Do not zero objective coefficient of the surviving variable during duplicate merge
Zeroing objective_coefficients[next_var_idx] drops the objective term for that variable. Duplicate elimination is a row-local operation and must not alter the objective vector. Keep zeroing only the eliminated vars (already handled later for d_var_indices), and remove this line.
Apply this diff:
// mark those for elimination
variables[offset_begin + 1] = dummy_substituted_variable;
coefficients[offset_begin + 1] = 0.;
- objective_coefficients[next_var_idx] = 0.;🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1231 to 1255, the kernel
incorrectly zeros objective_coefficients[next_var_idx] when merging duplicate
entries which removes the objective term for the surviving variable; remove the
line that sets objective_coefficients[next_var_idx] = 0.0 so the objective
vector is not modified by this row-local duplicate-merge pass, and ensure only
eliminated entries (the ones marked with dummy_substituted_variable and zeroed
coefficients at offset_begin+1) are treated later when building d_var_indices.
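The row-local duplicate merge under discussion can be sketched on the host. This is a simplified analogue, not the device kernel: names are illustrative, it compares against the surviving entry of the current run (so runs longer than two are folded in one pass), and per the review above it deliberately does not touch the objective vector:

```cpp
#include <cassert>
#include <vector>

// Host-side sketch: in a row whose variable indices are sorted, duplicates of
// the same variable have their coefficients summed into one surviving entry;
// the absorbed entries are flagged with a sentinel index and a zero
// coefficient so a later compaction pass can remove them.
void merge_adjacent_duplicates(std::vector<int>& vars, std::vector<double>& coefs, int sentinel)
{
  std::size_t keep = 0;  // surviving entry of the current run
  for (std::size_t i = 1; i < vars.size(); ++i) {
    if (vars[i] == vars[keep]) {
      coefs[keep] += coefs[i];  // fold the duplicate into the survivor
      coefs[i] = 0.0;
      vars[i]  = sentinel;      // mark for elimination
    } else {
      keep = i;                 // start a new run
    }
  }
}
```

Such duplicates arise exactly when a substitution maps an eliminated variable onto one that already appears in the same row, which is why the merge pass follows substitution.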
void problem_t<i_t, f_t>::update_variable_bounds(const std::vector<i_t>& var_indices,
                                                 const std::vector<f_t>& lb_values,
                                                 const std::vector<f_t>& ub_values)
{
  if (var_indices.size() == 0) { return; }
  // std::lock_guard<std::mutex> lock(problem_mutex);
  cuopt_assert(var_indices.size() == lb_values.size(), "size of variable lower bound mismatch");
  cuopt_assert(var_indices.size() == ub_values.size(), "size of variable upper bound mismatch");
  auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());
  auto d_lb_values   = device_copy(lb_values, handle_ptr->get_stream());
  auto d_ub_values   = device_copy(ub_values, handle_ptr->get_stream());
  thrust::for_each(
    handle_ptr->get_thrust_policy(),
    thrust::make_counting_iterator(0),
    thrust::make_counting_iterator(0) + d_var_indices.size(),
    [lb_values       = make_span(d_lb_values),
     ub_values       = make_span(d_ub_values),
     variable_bounds = make_span(variable_bounds),
     var_indices     = make_span(d_var_indices)] __device__(auto i) {
      i_t var_idx = var_indices[i];
      cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
      cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
      variable_bounds[var_idx].x = lb_values[i];
      variable_bounds[var_idx].y = ub_values[i];
    });
  handle_ptr->sync_stream();
  RAFT_CHECK_CUDA(handle_ptr->get_stream());
}
Prevent inverted bounds when merging updates
When multiple updates are merged, lb can exceed ub. Enforce lb <= ub (within tolerance) before writing, otherwise throw or skip.
[lb_values = make_span(d_lb_values),
ub_values = make_span(d_ub_values),
variable_bounds = make_span(variable_bounds),
var_indices = make_span(d_var_indices)] __device__(auto i) {
i_t var_idx = var_indices[i];
- cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
- cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+ cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
+ cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+ // ensure non-inverted bounds
+ if (lb_values[i] > ub_values[i]) {
+ // Either swap, clamp, or signal; recommending signal to avoid silent invalid state.
+ asm("trap;"); // or set a device-side error flag
+ }
variable_bounds[var_idx].x = lb_values[i];
variable_bounds[var_idx].y = ub_values[i];
});Optionally, add a host-side sanity check after updates: check_problem_representation(false) to fail fast. As per coding guidelines.
🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1861-1888, the device lambda
writes lb/ub without ensuring lb <= ub when concurrent updates are merged; add a
guard inside the CUDA device kernel to enforce lb <= ub (within a small
tolerance) before assigning variable_bounds: compute local lb_i and ub_i from
the provided spans, test if lb_i <= ub_i + tol, and if not either skip the write
or trigger an error path (e.g., set a device-side flag to indicate invalid
update or call cuopt_assert to fail), so no inverted bounds are written; after
the thrust::for_each and stream sync, optionally run the host-side sanity check
check_problem_representation(false) to fail fast and surface the problem.
We were computing the probing cache but were not actually using the probing implications to change the problem; we were only using the cache during solve. In this PR, we add implications to the problem during presolve. When probing a range is infeasible, we fix the variable bounds to the other range. If both probing ranges imply global bounds on other variables, we update those global bounds. Also, if both probings of a binary variable imply singletons on some other variable, we can substitute that variable with the current variable (with some coefficients).
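The substitution step can be illustrated with a small host-side sketch: substituting y = coeff * x + offset into a row a_y * y + ... <= b turns the y-term into (a_y * coeff) * x and folds the constant a_y * offset into the right-hand side. All names here (`RowTerm`, `substitute_in_row`) are illustrative, not the cuOpt API:

```cpp
#include <cassert>
#include <vector>

// One constraint row stored as (variable index, coefficient) pairs.
struct RowTerm { int var; double a; };

// Replace every occurrence of variable y with coeff * x + offset.
void substitute_in_row(std::vector<RowTerm>& row, double& rhs,
                       int y, int x, double coeff, double offset)
{
  for (auto& t : row) {
    if (t.var == y) {
      rhs  -= t.a * offset;  // constant part moves to the right-hand side
      t.a  *= coeff;         // coefficient now multiplies x
      t.var = x;             // x may now appear twice; duplicates merged later
    }
  }
}
```

After substitution, a row may contain x twice; a subsequent duplicate-merge pass sums those coefficients into one entry.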
This PR also changes the variable ordering to prioritize the maximum possible number of implications: the number of other integer variables that the variable shares a constraint with.
With this PR, we can add more variable substitutions since the function is ready now.
Overall there is a consistent increase in the number of feasible instances: 225-226, compared to 223-224 (rarely 225) before. There is also a 0.5-1% improvement in the average objective gap.
Summary by CodeRabbit
New Features
Improvements
Chores