
Conversation

@akifcorduk
Contributor

@akifcorduk akifcorduk commented Nov 11, 2025

We were computing the probing cache, but we were not actually using the probing implications to change the problem; we were only consulting the cache during solve. In this PR, we apply the implications to the problem during presolve. When probing one range is infeasible, we fix the variable bounds to the other range. If both probing ranges imply global bounds on other variables, we update those global bounds. Also, if both probings of a binary variable imply singletons on some other variable, we can substitute that variable with the current variable (with appropriate coefficients).
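The three probing outcomes described above can be sketched on the host side as follows. This is an illustrative sketch only, with hypothetical names (`bounds_t`, `probe_result_t`, `derive_substitution`, etc.) that are not the cuOpt API:

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct bounds_t {
  double lb, ub;
};

// Result of probing a variable x on one of its two ranges: either the probe
// proves infeasibility, or (when feasible) it implies bounds on another
// variable y.
struct probe_result_t {
  bool infeasible;
  bounds_t implied_y;
};

// Case 1: one probe is infeasible, so x must lie in the other range.
std::optional<bounds_t> fix_to_other_range(const probe_result_t& down,
                                           const probe_result_t& up,
                                           bounds_t down_range,
                                           bounds_t up_range)
{
  if (down.infeasible && !up.infeasible) { return up_range; }
  if (up.infeasible && !down.infeasible) { return down_range; }
  return std::nullopt;  // both feasible (or both infeasible: problem infeasible)
}

// Case 2: both probes are feasible, so y must satisfy one of the two
// implications; its global bounds can be relaxed to their union.
bounds_t tighten_global_bounds(const probe_result_t& down, const probe_result_t& up)
{
  return {std::fmin(down.implied_y.lb, up.implied_y.lb),
          std::fmax(down.implied_y.ub, up.implied_y.ub)};
}

// Case 3: for a binary x, both probes fix y to a single value, so y is an
// affine function of x: y = y0 + (y1 - y0) * x, and y can be substituted out.
struct substitution_t {
  double offset, coefficient;  // y = offset + coefficient * x
};

substitution_t derive_substitution(double y_when_x0, double y_when_x1)
{
  return {y_when_x0, y_when_x1 - y_when_x0};
}
```

For example, if probing a binary x finds y = 3 when x = 0 and y = 5 when x = 1, the substitution is y = 3 + 2x.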

This PR also changes the variable ordering to prioritize the maximum possible number of implications: the number of other integer variables that the variable shares a constraint with.
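The ordering heuristic can be sketched as counting, for each integer variable, how many distinct other integer variables co-occur with it in at least one constraint. The CSR layout and names below are illustrative assumptions, not the cuOpt implementation (which runs on device):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// For each variable, count the distinct other integer variables that share
// at least one constraint with it; higher counts get probed first.
std::vector<int> implied_integer_counts(const std::vector<int>& row_offsets,
                                        const std::vector<int>& columns,
                                        const std::vector<bool>& is_integer,
                                        int n_vars)
{
  std::vector<std::set<int>> neighbors(n_vars);
  for (std::size_t r = 0; r + 1 < row_offsets.size(); ++r) {
    for (int i = row_offsets[r]; i < row_offsets[r + 1]; ++i) {
      for (int j = row_offsets[r]; j < row_offsets[r + 1]; ++j) {
        int a = columns[i], b = columns[j];
        if (a != b && is_integer[a] && is_integer[b]) { neighbors[a].insert(b); }
      }
    }
  }
  std::vector<int> counts(n_vars, 0);
  for (int v = 0; v < n_vars; ++v) { counts[v] = static_cast<int>(neighbors[v].size()); }
  return counts;
}
```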

This PR also makes it easier to add more variable substitutions later, since the substitution function is now in place.

Overall, there is a consistent increase in the number of feasible instances: 225-226, compared to 223-224 (rarely 225) before. There is also a 0.5-1% improvement in the average objective gap.

Summary by CodeRabbit

  • New Features

    • In-place variable substitution, atomic bound updates, and row-sorting for problem data.
    • Priority-ordered work queue for timestamped tasks.
    • Optional ID remapping during presolve.
  • Improvements

    • Probing defaults adjusted (larger share and cap) and probing moved to an optional presolve step; solver path reduces early probing.
    • Probing strengthened with per-thread queues, infeasibility reporting, and global-bound inference.
    • Recompute constraint activity before stats and ignore zero-coefficient edges.
  • Chores

    • Increased allowed compressed package size for distribution checks.

@akifcorduk akifcorduk added this to the 25.12 milestone Nov 11, 2025
@akifcorduk akifcorduk requested a review from a team as a code owner November 11, 2025 10:36
@akifcorduk akifcorduk added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Nov 11, 2025
@akifcorduk akifcorduk requested review from aliceb-nv and rg20 November 11, 2025 10:36
@coderabbitai

coderabbitai bot commented Nov 11, 2025

Walkthrough

Reorders probing/presolve sequencing and raises default probing time; refactors probing cache to use per-thread modification/substitution queues and dual-probe bound storage; adds variable substitution, bound-update, and CSR-sorting helpers to problem_t; adds a thread-safe work-unit queue and small presolve signature/guard changes.

Changes

Cohort / File(s) Summary
Configuration & Manager
cpp/src/mip/diversity/diversity_config.hpp, cpp/src/mip/diversity/diversity_manager.cu
Defaults changed: time_ratio_of_probing_cache 0.04 → 0.1, max_time_on_probing 15.0 → 60.0. Presolve path may run a time‑bounded probing cache before trivial_presolve; solver path no longer runs that early probing.
Probing cache & flow
cpp/src/mip/presolve/probing_cache.cu
Major refactor: per-thread modification & substitution queues added and threaded through probing; bound storage split for two probes and binary awareness added; infeasibility emits queued modifications; global bound inference and substitutions added; priority computation refactored into compute_priority_indices_by_implied_integers; new helpers apply_modification_queue_to_problem, apply_substitution_queue_to_problem, compute_probing_cache; compute_cache_for_var signature updated to accept queues, timer, device id; host application of queues and sync added.
Multi-probe / presolve stats
cpp/src/mip/presolve/multi_probe.cu
bound_update_loop: when compute_stats, initialize changed constraints on both upd sets, recompute activity via calculate_activity, then call constraint_stats.
Trivial presolve & helpers
cpp/src/mip/presolve/trivial_presolve.cu, cpp/src/mip/presolve/trivial_presolve.cuh, cpp/src/mip/presolve/trivial_presolve_helpers.cuh
trivial_presolve gains remap_cache_ids parameter (default false) and forwards it to update_from_csr; update_from_csr signature updated to accept remap_cache_ids and optionally remap original/reverse IDs; is_variable_free_t::operator() now ignores zero-coefficient edges; explicit template instantiations updated.
Problem class extensions
cpp/src/mip/problem/problem.cuh, cpp/src/mip/problem/problem.cu
Added problem_t methods: substitute_variables(...) (variable substitution with objective offset/coefficient updates and deduplication), update_variable_bounds(...) (validated atomic bound updates and device copies), and sort_rows_by_variables(const raft::handle_t*) (CSR row sorting helper); implementations added in .cu.
Problem utilities / cuSPARSE
cpp/src/mip/problem/problem_helpers.cuh
Added check_cusparse_status and csrsort_cusparse to sort CSR by column using cuSPARSE and thrust::gather; relaxed CSR index validation to allow -1; manages cuSPARSE handle/descriptor/buffer lifecycle and performs post-sort validation.
New utility queue
cpp/src/mip/utilities/work_unit_ordered_queue.cuh
New header-only cuopt::work_unit_ordered_queue_t<T>: mutex-guarded min-heap keyed by float timestamp with push/pop/top/clear/size APIs and entry_t typedef.
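A minimal sketch of such a queue, mirroring the shape described (mutex-guarded min-heap keyed by a float timestamp, with an entry_t typedef); this is an illustration under those assumptions, not the actual cuOpt header:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

template <typename T>
class work_unit_ordered_queue_t {
 public:
  using entry_t = std::pair<float, T>;  // (timestamp, payload)

  void push(float timestamp, T value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    heap_.emplace(timestamp, std::move(value));
  }

  // Pops the entry with the smallest timestamp, if any.
  std::optional<entry_t> pop()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (heap_.empty()) { return std::nullopt; }
    entry_t top = heap_.top();
    heap_.pop();
    return top;
  }

  std::size_t size()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return heap_.size();
  }

 private:
  // std::priority_queue is a max-heap by default; invert the comparison on
  // the timestamp to get min-heap ordering.
  struct later_t {
    bool operator()(const entry_t& a, const entry_t& b) const { return a.first > b.first; }
  };
  std::mutex mutex_;
  std::priority_queue<entry_t, std::vector<entry_t>, later_t> heap_;
};
```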
Minor packaging
python/libcuopt/pyproject.toml
Increased pydistcheck max_allowed_size_compressed from 595M to 605M.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

  • Areas needing extra attention:
    • cpp/src/mip/presolve/probing_cache.cu — concurrency, correctness of per-thread modification/substitution queues, signature propagation across callers, and correctness of applying queued modifications/substitutions.
    • cpp/src/mip/problem/problem_helpers.cuh — cuSPARSE handle/buffer lifecycle, permutation application via thrust::gather, and post-sort CSR validation.
    • cpp/src/mip/problem/problem.cu/.cuh — correctness and atomicity of substitute_variables and update_variable_bounds, objective offset updates, and device/host synchronization.
    • Callsite updates for trivial_presolve(..., remap_cache_ids) and updated compute_cache_for_var parameter list across translation units.

Poem

🐇 I nudged the queues where two probes meet,
Scoped time to probe before the tidy sweep,
I queued a bound, then rewired a name,
Sorted rows and kept the solver tame,
Hops and tweaks — the hare is pleased.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The PR title accurately captures the primary objectives: it adds probing implications to presolve and improves variable ordering for better implications.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc0caf5 and 8744dbc.

📒 Files selected for processing (12)
  • cpp/src/mip/diversity/diversity_config.hpp (1 hunks)
  • cpp/src/mip/diversity/diversity_manager.cu (1 hunks)
  • cpp/src/mip/local_search/rounding/constraint_prop.cu (1 hunks)
  • cpp/src/mip/presolve/multi_probe.cu (1 hunks)
  • cpp/src/mip/presolve/probing_cache.cu (5 hunks)
  • cpp/src/mip/presolve/trivial_presolve.cu (1 hunks)
  • cpp/src/mip/presolve/trivial_presolve.cuh (3 hunks)
  • cpp/src/mip/presolve/trivial_presolve_helpers.cuh (1 hunks)
  • cpp/src/mip/problem/problem.cu (4 hunks)
  • cpp/src/mip/problem/problem.cuh (1 hunks)
  • cpp/src/mip/problem/problem_helpers.cuh (3 hunks)
  • cpp/src/mip/utilities/work_unit_ordered_queue.cuh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/mip/problem/problem_helpers.cuh
🧬 Code graph analysis (5)
cpp/src/mip/local_search/rounding/constraint_prop.cu (1)
cpp/src/utilities/copy_helpers.hpp (4)
  • get_lower (107-110)
  • get_lower (107-107)
  • get_upper (113-116)
  • get_upper (113-113)
cpp/src/mip/presolve/multi_probe.cu (2)
cpp/src/linear_programming/pdlp.cu (2)
  • compute_stats (843-867)
  • compute_stats (843-846)
cpp/src/mip/presolve/bounds_presolve.cu (2)
  • calculate_activity (103-113)
  • calculate_activity (103-103)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
  • device_copy (247-253)
  • device_copy (247-248)
  • device_copy (264-270)
  • device_copy (264-266)
  • device_copy (281-287)
  • device_copy (281-282)
  • device_copy (296-313)
  • device_copy (296-296)
  • make_span (332-337)
  • make_span (332-334)
  • make_span (340-345)
  • make_span (340-342)
  • make_span (348-351)
  • make_span (348-348)
  • make_span (354-357)
  • make_span (354-354)
cpp/src/mip/diversity/diversity_manager.cu (2)
cpp/src/mip/presolve/probing_cache.cu (2)
  • compute_probing_cache (708-803)
  • compute_probing_cache (708-710)
cpp/src/mip/presolve/lb_probing_cache.cu (2)
  • compute_probing_cache (320-406)
  • compute_probing_cache (320-322)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
  • make_span (332-337)
  • make_span (332-334)
  • make_span (340-345)
  • make_span (340-342)
  • make_span (348-351)
  • make_span (348-348)
  • make_span (354-357)
  • make_span (354-354)
  • host_copy (128-135)
  • host_copy (128-128)
  • host_copy (146-161)
  • host_copy (146-146)
  • host_copy (172-175)
  • host_copy (172-172)
  • host_copy (185-191)
  • host_copy (185-185)
  • host_copy (202-208)
  • host_copy (202-202)
  • host_copy (219-222)
  • host_copy (219-219)
  • host_copy (233-236)
  • host_copy (233-233)
cpp/src/mip/problem/problem.cu (2)
  • view (917-954)
  • view (917-917)
cpp/src/mip/presolve/trivial_presolve.cu (2)
  • trivial_presolve (24-24)
  • trivial_presolve (28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: checks / check-style

Comment on lines +342 to +376
cusparseCreate(&handle);
cusparseSetStream(handle, stream);

i_t nnz = values.size();
i_t m   = rows;
i_t n   = cols;

cusparseMatDescr_t matA;
cusparseCreateMatDescr(&matA);
cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO);
cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL);

size_t pBufferSizeInBytes = 0;
check_cusparse_status(cusparseXcsrsort_bufferSizeExt(
  handle, m, n, nnz, offsets.data(), indices.data(), &pBufferSizeInBytes));
rmm::device_uvector<uint8_t> pBuffer(pBufferSizeInBytes, stream);
cuopt_assert(((intptr_t)pBuffer.data() % 128) == 0,
             "CUSPARSE buffer size is not aligned to 128 bytes");
rmm::device_uvector<i_t> P(nnz, stream);
thrust::sequence(handle_ptr->get_thrust_policy(), P.begin(), P.end());

check_cusparse_status(cusparseXcsrsort(
  handle, m, n, nnz, matA, offsets.data(), indices.data(), P.data(), pBuffer.data()));

// apply the permutation to the values
rmm::device_uvector<f_t> values_sorted(nnz, stream);
thrust::gather(
  handle_ptr->get_thrust_policy(), P.begin(), P.end(), values.begin(), values_sorted.begin());
thrust::copy(
  handle_ptr->get_thrust_policy(), values_sorted.begin(), values_sorted.end(), values.begin());

cusparseDestroyMatDescr(matA);
cusparseDestroy(handle);

check_csr_representation(values, offsets, indices, handle_ptr, cols, rows);

⚠️ Potential issue | 🟠 Major

Always check cuSPARSE return codes

cusparseCreate, cusparseSetStream, cusparseCreateMatDescr, and the various cusparseSet* / cusparseDestroy* calls all return a status. Right now we ignore those, so if any allocation fails (OOM, bad handle reuse, etc.) we keep running with an invalid handle/descriptor and the later sort call will UB or misbehave before the guarded cusparseXcsrsort. Please route every cuSPARSE invocation through check_cusparse_status so we fail fast and cleanly.

-  cusparseCreate(&handle);
-  cusparseSetStream(handle, stream);
+  check_cusparse_status(cusparseCreate(&handle));
+  check_cusparse_status(cusparseSetStream(handle, stream));
...
-  cusparseCreateMatDescr(&matA);
-  cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO);
-  cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL);
+  check_cusparse_status(cusparseCreateMatDescr(&matA));
+  check_cusparse_status(cusparseSetMatIndexBase(matA, CUSPARSE_INDEX_BASE_ZERO));
+  check_cusparse_status(cusparseSetMatType(matA, CUSPARSE_MATRIX_TYPE_GENERAL));
...
-  cusparseDestroyMatDescr(matA);
-  cusparseDestroy(handle);
+  check_cusparse_status(cusparseDestroyMatDescr(matA));
+  check_cusparse_status(cusparseDestroy(handle));
🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem_helpers.cuh around lines 342-376 the code calls
several cuSPARSE APIs (cusparseCreate, cusparseSetStream,
cusparseCreateMatDescr, cusparseSetMatIndexBase, cusparseSetMatType,
cusparseDestroyMatDescr, cusparseDestroy and any other cusparseSet*/Destroy*
calls) without checking their return status; update the code to route every
cuSPARSE invocation through check_cusparse_status (e.g.
check_cusparse_status(cusparseCreate(&handle)); etc.) so failures are detected
immediately and handled cleanly before proceeding to calls like
cusparseXcsrsort. Ensure all existing cusparse calls in this block and their
corresponding destroy calls are wrapped with check_cusparse_status.
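The fail-fast wrapper pattern the review asks for can be illustrated with a stand-in status enum; the real check_cusparse_status in problem_helpers.cuh wraps cusparseStatus_t instead, and the error-message formatting here is a hypothetical assumption:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for cusparseStatus_t in this illustration.
enum class status_t { success = 0, alloc_failed = 1, invalid_value = 2 };

// Throw immediately on any non-success status so an invalid handle or
// descriptor is never used by a later call.
void check_status(status_t status, const char* expr)
{
  if (status != status_t::success) {
    throw std::runtime_error(std::string("call failed: ") + expr);
  }
}

// Stringify the call site so the error message names the failing call.
#define CHECK_STATUS(call) check_status((call), #call)
```

Routing every call through such a wrapper (create, set-stream, descriptor setup, and destroy alike) turns silent handle corruption into an immediate, attributable failure.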

Comment on lines 1112 to 1160
cuopt_assert(var_indices.size() == coefficient_values.size(), "size mismatch");
auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());
auto d_var_to_substitude_indices =
  device_copy(var_to_substitude_indices, handle_ptr->get_stream());
auto d_offset_values      = device_copy(offset_values, handle_ptr->get_stream());
auto d_coefficient_values = device_copy(coefficient_values, handle_ptr->get_stream());
fixing_helpers.reduction_in_rhs.resize(n_constraints, handle_ptr->get_stream());
fixing_helpers.variable_fix_mask.resize(n_variables, handle_ptr->get_stream());
thrust::fill(handle_ptr->get_thrust_policy(),
             fixing_helpers.reduction_in_rhs.begin(),
             fixing_helpers.reduction_in_rhs.end(),
             0);
thrust::fill(handle_ptr->get_thrust_policy(),
             fixing_helpers.variable_fix_mask.begin(),
             fixing_helpers.variable_fix_mask.end(),
             -1);

rmm::device_scalar<f_t> objective_offset(0., handle_ptr->get_stream());
thrust::for_each(handle_ptr->get_thrust_policy(),
                 thrust::make_counting_iterator(0),
                 thrust::make_counting_iterator(0) + d_var_indices.size(),
                 [variable_fix_mask         = make_span(fixing_helpers.variable_fix_mask),
                  var_indices               = make_span(d_var_indices),
                  n_variables               = n_variables,
                  substitute_coefficient    = make_span(d_coefficient_values),
                  substitute_offset         = make_span(d_offset_values),
                  var_to_substitude_indices = make_span(d_var_to_substitude_indices),
                  objective_coefficients    = make_span(objective_coefficients),
                  objective_offset          = objective_offset.data()] __device__(i_t idx) {
                   i_t var_idx                = var_indices[idx];
                   i_t substituting_var_idx   = var_to_substitude_indices[idx];
                   variable_fix_mask[var_idx] = idx;
                   f_t objective_offset_difference =
                     objective_coefficients[var_idx] * substitute_offset[idx];
                   atomicAdd(objective_offset, objective_offset_difference);
                   objective_coefficients[var_idx] +=
                     objective_coefficients[var_idx] * substitute_coefficient[idx];
                 });
f_t objective_offset_value = objective_offset.value(handle_ptr->get_stream());
presolve_data.objective_offset += objective_offset_value;
const i_t num_segments = n_constraints;
f_t initial_value{0.};

auto input_transform_it = thrust::make_transform_iterator(
  thrust::make_counting_iterator(0),
  [coefficients           = make_span(coefficients),
   variables              = make_span(variables),
   variable_fix_mask      = make_span(fixing_helpers.variable_fix_mask),
   substitute_coefficient = make_span(d_coefficient_values),

@coderabbitai coderabbitai bot Nov 11, 2025


⚠️ Potential issue | 🔴 Critical

Objective mass is lost during substitution

When we substitute var_indices[idx] = offset + coeff * var_to_substitude, the objective term c_i * var_indices[idx] must become c_i * offset (constant) plus c_i * coeff * var_to_substitude. The current code adds c_i * offset but never transfers c_i * coeff to objective_coefficients[var_to_substitude_indices[idx]]; instead, it briefly adds it to the eliminated variable and later zeroes that coefficient when removing duplicates. As a result, the contribution c_i * coeff * var_to_substitude disappears entirely, yielding an incorrect objective after substitution. Please accumulate the scaled coefficient onto the surviving variable (and then zero the eliminated one) so that the objective remains equivalent. (en.wikipedia.org)
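The algebra in this comment can be checked numerically with made-up values: substituting x = offset + coeff * y into c_x * x + c_y * y must leave the objective value unchanged when c_x * coeff is folded into y's coefficient and c_x * offset into the constant term. The following is a host-side sanity check, not the device code under review:

```cpp
#include <cassert>
#include <cmath>

// Objective evaluated before substitution: x is still a free variable
// defined by x = offset + coeff * y.
double objective_before(double c_x, double c_y, double offset, double coeff, double y)
{
  double x = offset + coeff * y;
  return c_x * x + c_y * y;
}

// Objective evaluated after substitution: x is eliminated, its scaled
// coefficient is transferred to the surviving variable y, and the constant
// part goes into the objective offset.
double objective_after(double c_x, double c_y, double offset, double coeff, double y)
{
  double new_c_y  = c_y + c_x * coeff;  // transfer c_x * coeff to the survivor
  double constant = c_x * offset;       // accumulate into the objective offset
  return new_c_y * y + constant;        // eliminated variable contributes nothing
}
```

Dropping the c_x * coeff transfer (as the reviewed code did) makes these two values disagree for any nonzero coeff, which is exactly the lost objective mass.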

🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1112-1160, the objective
contribution from substituting var_indices[idx] = offset + coeff *
var_to_substitude is lost because the code adds the scaled coefficient back onto
the eliminated variable instead of the surviving variable. Replace the current
in-kernel update that does objective_coefficients[var_idx] +=
objective_coefficients[var_idx] * substitute_coefficient[idx]; with an atomic
accumulation of the scaled coefficient onto the surviving variable
(atomicAdd(&objective_coefficients[substituting_var_idx], scaled_value)) and
then ensure the eliminated variable's objective coefficient is set to zero
(atomicExch or assignment after kernel) so duplicates removal doesn't drop the
contribution; use the appropriate device atomic for f_t and keep
objective_offset atomicAdd as-is.

Contributor Author


I handled your review, can you check again?


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Extract the substitute_variables method to check the current implementation
# Focus on the objective coefficient update logic

# Show the substitute_variables method implementation
sed -n '/^void problem_t<i_t, f_t>::substitute_variables/,/^}/p' cpp/src/mip/problem/problem.cu | head -100

Length of output: 5888


🏁 Script executed:

#!/bin/bash
# Check if the eliminated variable's objective coefficient is zeroed out
# Look for the part where duplicates are removed and objective coefficients are zeroed

sed -n '/substitute_variables/,/^}/p' cpp/src/mip/problem/problem.cu | grep -A 5 -B 5 "objective_coefficients.*= 0"

Length of output: 633


@akifcorduk, great progress! The fix is mostly correct now. Line 1147 properly transfers the scaled coefficient to the surviving variable (substituting_var_idx). However, there are a few remaining issues:

Issues to address:

  1. Remove debug printf on line 1109:

    printf("dummy_substituted_variable %d\n", dummy_substituted_variable);
  2. Zero out eliminated variable's objective coefficient: After the first thrust::for_each kernel (around line 1150), you should explicitly zero out objective_coefficients[var_idx] for all eliminated variables. Currently, the coefficient is only zeroed during duplicate removal, but if the eliminated variable doesn't create duplicates, its coefficient remains non-zero. Add this after line 1150:

    // Zero out objective coefficients for eliminated variables
    thrust::for_each(handle_ptr->get_thrust_policy(),
                     d_var_indices.begin(),
                     d_var_indices.end(),
                     [objective_coefficients = make_span(objective_coefficients)] __device__(i_t var_idx) {
                       objective_coefficients[var_idx] = 0.;
                     });
  3. Thread safety consideration: Line 1147 uses += which is not thread-safe if multiple variables substitute to the same target. Consider using atomicAdd instead:

    atomicAdd(&objective_coefficients[substituting_var_idx], 
              objective_coefficients[var_idx] * substitute_coefficient[idx]);

The core logic is now correct—you're successfully preserving the objective's mathematical equivalence after substitution! 🎉


🧬 Code Graph Analysis Results

[cpp/src/mip/problem/problem.cu] lines 100-150
template <typename i_t, typename f_t> i_t problem_t<i_t, f_t>::get_n_binary_variables()
{
n_binary_vars = thrust::count_if(handle_ptr->get_thrust_policy(),
is_binary_variable.begin(),
is_binary_variable.end(),
cuda::std::identity{});
return n_binary_vars;
}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
cpp/src/mip/diversity/diversity_manager.cu (2)

18-18: Consider making fj_only_run a runtime configuration.

The compile-time constant limits flexibility. If different problem instances might benefit from skipping probing cache during presolve, consider making this a runtime configuration parameter in diversity_config.


188-191: Add early exit when insufficient time remains for probing.

The concern is valid. Although compute_probing_cache internally guards with timer.check_time_limit() and exits immediately, it still incurs function call and parallel setup overhead. When time_for_probing_cache is zero or negative (which can occur if time_limit is small or nearly exhausted), the function would exit immediately on its first check without performing any useful work.

Following the established codebase pattern (seen throughout diverse_solver.hpp), add a guard at the call site to skip the function entirely when time budget is insufficient:

if (time_for_probing_cache > 0.0) {
  timer_t probing_timer{time_for_probing_cache};
  compute_probing_cache(ls.constraint_prop.bounds_update, *problem_ptr, probing_timer);
}

This avoids unnecessary overhead and aligns with how time-limited operations are handled elsewhere in the codebase.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8744dbc and 33a67d0.

📒 Files selected for processing (2)
  • cpp/src/mip/diversity/diversity_config.hpp (1 hunks)
  • cpp/src/mip/diversity/diversity_manager.cu (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/mip/diversity/diversity_config.hpp
🧰 Additional context used
🧬 Code graph analysis (1)
cpp/src/mip/diversity/diversity_manager.cu (2)
cpp/src/mip/presolve/probing_cache.cu (2)
  • compute_probing_cache (698-793)
  • compute_probing_cache (698-700)
cpp/src/mip/presolve/lb_probing_cache.cu (2)
  • compute_probing_cache (310-396)
  • compute_probing_cache (310-312)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: conda-cpp-build / 12.9.1, 3.10, amd64, rockylinux8
  • GitHub Check: conda-cpp-build / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: conda-cpp-build / 12.9.1, 3.10, arm64, rockylinux8
  • GitHub Check: conda-cpp-build / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (1)
cpp/src/mip/diversity/diversity_manager.cu (1)

191-193: The double invocation of trivial_presolve is intentional, not redundant.

The two calls serve different purposes:

  • Inside compute_probing_cache (probing_cache.cu:790): trivial_presolve(problem, true) remaps cache IDs
  • After compute_probing_cache returns (diversity_manager.cu:193): trivial_presolve(*problem_ptr) performs cleanup without remapping (remap_cache_ids defaults to false)

The different parameter values indicate distinct responsibilities—the first call handles cache ID remapping as part of probing, while the second performs final presolve operations. No changes needed.

Note: The review references line 793, but the actual call is at line 790 in probing_cache.cu.

if (!problem_ptr->empty && !check_bounds_sanity(*problem_ptr)) { return false; }
}
if (!fj_only_run) {
// before probing cache or LP, run FJ to generate initial primal feasible solution

⚠️ Potential issue | 🟡 Minor

Fix misleading comment.

The comment states "run FJ" but the code executes probing cache computation, not FJ (Feasibility Jump).

Apply this diff to correct the comment:

-  // before probing cache or LP, run FJ to generate initial primal feasible solution
+  // Run probing cache before trivial presolve to discover variable implications
🤖 Prompt for AI Agents
In cpp/src/mip/diversity/diversity_manager.cu around line 185, the comment
incorrectly says "run FJ" while the code actually performs probing cache
computation; update the comment text to accurately describe that probing cache
computation (or probing) is executed before probing the cache or LP to generate
an initial primal feasible solution so the comment reflects the implemented
action.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
cpp/src/mip/problem/problem.cu (2)

1120-1139: Fix lost objective mass and race during substitution updates

We’re still dropping objective contribution when substituting: multiple substitutions targeting the same survivor race on objective_coefficients[substituting_var_idx], and the eliminated variable’s coefficient stays non-zero, so its term survives in the objective. This is the same bug flagged earlier—the data race plus the lingering coefficient wipe out the intended c_i * coeff * x_sub. Please atomically accumulate onto the surviving variable and zero the eliminated one.

-                    f_t objective_offset_difference =
-                      objective_coefficients[var_idx] * substitute_offset[idx];
-                    atomicAdd(objective_offset, objective_offset_difference);
-                    objective_coefficients[substituting_var_idx] +=
-                      objective_coefficients[var_idx] * substitute_coefficient[idx];
+                    f_t coeff = objective_coefficients[var_idx];
+                    f_t objective_offset_difference = coeff * substitute_offset[idx];
+                    atomicAdd(objective_offset, objective_offset_difference);
+                    atomicAdd(objective_coefficients.data() + substituting_var_idx,
+                              coeff * substitute_coefficient[idx]);
+                    objective_coefficients[var_idx] = 0.;

1226-1233: Do not zero the surviving variable’s objective coefficient

This line zeros objective_coefficients[next_var_idx], i.e. the coefficient you just transferred mass onto. The surviving variable loses all objective weight, reintroducing the bug from the earlier review. Drop this assignment—the eliminated variable is already nulled out by the fix above.

-                         variables[offset_begin + 1]          = dummy_substituted_variable;
-                         coefficients[offset_begin + 1]       = 0.;
-                         objective_coefficients[next_var_idx] = 0.;
+                         variables[offset_begin + 1]    = dummy_substituted_variable;
+                         coefficients[offset_begin + 1] = 0.;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33a67d0 and d772347.

📒 Files selected for processing (2)
  • cpp/src/mip/presolve/probing_cache.cu (5 hunks)
  • cpp/src/mip/problem/problem.cu (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/problem/problem.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
  • device_copy (237-243)
  • device_copy (237-238)
  • device_copy (254-260)
  • device_copy (254-256)
  • device_copy (271-277)
  • device_copy (271-272)
  • device_copy (286-303)
  • device_copy (286-286)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
  • host_copy (118-125)
  • host_copy (118-118)
  • host_copy (136-151)
  • host_copy (136-136)
  • host_copy (162-165)
  • host_copy (162-162)
  • host_copy (175-181)
  • host_copy (175-175)
  • host_copy (192-198)
  • host_copy (192-192)
  • host_copy (209-212)
  • host_copy (209-209)
  • host_copy (223-226)
  • host_copy (223-223)
cpp/src/mip/problem/problem.cu (2)
  • view (907-944)
  • view (907-907)
cpp/src/mip/presolve/trivial_presolve.cu (2)
  • trivial_presolve (14-14)
  • trivial_presolve (18-18)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
cpp/src/mip/presolve/probing_cache.cu (1)

686-693: Critical: std::find on device memory causes undefined behavior.

count_per_variable is a device_uvector, so its iterators point to GPU memory. Calling std::find on device iterators from host code is undefined behavior and will either crash or return garbage.

Copy count_per_variable to host first, then find the cutoff:

   auto h_priority_indices = host_copy(priority_indices);
+  auto h_counts           = host_copy(count_per_variable);
   problem.handle_ptr->sync_stream();
-  // Find the index of the first 0 element in h_priority_indices
-  auto first_zero_it      = std::find(count_per_variable.begin(), count_per_variable.end(), 0);
+  // Find the index of the first 0 element in counts
+  auto first_zero_it      = std::find(h_counts.begin(), h_counts.end(), 0);
-  size_t first_zero_index = (first_zero_it != count_per_variable.end())
+  size_t first_zero_index = (first_zero_it != h_counts.end())
-                              ? std::distance(count_per_variable.begin(), first_zero_it)
-                              : count_per_variable.size();
+                              ? std::distance(h_counts.begin(), first_zero_it)
+                              : h_counts.size();
   h_priority_indices.erase(h_priority_indices.begin() + first_zero_index, h_priority_indices.end());

Note: This is the same issue flagged in the past review comment for lines 686-694, indicating it was not fully addressed.

Based on past review comment
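
The fix above follows the usual pattern: bring device data to the host first, then apply host-only algorithms. A minimal host-side sketch of the trimming logic, with plain `std::vector`s standing in for the buffers `host_copy` returns (the function name and signature are illustrative, not cuOPT helpers):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// counts is sorted descending, so the first 0 marks the tail of
// variables that share no constraint with any other integer variable.
std::vector<int> trim_at_first_zero(std::vector<int> priority_indices,
                                    const std::vector<int>& counts)
{
  auto first_zero = std::find(counts.begin(), counts.end(), 0);
  std::size_t cut = static_cast<std::size_t>(first_zero - counts.begin());
  priority_indices.erase(priority_indices.begin() + cut, priority_indices.end());
  return priority_indices;
}
```

The same comparison-of-like-iterators discipline applies throughout: every iterator in the `find`/`distance`/`erase` chain must come from the same host container.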

🧹 Nitpick comments (6)
cpp/src/mip/presolve/probing_cache.cu (6)

499-500: Expensive assertions in hot path.

These bound-validation assertions execute once per variable per probe in a multi-threaded loop. In debug builds this adds significant overhead; in release builds they vanish, hiding potential bugs.

Consider replacing with conditional error handling that logs and gracefully degrades:

-    cuopt_assert(h_var_bounds[i].x <= lower_bound, "lower bound violation");
-    cuopt_assert(h_var_bounds[i].y >= upper_bound, "upper bound violation");
+    if (h_var_bounds[i].x > lower_bound || h_var_bounds[i].y < upper_bound) {
+      CUOPT_LOG_WARNING("Bound violation for var %d: inferred [%f,%f] vs current [%f,%f]",
+                        i, lower_bound, upper_bound, h_var_bounds[i].x, h_var_bounds[i].y);
+      continue;  // Skip this update
+    }

568-597: First-substitution-wins policy may be suboptimal.

When multiple substitutions target the same variable, the code arbitrarily keeps the first one (line 578). If later substitutions have better coefficients or fewer dependencies, this heuristic might miss opportunities.

Consider documenting this policy explicitly or evaluating substitutions by a quality metric (e.g., smallest coefficient magnitude, fewest cascading dependencies).
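
As a sketch of the quality-metric alternative, here is a smallest-coefficient-magnitude selection (the `Substitution` struct and field names are hypothetical stand-ins, not the PR's actual types, and this is not what the PR currently does):

```cpp
#include <cmath>
#include <unordered_map>
#include <vector>

// Hypothetical shape of a substitution record: target = coefficient * source + offset.
struct Substitution {
  int target_var;      // variable being substituted away
  int source_var;      // variable it is expressed in terms of
  double coefficient;  // scaling applied to the source variable
};

// Keep, per target variable, the substitution with the smallest
// coefficient magnitude -- a common numerical-stability heuristic --
// instead of whichever one happened to arrive first.
std::unordered_map<int, Substitution>
pick_best_substitutions(const std::vector<Substitution>& candidates)
{
  std::unordered_map<int, Substitution> best;
  for (const auto& s : candidates) {
    auto it = best.find(s.target_var);
    if (it == best.end() ||
        std::fabs(s.coefficient) < std::fabs(it->second.coefficient)) {
      best[s.target_var] = s;
    }
  }
  return best;
}
```

Whatever metric is chosen, it should be deterministic under the timestamp ordering so repeated runs pick the same substitution.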


720-722: Modification/substitution vectors accumulate across steps.

The per-thread modification_vector_pool and substitution_vector_pool are never cleared after apply_modification_queue_to_problem (line 771) or between steps. While modifications are applied incrementally, the vectors continue to grow, holding stale entries.

Clear the modification pool after applying, or document if accumulation across steps is intentional for replay/debugging:

     {
       // TODO when we have determinism, check current threads work/time counter and filter queue
       // items that are smaller or equal to that
       apply_modification_queue_to_problem(modification_vector_pool, problem);
+      // Clear applied modifications
+      for (auto& vec : modification_vector_pool) { vec.clear(); }

736-736: Hardcoded step size may not suit all problem scales.

The step size is fixed at min(2048, priority_indices.size()). For very large problems (millions of variables) this creates many sequential steps, reducing parallelism. For tiny problems it's a no-op but adds loop overhead.

Consider scaling step size by available parallelism or making it configurable:

-  const size_t step_size = min((size_t)2048, priority_indices.size());
+  const size_t step_size = std::min(priority_indices.size(),
+                                     std::max((size_t)256, priority_indices.size() / (max_threads * 4)));

728-728: Unused compute_stats flag.

compute_stats is set to true for each worker, but the stats are never collected, aggregated, or logged. This suggests incomplete instrumentation.

Do you want me to generate code to collect and report probing statistics (e.g., infeasibility counts, bound improvements per probe, wall-clock time per variable)?


704-705: Commented-out call to old priority computation.

The old compute_prioritized_integer_indices path is commented out. If it's deprecated, remove it; if it's an alternative heuristic, document when to use which.

Remove the commented line or convert it to a configuration option if both heuristics have merit.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d772347 and 2baeae1.

📒 Files selected for processing (1)
  • cpp/src/mip/presolve/probing_cache.cu (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
  • host_copy (118-125)
  • host_copy (118-118)
  • host_copy (136-151)
  • host_copy (136-136)
  • host_copy (162-165)
  • host_copy (162-162)
  • host_copy (175-181)
  • host_copy (175-175)
  • host_copy (192-198)
  • host_copy (192-192)
  • host_copy (209-212)
  • host_copy (209-209)
  • host_copy (223-226)
  • host_copy (223-223)
cpp/src/mip/problem/problem.cu (2)
  • view (907-944)
  • view (907-907)
cpp/src/mip/presolve/trivial_presolve.cu (2)
  • trivial_presolve (14-14)
  • trivial_presolve (18-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (5)
cpp/src/mip/presolve/probing_cache.cu (5)

9-9: LGTM!

The new includes support the refactored probing workflow. trivial_presolve.cuh is invoked at line 790, and work_unit_ordered_queue.cuh aligns with the new per-thread work-unit processing described in the PR objectives.

Also applies to: 13-13


534-566: LGTM!

The modification queue correctly merges per-thread updates by taking the intersection of bounds (max of lower, min of upper), ensuring conservative tightening when multiple threads propose conflicting modifications for the same variable.
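
The merge rule can be sketched in isolation; because interval intersection is commutative and associative, the result does not depend on the order in which thread queues are drained (types and names below are illustrative, not cuOPT's):

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

using Bounds = std::pair<double, double>;  // {lower, upper}

// Intersect per-thread bound proposals: max of lower bounds,
// min of upper bounds. An inverted result (lower > upper) would
// signal conflicting updates, i.e. an infeasible variable domain.
std::map<int, Bounds>
merge_bound_updates(const std::vector<std::pair<int, Bounds>>& updates)
{
  std::map<int, Bounds> merged;
  for (const auto& [var, b] : updates) {
    auto [it, inserted] = merged.emplace(var, b);
    if (!inserted) {
      it->second.first  = std::max(it->second.first, b.first);
      it->second.second = std::min(it->second.second, b.second);
    }
  }
  return merged;
}
```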


381-382: The strict is_binary check is intentional.

The codebase does not contain a problem.is_binary_variable field, and the current implementation is semantically correct. The is_binary check at line 382 appropriately distinguishes between variables with exact [0,1] bounds versus any variable with bound width of 1. This distinction is intentional and properly explained by the comment—the variable substitution logic at line 513 requires specifically binary-constrained variables [0,1], not just any unit-interval variable. The comment already clarifies this nuance.

Likely an incorrect or invalid review comment.


771-771: The review comment is incorrect and based on a false premise.

Modifications in apply_modification_queue_to_problem only update variable bounds via problem.update_variable_bounds() — they do not eliminate variables or change variable indices. The tuple structure <time, var_idx, lb, ub> confirms modifications only tighten bounds.

Since modifications cannot eliminate variables, substitutions applied later (line 785) safely reference the indices collected during the loop. The indices remain valid. The asymmetry between modifications (in-loop) and substitutions (after-loop) poses no risk of stale references.

trivial_presolve with remap_cache_ids=true handles any subsequent variable elimination and remapping after substitutions are complete.

Likely an incorrect or invalid review comment.


646-670: The review comment is incorrect. The code properly handles CUB's runtime buffer sizing.

CUB's temp_storage_bytes requirement depends on the operator, data type, segment layout, number of segments and device configuration, with no closed-form formula. The two reductions operate on fundamentally different segment structures:

  • Reduction 1: n_constraints segments using problem.offsets
  • Reduction 2: n_variables segments using problem.reverse_offsets

Since segment count and layout differ, the buffer requirements are independent and can differ. The resize at line 657 is necessary, not redundant. The code correctly implements CUB's pattern: call with d_temp_storage == nullptr to query the requirement, allocate, then call again with the allocated storage.

Likely an incorrect or invalid review comment.
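
CUB's two-phase convention (call with a null storage pointer to learn the size, allocate, then call again to run) can be illustrated without a GPU by a mock reduce that follows the same protocol. This is a CPU sketch of the calling pattern only; `mock_reduce` is invented and stands in for `cub::DeviceSegmentedReduce::Reduce`:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// When temp_storage is null, only report the required size;
// otherwise perform the reduction. This mirrors CUB's convention.
int mock_reduce(void* temp_storage, std::size_t& temp_bytes,
                const std::vector<int>& input)
{
  if (temp_storage == nullptr) {
    temp_bytes = input.size() * sizeof(int);  // pretend requirement
    return 0;
  }
  return std::accumulate(input.begin(), input.end(), 0);
}

int reduce_with_sized_storage(const std::vector<int>& input)
{
  std::size_t temp_bytes = 0;
  mock_reduce(nullptr, temp_bytes, input);         // phase 1: size query
  std::vector<unsigned char> storage(temp_bytes);  // phase 2: allocate
  return mock_reduce(storage.data(), temp_bytes, input);  // phase 3: run
}
```

The key point for the review above: each distinct reduction needs its own phase-1 size query before its phase-3 run; reusing a size computed for a different reduce is only safe if the requirements happen to coincide.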

Comment on lines +438 to +440
cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
multi_probe_presolve.infeas_constraints_count_1 == 0,
"At least one of the probes must be feasible");

⚠️ Potential issue | 🔴 Critical

Unsafe assumption: both probes can be infeasible simultaneously.

The assertion requires at least one probe to be feasible, but if the variable's domain is genuinely infeasible (e.g., due to conflicting constraints discovered during probing), both probes can fail. This assertion will crash in debug builds or be silently skipped in release builds, masking a real infeasibility.

Replace the assertion with proper infeasibility propagation:

-  cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
-                 multi_probe_presolve.infeas_constraints_count_1 == 0,
-               "At least one of the probes must be feasible");
+  if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+      multi_probe_presolve.infeas_constraints_count_1 > 0) {
+    CUOPT_LOG_WARNING("Both probes infeasible for var %d - variable domain is infeasible", var_idx);
+    // Signal global infeasibility or handle as needed
+    return;
+  }
🤖 Prompt for AI Agents
In cpp/src/mip/presolve/probing_cache.cu around lines 438 to 440, the code
currently asserts that at least one probe is feasible which is unsafe because
both probes can legitimately be infeasible; replace the assertion with proper
infeasibility propagation by detecting the case where both
infeasibilities_count_0 and infeasibilities_count_1 are non‑zero, mark the
variable/branch as globally infeasible (propagate infeasibility to the
presolver/solver state or return an error code), and otherwise continue handling
the single-feasible-probe path; ensure you log or record the infeasibility cause
and avoid using an assertion so production builds don’t crash or silently ignore
the condition.

@akifcorduk akifcorduk requested a review from a team as a code owner November 11, 2025 12:04
@akifcorduk akifcorduk requested a review from vyasr November 11, 2025 12:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
cpp/src/mip/presolve/probing_cache.cu (2)

439-441: Replace assertion with proper infeasibility propagation.

The assertion requires at least one probe to be feasible, but both probes can legitimately fail when the variable's domain is genuinely infeasible due to conflicting constraints discovered during probing. This assertion will crash in debug builds or be silently ignored in release builds, masking real infeasibility.

Replace the assertion with proper infeasibility detection and propagation:

-  cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
-                 multi_probe_presolve.infeas_constraints_count_1 == 0,
-               "At least one of the probes must be feasible");
+  if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+      multi_probe_presolve.infeas_constraints_count_1 > 0) {
+    CUOPT_LOG_WARNING("Both probes infeasible for var %d - variable domain is infeasible", var_idx);
+    // Signal global infeasibility or propagate to caller
+    return;
+  }

Based on past review comments.


658-671: Remove redundant segmented reduce call.

Lines 647-656 and 662-671 call cub::DeviceSegmentedReduce::Reduce with identical parameters, both writing to count_per_variable. The second call simply overwrites the result of the first, making lines 647-656 redundant.

Remove the redundant first reduce call:

   cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
                                      temp_storage_bytes,
                                      input_transform_it_2,
                                      count_per_variable.data(),
                                      problem.n_variables,
                                      problem.reverse_offsets.data(),
                                      problem.reverse_offsets.data() + 1,
                                      cuda::std::plus<>{},
                                      0,
                                      problem.handle_ptr->get_stream());
-
-  temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
-  d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
-
-  // Run reduction
-  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
-                                     temp_storage_bytes,
-                                     input_transform_it_2,
-                                     count_per_variable.data(),
-                                     problem.n_variables,
-                                     problem.reverse_offsets.data(),
-                                     problem.reverse_offsets.data() + 1,
-                                     cuda::std::plus<>{},
-                                     0,
-                                     problem.handle_ptr->get_stream());
cpp/src/mip/problem/problem.cu (1)

1119-1138: Zero out eliminated variables' objective coefficients.

The kernel correctly transfers the scaled objective coefficient to the surviving variable using atomicAdd (lines 1136-1137). However, the eliminated variable's objective coefficient (objective_coefficients[var_idx]) is never explicitly zeroed unless it creates duplicates (see line 1232). If an eliminated variable doesn't produce duplicates in any constraint, its coefficient remains non-zero and will incorrectly affect the objective value.

Add this kernel after line 1138 to zero out all eliminated variables' objective coefficients:

+  thrust::for_each(handle_ptr->get_thrust_policy(),
+                   d_var_indices.begin(),
+                   d_var_indices.end(),
+                   [objective_coefficients = make_span(objective_coefficients)] __device__(i_t var_idx) {
+                     objective_coefficients[var_idx] = 0.;
+                   });

Based on past review comments.

🧹 Nitpick comments (1)
cpp/src/mip/problem/problem.cu (1)

1856-1857: Clarify assertion messages for bound tightening.

The assertion logic is correct (verifying that new bounds tighten existing bounds), but the error messages could be clearer about what constitutes a violation.

Consider updating the messages to be more descriptive:

-      cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
-      cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+      cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], 
+                   "New lower bound must be >= existing lower bound (bound tightening)");
+      cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], 
+                   "New upper bound must be <= existing upper bound (bound tightening)");
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2baeae1 and 19b993b.

📒 Files selected for processing (3)
  • cpp/src/mip/presolve/probing_cache.cu (5 hunks)
  • cpp/src/mip/problem/problem.cu (4 hunks)
  • python/libcuopt/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/mip/presolve/probing_cache.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
  • device_copy (237-243)
  • device_copy (237-238)
  • device_copy (254-260)
  • device_copy (254-256)
  • device_copy (271-277)
  • device_copy (271-272)
  • device_copy (286-303)
  • device_copy (286-286)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
  • host_copy (118-125)
  • host_copy (118-118)
  • host_copy (136-151)
  • host_copy (136-136)
  • host_copy (162-165)
  • host_copy (162-162)
  • host_copy (175-181)
  • host_copy (175-175)
  • host_copy (192-198)
  • host_copy (192-192)
  • host_copy (209-212)
  • host_copy (209-209)
  • host_copy (223-226)
  • host_copy (223-223)
cpp/src/mip/problem/problem.cu (2)
  • view (907-944)
  • view (907-907)
cpp/src/mip/presolve/trivial_presolve.cu (2)
  • trivial_presolve (14-14)
  • trivial_presolve (18-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
🔇 Additional comments (6)
python/libcuopt/pyproject.toml (1)

57-57: Verify that the updated package size threshold was validated during testing.

The max_allowed_size_compressed threshold increased by 10MB (1.7%) to accommodate the new features (probing implications, variable substitution, work-queue utilities). This appears proportionate given the substantial backend additions described in the PR.

Ensure the package was built and its compressed size verified to confirm it fits within 605M during your local testing.

cpp/src/mip/problem/problem.cu (2)

1701-1701: Clarify the commented-out mutex lock.

The commented-out std::lock_guard raises questions about thread safety in get_host_user_problem. If concurrent access is possible and thread safety is required, the mutex should be uncommented. If the mutex is unnecessary, the comment should be removed to avoid confusion.

Could you clarify whether:

  1. Thread safety is needed for this method?
  2. If yes, why is the mutex commented out?
  3. If no, can the comment be removed?

1328-1332: LGTM - Clean CSR sorting wrapper.

The function provides a clean interface to sort constraint rows by variable indices using cuSPARSE.

cpp/src/mip/presolve/probing_cache.cu (3)

535-567: LGTM - Correct bound tightening logic.

The function correctly merges bound modifications by taking the maximum lower bound and minimum upper bound for each variable, which is commutative and ensures determinism regardless of processing order.


569-598: LGTM - Proper substitution deduplication.

The function correctly handles potential duplicate substitutions by taking only the first substitution for each variable (line 579), preventing conflicting substitutions. The determinism depends on the queue order being deterministic, which appears to be ensured by timestamp ordering.
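
The determinism hinges on draining items in timestamp order, not in thread-arrival order. A minimal illustration of that ordering with a min-heap (this is a sketch; cuOPT's actual `work_unit_ordered_queue` may differ):

```cpp
#include <queue>
#include <vector>

struct WorkItem {
  long timestamp;  // logical time the item was produced
  int  var_idx;    // payload: which variable the item concerns
};

// Pop items in increasing timestamp order so that "first substitution
// per variable" means first in logical time, not first by thread race.
std::vector<int> drain_in_timestamp_order(std::vector<WorkItem> items)
{
  auto later = [](const WorkItem& a, const WorkItem& b) {
    return a.timestamp > b.timestamp;  // min-heap on timestamp
  };
  std::priority_queue<WorkItem, std::vector<WorkItem>, decltype(later)> q(later);
  for (const auto& it : items) q.push(it);
  std::vector<int> order;
  while (!q.empty()) {
    order.push_back(q.top().var_idx);
    q.pop();
  }
  return order;
}
```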


689-697: LGTM - Correctly implemented priority computation and early exit.

The code correctly:

  1. Uses thrust::lower_bound with thrust::greater comparator to find the first zero in the descending priority sequence (lines 689-697)
  2. Ensures a non-zero early-exit threshold using std::max(2, ...) to prevent premature termination on small problems (line 783)

Both improvements address issues from past reviews.

Also applies to: 782-784


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
cpp/src/mip/presolve/probing_cache.cu (2)

439-441: Both probes can be infeasible; replace assert with handling

The assert can trip on genuinely infeasible domains. Handle and propagate infeasibility instead of asserting. This was flagged previously.

-    cuopt_assert(multi_probe_presolve.infeas_constraints_count_0 == 0 ||
-                   multi_probe_presolve.infeas_constraints_count_1 == 0,
-                 "At least one of the probes must be feasible");
+    if (multi_probe_presolve.infeas_constraints_count_0 > 0 &&
+        multi_probe_presolve.infeas_constraints_count_1 > 0) {
+      CUOPT_LOG_WARNING("Both probes infeasible for var %d; domain infeasible", var_idx);
+      // Option 1: enqueue global infeasibility signal and return
+      return;
+    }

538-571: Bound-merge can produce inverted intervals; guard before applying

Merging via max(lb), min(ub) may yield lb>ub. Validate and either drop the conflicting update or clamp with tolerance before calling update_variable_bounds.

   for (const auto& [var_idx, modifications] : var_bounds_modifications) {
-    var_indices.push_back(var_idx);
-    lb_values.push_back(modifications.first);
-    ub_values.push_back(modifications.second);
+    auto lb = modifications.first;
+    auto ub = modifications.second;
+    if (lb <= ub) {
+      var_indices.push_back(var_idx);
+      lb_values.push_back(lb);
+      ub_values.push_back(ub);
+    } else {
+      CUOPT_LOG_WARNING("Skipping conflicting bounds for var %d: [%f, %f]", var_idx, lb, ub);
+    }
   }
🧹 Nitpick comments (2)
cpp/src/mip/problem/problem.cu (1)

1088-1101: Simplify template and call-site for are_exclusive

are_exclusive doesn’t use f_t. Drop the extra template parameter and update the call to avoid confusion.

-template <typename i_t, typename f_t>
-bool are_exclusive(const std::vector<i_t>& var_indices,
-                   const std::vector<i_t>& var_to_substitude_indices)
+template <typename i_t>
+bool are_exclusive(const std::vector<i_t>& var_indices,
+                   const std::vector<i_t>& var_to_substitude_indices)

And:

- cuopt_assert((are_exclusive<i_t, f_t>(var_indices, var_to_substitude_indices)),
+ cuopt_assert((are_exclusive<i_t>(var_indices, var_to_substitude_indices)),
cpp/src/mip/presolve/probing_cache.cu (1)

692-701: Trim priority list by zero counts using host data for simplicity

Current device-side lower_bound is more complex than needed and mixes device iterators with host erase. Copy counts to host and trim by the first zero position to avoid policy/iterator pitfalls.

-  // Find the index of the first 0 element in count_per_variable
-  auto first_zero_it      = thrust::lower_bound(problem.handle_ptr->get_thrust_policy(),
-                                           count_per_variable.begin(),
-                                           count_per_variable.end(),
-                                           0,
-                                           thrust::greater<i_t>());
-  size_t first_zero_index = (first_zero_it != count_per_variable.end())
-                              ? thrust::distance(count_per_variable.begin(), first_zero_it)
-                              : count_per_variable.size();
+  auto h_counts = host_copy(count_per_variable);
+  auto first_zero_it = std::find(h_counts.begin(), h_counts.end(), 0);
+  size_t first_zero_index = (first_zero_it != h_counts.end())
+                              ? std::distance(h_counts.begin(), first_zero_it)
+                              : h_counts.size();
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19b993b and 349811d.

📒 Files selected for processing (2)
  • cpp/src/mip/presolve/probing_cache.cu (5 hunks)
  • cpp/src/mip/problem/problem.cu (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/problem/problem.cu
  • cpp/src/mip/presolve/probing_cache.cu
🧬 Code graph analysis (2)
cpp/src/mip/problem/problem.cu (1)
cpp/src/utilities/copy_helpers.hpp (16)
  • device_copy (237-243)
  • device_copy (237-238)
  • device_copy (254-260)
  • device_copy (254-256)
  • device_copy (271-277)
  • device_copy (271-272)
  • device_copy (286-303)
  • device_copy (286-286)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
cpp/src/mip/presolve/probing_cache.cu (3)
cpp/src/utilities/copy_helpers.hpp (22)
  • make_span (322-327)
  • make_span (322-324)
  • make_span (330-335)
  • make_span (330-332)
  • make_span (338-341)
  • make_span (338-338)
  • make_span (344-347)
  • make_span (344-344)
  • host_copy (118-125)
  • host_copy (118-118)
  • host_copy (136-151)
  • host_copy (136-136)
  • host_copy (162-165)
  • host_copy (162-162)
  • host_copy (175-181)
  • host_copy (175-175)
  • host_copy (192-198)
  • host_copy (192-192)
  • host_copy (209-212)
  • host_copy (209-209)
  • host_copy (223-226)
  • host_copy (223-223)
cpp/src/mip/problem/problem.cu (4)
  • integer_equal (749-752)
  • integer_equal (749-749)
  • view (907-944)
  • view (907-907)
cpp/src/mip/presolve/trivial_presolve.cu (2)
  • trivial_presolve (14-14)
  • trivial_presolve (18-18)
🔇 Additional comments (2)
cpp/src/mip/problem/problem.cu (1)

1135-1156: Objective mass transfer: LGTM

Atomic accumulation onto the surviving variable and the objective offset fix the earlier loss-of-mass bug under parallel substitution.

If double precision may run on pre-sm_60 GPUs, confirm atomicAdd(double*, ...) availability or use a fallback (e.g., RAFT/CUB atomics).
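
The standard pre-sm_60 fallback is a compare-and-swap loop on the 64-bit bit pattern. Here is the same pattern expressed on the CPU with `std::atomic`, purely to illustrate the retry logic (the real device version would use `atomicCAS` on `unsigned long long`):

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Add `value` to the double stored in `cell` (as raw 64-bit bits),
// retrying until no other thread raced in between. Returns the old value.
double atomic_add_double(std::atomic<std::uint64_t>& cell, double value)
{
  std::uint64_t expected = cell.load();
  for (;;) {
    double old_val;
    std::memcpy(&old_val, &expected, sizeof(double));
    double new_val = old_val + value;
    std::uint64_t desired;
    std::memcpy(&desired, &new_val, sizeof(double));
    // On failure, expected is refreshed with the current contents; retry.
    if (cell.compare_exchange_weak(expected, desired)) return old_val;
  }
}
```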

cpp/src/mip/presolve/probing_cache.cu (1)

792-798: Good integration: apply substitutions then remap via trivial presolve

Applying substitutions before trivial_presolve(problem, remap_cache_ids) aligns with the new workflow and should maximize reductions.

Comment on lines +383 to 386
bool is_binary = ub == 1 && lb == 0;
for (i_t i = 0; i < 2; ++i) {
auto& probe_val = i == 0 ? probe_vals.first : probe_vals.second;
// if binary, probe both values

⚠️ Potential issue | 🟡 Minor

Binary check should use tolerance or range-1 test

bool is_binary = (ub == 1 && lb == 0); misses near-binary cases and range-1 binaries. Use the established tolerance logic.

-  bool is_binary = ub == 1 && lb == 0;
+  bool is_binary = problem.integer_equal(ub - lb, 1.) &&
+                   problem.integer_equal(lb, 0.);

Or, if available, consult is_binary_variable[var_idx] from the problem view.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/src/mip/presolve/probing_cache.cu around lines 383 to 386, the current
binary detection uses strict equality bool is_binary = (ub == 1 && lb == 0);
which misses near-binary or range-1 cases; change it to use the established
tolerance/range-1 logic (e.g., fabs(ub - 1.0) <= tol && fabs(lb - 0.0) <= tol or
(ub - lb) <= 1 + tol) or, if available in this context, consult the problem view
flag is_binary_variable[var_idx]; update the condition to use that
tolerant/range-1 test or the is_binary_variable lookup so near-binary variables
are treated as binary.
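
A standalone sketch of the tolerant check (the epsilon is illustrative; cuOPT's `integer_equal` presumably uses the problem's own integrality tolerance):

```cpp
#include <cmath>

// Treat a variable as binary if its bounds are within tol of [0, 1].
// The lb check matters: a range-1 variable like [1, 2] is NOT binary.
bool is_binary_with_tol(double lb, double ub, double tol = 1e-6)
{
  return std::fabs(lb) <= tol && std::fabs(ub - 1.0) <= tol;
}
```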

Comment on lines +650 to +674
cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
temp_storage_bytes,
input_transform_it_2,
count_per_variable.data(),
problem.n_variables,
problem.reverse_offsets.data(),
problem.reverse_offsets.data() + 1,
cuda::std::plus<>{},
0,
problem.handle_ptr->get_stream());
temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
// Run reduction
cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
temp_storage_bytes,
input_transform_it_2,
count_per_variable.data(),
problem.n_variables,
problem.reverse_offsets.data(),
problem.reverse_offsets.data() + 1,
cuda::std::plus<>{},
0,
problem.handle_ptr->get_stream());

⚠️ Potential issue | 🟠 Major

CUB segmented-reduce invoked twice without a fresh size query

The segmented reduce into count_per_variable is invoked twice, and the first invocation reuses temp_storage_bytes that was sized for a different reduce. This is redundant and can be unsafe if the temp storage is insufficient.

-  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
-                                     temp_storage_bytes,
-                                     input_transform_it_2,
-                                     count_per_variable.data(),
-                                     problem.n_variables,
-                                     problem.reverse_offsets.data(),
-                                     problem.reverse_offsets.data() + 1,
-                                     cuda::std::plus<>{},
-                                     0,
-                                     problem.handle_ptr->get_stream());
-
-  temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
-  d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
-
-  // Run reduction
-  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
+  // Query temp storage size for this reduce
+  d_temp_storage = nullptr;
+  temp_storage_bytes = 0;
+  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
+                                     temp_storage_bytes,
+                                     input_transform_it_2,
+                                     count_per_variable.data(),
+                                     problem.n_variables,
+                                     problem.reverse_offsets.data(),
+                                     problem.reverse_offsets.data() + 1,
+                                     cuda::std::plus<>{},
+                                     0,
+                                     problem.handle_ptr->get_stream());
+  temp_storage.resize(temp_storage_bytes, problem.handle_ptr->get_stream());
+  d_temp_storage = thrust::raw_pointer_cast(temp_storage.data());
+  // Run reduction once
+  cub::DeviceSegmentedReduce::Reduce(d_temp_storage,
                                      temp_storage_bytes,
                                      input_transform_it_2,
                                      count_per_variable.data(),
                                      problem.n_variables,
                                      problem.reverse_offsets.data(),
                                      problem.reverse_offsets.data() + 1,
                                      cuda::std::plus<>{},
                                      0,
                                      problem.handle_ptr->get_stream());
🤖 Prompt for AI Agents
cpp/src/mip/presolve/probing_cache.cu lines 650-674: the code calls
cub::DeviceSegmentedReduce::Reduce twice with the same temp_storage_bytes but
without performing a proper size-query for the actual reduction, which is
redundant and unsafe if the buffer sizes differ; change the first call to be a
size-query (pass d_temp_storage = nullptr) to populate temp_storage_bytes, then
resize temp_storage to that size and perform a single actual Reduce call (with
d_temp_storage set to raw pointer); remove the duplicated Reduce invocation and
ensure the parameters used in the size-query match the later actual call.
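The two-phase convention the fix relies on (a first call with a null temp pointer only reports the required scratch size; only the second call does the work) can be illustrated with a plain C++ stand-in. `mock_reduce` is hypothetical and only mimics the CUB calling convention; real CUB calls behave the same way with device memory and a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a CUB-style API: when `d_temp` is nullptr the
// call only writes the required scratch size into `temp_bytes`; otherwise
// it performs the reduction using the provided scratch buffer.
int mock_reduce(void* d_temp, std::size_t& temp_bytes, const std::vector<int>& in)
{
  if (d_temp == nullptr) {
    temp_bytes = in.size() * sizeof(int);  // pretend scratch requirement
    return 0;
  }
  return std::accumulate(in.begin(), in.end(), 0);
}

int reduce_two_phase(const std::vector<int>& in)
{
  void* d_temp           = nullptr;
  std::size_t temp_bytes = 0;
  mock_reduce(d_temp, temp_bytes, in);         // phase 1: size query only
  std::vector<char> scratch(temp_bytes);       // allocate exactly what was asked
  d_temp = scratch.data();
  return mock_reduce(d_temp, temp_bytes, in);  // phase 2: the single real run
}
```

The key point mirrored from the suggestion: the size-query call and the real call must use identical parameters, and the query must be redone for each distinct reduce rather than reusing a byte count computed for another one.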

Comment on lines +1115 to +1119
const i_t dummy_substituted_variable = var_indices[0];
cuopt_assert(var_indices.size() == var_to_substitude_indices.size(), "size mismatch");
cuopt_assert(var_indices.size() == offset_values.size(), "size mismatch");
cuopt_assert(var_indices.size() == coefficient_values.size(), "size mismatch");
auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());

⚠️ Potential issue | 🟡 Minor

Guard against empty substitution set

dummy_substituted_variable = var_indices[0] dereferences an empty vector if called with no substitutions. Add a fast return/guard.

 void problem_t<i_t, f_t>::substitute_variables(const std::vector<i_t>& var_indices,
                                                const std::vector<i_t>& var_to_substitude_indices,
                                                const std::vector<f_t>& offset_values,
                                                const std::vector<f_t>& coefficient_values)
 {
+  if (var_indices.empty()) { return; }
🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1115 to 1119, the code
unconditionally reads var_indices[0] which will crash when the substitution set
is empty; add an early guard that checks if var_indices.empty() and immediately
return (or no-op) before accessing elements or running subsequent assertions and
device_copy; keep the rest of the logic unchanged so the function safely handles
an empty substitution set without dereferencing the vector.
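A minimal host-side sketch of the suggested guard; the function shape is hypothetical and mirrors only the indexing hazard, with a sentinel return standing in for the early no-op.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: reading index 0 of an empty vector is undefined
// behavior, so bail out before touching elements, as the fix suggests.
int first_substituted_or_sentinel(const std::vector<int>& var_indices)
{
  if (var_indices.empty()) { return -1; }           // early no-op instead of UB
  int dummy_substituted_variable = var_indices[0];  // now safe to index
  return dummy_substituted_variable;
}
```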

Comment on lines +1231 to +1255
[variables = make_span(variables),
coefficients = make_span(coefficients),
offsets = make_span(offsets),
objective_coefficients = make_span(objective_coefficients),
dummy_substituted_variable] __device__(i_t cstr_idx) {
i_t offset_begin = offsets[cstr_idx];
i_t offset_end = offsets[cstr_idx + 1];
i_t duplicate_start_idx = -1;
while (offset_begin < offset_end - 1) {
i_t var_idx = variables[offset_begin];
i_t next_var_idx = variables[offset_begin + 1];
if (var_idx == next_var_idx) {
if (duplicate_start_idx == -1) { duplicate_start_idx = offset_begin; }
coefficients[duplicate_start_idx] += coefficients[offset_begin + 1];
variables[duplicate_start_idx] = variables[offset_begin + 1];
// mark those for elimination
variables[offset_begin + 1] = dummy_substituted_variable;
coefficients[offset_begin + 1] = 0.;
objective_coefficients[next_var_idx] = 0.;
} else {
duplicate_start_idx = -1;
}
offset_begin++;
}
});

⚠️ Potential issue | 🔴 Critical

Do not zero objective coefficient of the surviving variable during duplicate merge

Zeroing objective_coefficients[next_var_idx] drops the objective term for that variable. Duplicate elimination is a row-local operation and must not alter the objective vector. Keep zeroing only the eliminated vars (already handled later for d_var_indices), and remove this line.

Apply this diff:

                      // mark those for elimination
                      variables[offset_begin + 1]          = dummy_substituted_variable;
                      coefficients[offset_begin + 1]       = 0.;
-                     objective_coefficients[next_var_idx] = 0.;
🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1231 to 1255, the kernel
incorrectly zeros objective_coefficients[next_var_idx] when merging duplicate
entries which removes the objective term for the surviving variable; remove the
line that sets objective_coefficients[next_var_idx] = 0.0 so the objective
vector is not modified by this row-local duplicate-merge pass, and ensure only
eliminated entries (the ones marked with dummy_substituted_variable and zeroed
coefficients at offset_begin+1) are treated later when building d_var_indices.
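The intended behavior, folding duplicate entries within a sorted row while leaving the objective vector untouched, can be sketched host-side. This is a simplified analog of the kernel, not the kernel itself: it tracks the run's survivor directly (so runs longer than two also fold correctly), and `dummy` stands in for `dummy_substituted_variable`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge adjacent duplicates in one row (assumed sorted by variable index).
// The surviving slot accumulates the coefficients; eliminated slots are
// marked with `dummy` and zeroed. The objective vector is deliberately
// not an argument: duplicate merging is a row-local operation.
void merge_duplicates(std::vector<int>& vars, std::vector<double>& coefs, int dummy)
{
  std::size_t survivor = 0;
  for (std::size_t i = 1; i < vars.size(); ++i) {
    if (vars[i] == vars[survivor]) {
      coefs[survivor] += coefs[i];  // fold coefficient into the survivor
      vars[i]  = dummy;             // mark entry for later elimination
      coefs[i] = 0.0;
    } else {
      survivor = i;                 // a new run of equal variables starts here
    }
  }
}
```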

Comment on lines +1861 to +1888
void problem_t<i_t, f_t>::update_variable_bounds(const std::vector<i_t>& var_indices,
const std::vector<f_t>& lb_values,
const std::vector<f_t>& ub_values)
{
if (var_indices.size() == 0) { return; }
// std::lock_guard<std::mutex> lock(problem_mutex);
cuopt_assert(var_indices.size() == lb_values.size(), "size of variable lower bound mismatch");
cuopt_assert(var_indices.size() == ub_values.size(), "size of variable upper bound mismatch");
auto d_var_indices = device_copy(var_indices, handle_ptr->get_stream());
auto d_lb_values = device_copy(lb_values, handle_ptr->get_stream());
auto d_ub_values = device_copy(ub_values, handle_ptr->get_stream());
thrust::for_each(
handle_ptr->get_thrust_policy(),
thrust::make_counting_iterator(0),
thrust::make_counting_iterator(0) + d_var_indices.size(),
[lb_values = make_span(d_lb_values),
ub_values = make_span(d_ub_values),
variable_bounds = make_span(variable_bounds),
var_indices = make_span(d_var_indices)] __device__(auto i) {
i_t var_idx = var_indices[i];
cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
variable_bounds[var_idx].x = lb_values[i];
variable_bounds[var_idx].y = ub_values[i];
});
handle_ptr->sync_stream();
RAFT_CHECK_CUDA(handle_ptr->get_stream());
}

⚠️ Potential issue | 🟠 Major

Prevent inverted bounds when merging updates

When multiple updates are merged, lb can exceed ub. Enforce lb <= ub (within tolerance) before writing, otherwise throw or skip.

     [lb_values       = make_span(d_lb_values),
      ub_values       = make_span(d_ub_values),
      variable_bounds = make_span(variable_bounds),
      var_indices     = make_span(d_var_indices)] __device__(auto i) {
       i_t var_idx = var_indices[i];
-      cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
-      cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+      cuopt_assert(variable_bounds[var_idx].x <= lb_values[i], "variable lower bound violation");
+      cuopt_assert(variable_bounds[var_idx].y >= ub_values[i], "variable upper bound violation");
+      // ensure non-inverted bounds
+      if (lb_values[i] > ub_values[i]) {
+        // Either swap, clamp, or signal; recommending signal to avoid silent invalid state.
+        asm("trap;"); // or set a device-side error flag
+      }
       variable_bounds[var_idx].x = lb_values[i];
       variable_bounds[var_idx].y = ub_values[i];
     });

Optionally, add a host-side sanity check after updates: check_problem_representation(false) to fail fast. As per coding guidelines.

🤖 Prompt for AI Agents
In cpp/src/mip/problem/problem.cu around lines 1861-1888, the device lambda
writes lb/ub without ensuring lb <= ub when concurrent updates are merged; add a
guard inside the CUDA device kernel to enforce lb <= ub (within a small
tolerance) before assigning variable_bounds: compute local lb_i and ub_i from
the provided spans, test if lb_i <= ub_i + tol, and if not either skip the write
or trigger an error path (e.g., set a device-side flag to indicate invalid
update or call cuopt_assert to fail), so no inverted bounds are written; after
the thrust::for_each and stream sync, optionally run the host-side sanity check
check_problem_representation(false) to fail fast and surface the problem.
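A host-side sketch of the guarded write: the bool return for a rejected update and the `tol` default are assumptions for illustration; the device code would instead set an error flag or trap, as the suggestion notes. The tightening checks mirror the two cuopt_asserts in the kernel.

```cpp
#include <cassert>
#include <utility>

// Apply new bounds only if they are non-inverted (within `tol`) and only
// tighten the existing bounds; return false instead of writing otherwise.
bool update_bounds(std::pair<double, double>& bounds,
                   double lb, double ub, double tol = 1e-9)
{
  if (lb > ub + tol) { return false; }  // refuse inverted bounds
  if (lb < bounds.first || ub > bounds.second) {
    return false;                       // only tightening is allowed
  }
  bounds.first  = lb;
  bounds.second = ub;
  return true;
}
```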

@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/25.12 November 17, 2025 21:34