Conversation

@Kh4ster
Contributor

@Kh4ster Kh4ster commented Jul 24, 2025

This is still very much a work in progress.

I'm opening it now to allow preliminary reviews.

Kh4ster added 30 commits July 2, 2025 18:33
…r of primal step size and dual step size, update the kernels to launch multiple threads and support a very wide batch size accordingly
… if batch is called with trust region restart
@Kh4ster Kh4ster added feature request New feature or request non-breaking Introduces a non-breaking change labels Jul 24, 2025
@Kh4ster Kh4ster requested review from hlinsen and kaatish July 24, 2025 16:48
@Kh4ster Kh4ster added the pdlp label Jul 24, 2025
@Kh4ster Kh4ster marked this pull request as draft July 24, 2025 16:49
@copy-pr-bot

copy-pr-bot bot commented Jul 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Kh4ster Kh4ster removed request for hlinsen and kaatish July 24, 2025 16:50
@Kh4ster Kh4ster self-assigned this Jul 24, 2025
namespace cuopt::linear_programming::detail {

// This class is used to start a batched dot product
// With large problem size (>10K) and small batch size (<100), this is faster than using Segmented Reduce
Contributor

Come to think of it, I'm not surprised. IIRC SegmentedReduce maps one block to one segment, which is pretty terrible in your case (a few very large segments), so it makes sense that parallel device-wide BLAS dot calls beat it.
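To make the mapping argument concrete, here is a rough host-side counting model in plain C++ (no GPU required). It only compares how many thread blocks each strategy would launch and how much work each block gets; it says nothing about real kernel timings. The struct, the function names, and the tile size of 1024 elements per block are all hypothetical, and the one-block-per-segment behavior is taken from the recollection above rather than from the CUB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical launch model; none of these names come from cuOpt or CUB.
struct LaunchEstimate {
  std::int64_t blocks;           // thread blocks launched in total
  std::int64_t items_per_block;  // elements each block has to reduce
};

// Integer ceiling division for positive divisors.
inline std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Assumed mapping: one block reduces one whole segment of n elements.
inline LaunchEstimate one_block_per_segment(std::int64_t batch, std::int64_t n) {
  return {batch, n};
}

// Assumed mapping: each of the `batch` vectors gets its own device-wide
// reduction, split into tiles of `tile` elements (one tile per block).
inline LaunchEstimate device_wide_per_vector(std::int64_t batch,
                                             std::int64_t n,
                                             std::int64_t tile) {
  return {batch * ceil_div(n, tile), tile};
}
```

Under these assumptions, a batch of 64 vectors with 100000 elements each gives 64 blocks that each walk 100000 elements in the first model, versus 6272 blocks of at most 1024 elements in the second, which is consistent with the first mapping leaving most of a large GPU idle.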

Contributor

Although I just realized they added a new overload optimized for fixed segment sizes, which I wasn't aware of. Maybe it performs better?
NVIDIA/cccl#3969

Contributor Author

Very good catch!! I will test that right away. It might make my life way simpler.

Contributor Author

This is still slower than using multiple dot products :(
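For anyone following along, the two formulations being compared compute the same values; only the work distribution differs. Below is a minimal CPU reference sketch of both, assuming the batch is stored as `batch` vectors of length `n` laid out back to back. The function names are hypothetical and are not the cuOpt class or the CUB API.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// CPU reference only; names are hypothetical, not cuOpt or CUB APIs.

// Formulation A: one independent dot product per batch entry.
inline std::vector<double> batched_dot(const std::vector<double>& x,
                                       const std::vector<double>& y,
                                       std::size_t batch, std::size_t n) {
  assert(x.size() == batch * n && y.size() == batch * n);
  std::vector<double> out(batch);
  for (std::size_t b = 0; b < batch; ++b) {
    const double* xb = x.data() + b * n;  // start of vector b in x
    const double* yb = y.data() + b * n;  // start of vector b in y
    out[b] = std::inner_product(xb, xb + n, yb, 0.0);
  }
  return out;
}

// Formulation B: a single pass over the elementwise products, summed into
// fixed-size segments; element i belongs to segment i / n.
inline std::vector<double> segmented_dot(const std::vector<double>& x,
                                         const std::vector<double>& y,
                                         std::size_t n) {
  assert(n > 0 && x.size() == y.size() && x.size() % n == 0);
  std::vector<double> out(x.size() / n, 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i / n] += x[i] * y[i];
  }
  return out;
}
```

A reference like this is handy as a correctness check when swapping one GPU implementation for the other, since both should match it up to floating-point summation order.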

Contributor

Dang .-.
Looking at their benchmarks, they only test segment sizes up to 1024, so I guess they don't optimize at all for scenarios with a few large segments. Would be nice if they said so in their docs!

@tmckayus
Contributor

This is possibly a candidate for 25.10 but may still make 25.08

@rgsl888prabhu
Collaborator

@Kh4ster Shall we move this to 25.10?

@tmckayus
Contributor

> @Kh4ster Shall we move this to 25.10?

I'm going to move this to 25.10; we can move it back if it gets finished in time.

@tmckayus tmckayus modified the milestones: 25.08, 25.10 Jul 31, 2025
@anandhkb anandhkb modified the milestones: 25.10, 25.12 Sep 17, 2025
@anandhkb
Contributor

De-prioritized for 25.10; slating this for the 25.12 release.

@rgsl888prabhu rgsl888prabhu changed the base branch from branch-25.08 to main October 22, 2025 17:02
@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/25.12 November 17, 2025 21:38
@chris-maes
Contributor

@Kh4ster is it fine to move this out of the 25.12 release?

@chris-maes chris-maes modified the milestones: 25.12, 26.02 Nov 25, 2025
7 participants