[P2P] Added congestion control support (Timely, Swift) by andrzej-k · Pull Request #837 · uccl-project/uccl

andrzej-k · 2026-03-24T12:40:01Z

Description

The goal is to let RoCE EP and P2P use congestion control algorithms in addition to the currently supported flow control. This is useful in environments without PFC support.

This PR is the first step toward that. The overall plan:

Add Timely and Swift support in P2P. As part of this step, the CC algorithms (Timely, Swift, EQDS) are moved out of collectives to a shared location.
Add Timely and Swift support in EP.
Add EQDS support in P2P first, then in EP.
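
For readers who want a picture of the selection logic, here is a minimal Python sketch of how an algorithm could be chosen from the `UCCL_P2P_RDMA_CC` environment variable used in the test runs. The actual implementation in this PR is not shown on this page, so everything except the variable name and the values `swift` and `timely` is an assumption: the `CcMode` enum, the `select_cc_mode` helper, the exact-match comparison, and the idea that an unset variable means flow control only are all hypothetical.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class CcMode(Enum):
    """Hypothetical names; the PR's real types are not shown here."""
    NONE = "none"      # assumed default: existing flow control only
    TIMELY = "timely"
    SWIFT = "swift"


def select_cc_mode(env: Optional[Mapping[str, str]] = None) -> CcMode:
    """Map UCCL_P2P_RDMA_CC to a CcMode (illustrative sketch only)."""
    env = os.environ if env is None else env
    raw = env.get("UCCL_P2P_RDMA_CC")
    if raw is None or raw == "":
        # Assumption: unset means no congestion control algorithm.
        return CcMode.NONE
    if raw == "timely":
        return CcMode.TIMELY
    if raw == "swift":
        return CcMode.SWIFT
    # Assumption: unknown values are rejected rather than ignored.
    raise ValueError(f"unsupported UCCL_P2P_RDMA_CC value: {raw!r}")
```

Passing an explicit mapping, for example `select_cc_mode({"UCCL_P2P_RDMA_CC": "swift"})`, keeps the helper testable without touching the process environment.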

Test results - two node setup:

export UCCL_P2P_RDMA_CC=swift

$ torchrun --nnodes=2 --nproc_per_node=1 --node-rank=0   --master_addr=<IP> --master_port=12355 p2p/benchmarks/benchmark_uccl.py
UCCL P2P Benchmark — mode: Standard | API: Sync | role: client
Number of key-value blocks per message: 1
Message sizes: 256 B, 1.0 KB, 4.0 KB, 16.0 KB, 64.0 KB, 256.0 KB, 1.0 MB, 10.0 MB, 16.0 MB, 100.0 MB
Device: gpu | Local GPU idx: 0 | Iterations: 10
Creating Engine with GPU index: 0
RdmaDeviceManager: Found 5 RDMA device(s)
  [0] 
  [1] 
  [2] 
  [3] 
  [4] irdma-mkp0
RdmaDeviceManager: Initialization complete
GPU 0 uses device 4 (irdma-mkp0)
System assigned port: 38717
Engine initialized for GPU 0
Endpoint initialized successfully
Attempting to connect to <IP>:0 via port 41241
Connected to <IP>:41241 (fd=63)
Accepted connection fd=64 from <IP>:57828
[Client] Connected to <IP>:41241 (GPU 0) conn_id=0
[Client]    256 B :   0.13 Gbps |   0.02 GB/s  | 0.000016 s
[Client]   1.0 KB :   0.52 Gbps |   0.07 GB/s  | 0.000016 s
[Client]   4.0 KB :   2.03 Gbps |   0.25 GB/s  | 0.000016 s
[Client]  16.0 KB :   8.03 Gbps |   1.00 GB/s  | 0.000016 s
[Client]  64.0 KB :  28.32 Gbps |   3.54 GB/s  | 0.000019 s
[Client] 256.0 KB :  76.63 Gbps |   9.58 GB/s  | 0.000027 s
[Client]   1.0 MB : 139.22 Gbps |  17.40 GB/s  | 0.000060 s
[Client]  10.0 MB : 184.99 Gbps |  23.12 GB/s  | 0.000453 s
[Client]  16.0 MB : 186.98 Gbps |  23.37 GB/s  | 0.000718 s
[Client] 100.0 MB : 186.41 Gbps |  23.30 GB/s  | 0.004500 s
[Client] Benchmark complete
Destroying Engine...
Engine destroyed

export UCCL_P2P_RDMA_CC=timely

$ torchrun --nnodes=2 --nproc_per_node=1 --node-rank=0   --master_addr=<IP> --master_port=12355 p2p/benchmarks/benchmark_uccl.py
UCCL P2P Benchmark — mode: Standard | API: Sync | role: client
Number of key-value blocks per message: 1
Message sizes: 256 B, 1.0 KB, 4.0 KB, 16.0 KB, 64.0 KB, 256.0 KB, 1.0 MB, 10.0 MB, 16.0 MB, 100.0 MB
Device: gpu | Local GPU idx: 0 | Iterations: 10
Creating Engine with GPU index: 0
RdmaDeviceManager: Found 5 RDMA device(s)
  [0]
  [1] 
  [2] 
  [3] 
  [4] irdma-mkp0
RdmaDeviceManager: Initialization complete
GPU 0 uses device 4 (irdma-mkp0)
System assigned port: 38567
Engine initialized for GPU 0
Endpoint initialized successfully
Attempting to connect to <IP>:0 via port 34689
Connected to <IP>:34689 (fd=59)
Accepted connection fd=64 from <IP>:36800
[Client] Connected to <IP>:34689 (GPU 0) conn_id=0
[Client]    256 B :   0.13 Gbps |   0.02 GB/s  | 0.000016 s
[Client]   1.0 KB :   0.53 Gbps |   0.07 GB/s  | 0.000016 s
[Client]   4.0 KB :   2.03 Gbps |   0.25 GB/s  | 0.000016 s
[Client]  16.0 KB :   8.10 Gbps |   1.01 GB/s  | 0.000016 s
[Client]  64.0 KB :  28.11 Gbps |   3.51 GB/s  | 0.000019 s
[Client] 256.0 KB :  75.48 Gbps |   9.44 GB/s  | 0.000028 s
[Client]   1.0 MB : 140.82 Gbps |  17.60 GB/s  | 0.000060 s
[Client]  10.0 MB : 184.47 Gbps |  23.06 GB/s  | 0.000455 s
[Client]  16.0 MB : 187.26 Gbps |  23.41 GB/s  | 0.000717 s
[Client] 100.0 MB : 186.34 Gbps |  23.29 GB/s  | 0.004502 s
[Client] Benchmark complete
Destroying Engine...
Engine destroyed

unset UCCL_P2P_RDMA_CC

$ torchrun --nnodes=2 --nproc_per_node=1 --node-rank=0   --master_addr=<IP> --master_port=12355 p2p/benchmarks/benchmark_uccl.py
UCCL P2P Benchmark — mode: Standard | API: Sync | role: client
Number of key-value blocks per message: 1
Message sizes: 256 B, 1.0 KB, 4.0 KB, 16.0 KB, 64.0 KB, 256.0 KB, 1.0 MB, 10.0 MB, 16.0 MB, 100.0 MB
Device: gpu | Local GPU idx: 0 | Iterations: 10
Creating Engine with GPU index: 0
RdmaDeviceManager: Found 5 RDMA device(s)
  [0] 
  [1]
  [2]
  [3] 
  [4] irdma-mkp0
RdmaDeviceManager: Initialization complete
GPU 0 uses device 4 (irdma-mkp0)
System assigned port: 46773
Engine initialized for GPU 0
Endpoint initialized successfully
Attempting to connect to <IP>:0 via port 36325
Connected to <IP>:36325 (fd=63)
Accepted connection fd=64 from <IP>:35360
[Client] Connected to <IP>:36325 (GPU 0) conn_id=0
[Client]    256 B :   0.13 Gbps |   0.02 GB/s  | 0.000016 s
[Client]   1.0 KB :   0.52 Gbps |   0.06 GB/s  | 0.000016 s
[Client]   4.0 KB :   2.07 Gbps |   0.26 GB/s  | 0.000016 s
[Client]  16.0 KB :   8.05 Gbps |   1.01 GB/s  | 0.000016 s
[Client]  64.0 KB :  28.73 Gbps |   3.59 GB/s  | 0.000018 s
[Client] 256.0 KB :  76.76 Gbps |   9.59 GB/s  | 0.000027 s
[Client]   1.0 MB : 141.41 Gbps |  17.68 GB/s  | 0.000059 s
[Client]  10.0 MB : 185.05 Gbps |  23.13 GB/s  | 0.000453 s
[Client]  16.0 MB : 187.25 Gbps |  23.41 GB/s  | 0.000717 s
[Client] 100.0 MB : 186.45 Gbps |  23.31 GB/s  | 0.004499 s
[Client] Benchmark complete
Destroying Engine...
Engine destroyed
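
To compare the three runs without reading the logs by eye, the `[Client]` result lines can be parsed and the throughput recomputed from message size and elapsed time. The sketch below is a reviewer-side helper, not part of the PR; `parse_line` and `gbps_from` are hypothetical names. It assumes sizes are 1024-based and Gbps is 1e9-based, which appears consistent with the reported figures (100.0 MB in 0.004500 s gives about 186.41 Gbps).

```python
import re

# Matches lines such as:
# [Client] 100.0 MB : 186.41 Gbps |  23.30 GB/s  | 0.004500 s
LINE_RE = re.compile(
    r"\[Client\]\s+(?P<size>[\d.]+)\s+(?P<unit>B|KB|MB)\s*:\s*"
    r"(?P<gbps>[\d.]+)\s+Gbps\s*\|\s*(?P<gBps>[\d.]+)\s+GB/s\s*\|\s*"
    r"(?P<secs>[\d.]+)\s+s"
)

# Assumption: the benchmark uses 1024-based size units.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2}


def parse_line(line):
    """Return (size_bytes, reported_gbps, seconds), or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    size_bytes = float(m["size"]) * UNIT_BYTES[m["unit"]]
    return size_bytes, float(m["gbps"]), float(m["secs"])


def gbps_from(size_bytes, secs):
    """Recompute throughput in Gbps (1e9 bits per second)."""
    return size_bytes * 8 / secs / 1e9
```

Feeding each run's log through `parse_line` and keying the results by message size gives a table for side-by-side comparison. In the runs above, the reported 100.0 MB throughput is 186.41 Gbps with Swift, 186.34 Gbps with Timely, and 186.45 Gbps with the variable unset.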

Type of Change

Bug fix
New feature
Documentation update

How Has This Been Tested?

Include any tests here.

Unit tests
Integration tests
Manual testing

Checklist

I have run format.sh to follow the style guidelines.
I have run build.sh to verify compilation.
I have removed redundant variables and comments.
I have updated the documentation.
I have added tests.

* Moved CC algos to shared location.
* In P2P added support for Timely and Swift.

Signed-off-by: Andrzej Kuriata <andrzej.kuriata@intel.com>

Congestion control changes

1a7f370


andrzej-k force-pushed the ak_cc_mod branch from d155dda to 1a7f370 Compare March 30, 2026 07:43

andrzej-k changed the title ~~Allow EP and P2P to use congestion control - part 1 (extract CC algos)~~ [P2P] Added congestion control support (Timely, Swift) Mar 30, 2026

andrzej-k wants to merge 1 commit into uccl-project:main from andrzej-k:ak_cc_mod
