fengjica
Contributor

This change:

  • passes the number of rail endpoints to other nodes via serialization,
  • adds all remote endpoints for each local rail to the addresses,
  • picks local rails and remote endpoints round-robin for data transfer.
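
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names (`serialize_endpoints`, `deserialize_endpoints`, `build_address_table`, `pick_pairs`), the wire format (a 4-byte count followed by length-prefixed endpoint names), and the selection policy (one shared counter taken modulo each list length) are all assumptions made for the example.

```python
import struct


def serialize_endpoints(endpoints):
    """Pack the endpoint count first so the peer knows how many entries follow.

    Hypothetical wire format: uint32 count, then (uint16 length, bytes) per endpoint.
    """
    out = struct.pack("!I", len(endpoints))
    for ep in endpoints:
        raw = ep.encode("utf-8")
        out += struct.pack("!H", len(raw)) + raw
    return out


def deserialize_endpoints(blob):
    """Read the count, then exactly that many length-prefixed endpoint names."""
    (n,) = struct.unpack_from("!I", blob, 0)
    offset = 4
    endpoints = []
    for _ in range(n):
        (length,) = struct.unpack_from("!H", blob, offset)
        offset += 2
        endpoints.append(blob[offset:offset + length].decode("utf-8"))
        offset += length
    return endpoints


def build_address_table(local_rails, remote_endpoints):
    """Give every local rail the full list of remote endpoints.

    No one-to-one rail pairing is assumed, so the two counts may differ.
    """
    return {rail: list(remote_endpoints) for rail in local_rails}


def pick_pairs(local_rails, table, num_transfers):
    """Round-robin over local rails and over each rail's remote endpoints."""
    pairs = []
    for i in range(num_transfers):
        rail = local_rails[i % len(local_rails)]
        remotes = table[rail]
        pairs.append((rail, remotes[i % len(remotes)]))
    return pairs


# Made-up counts: 4 local rails, a peer that advertises only 2 endpoints.
local_rails = ["rail0", "rail1", "rail2", "rail3"]
remote_endpoints = deserialize_endpoints(serialize_endpoints(["ep0", "ep1"]))
table = build_address_table(local_rails, remote_endpoints)
pairs = pick_pairs(local_rails, table, 6)
```

Because the count travels with the serialized data, the receiver never has to assume it matches its own rail count, and every local rail still has a valid remote target when the peer has fewer endpoints.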

What?

Fix the code's current assumption that the rail configuration is the same across nodes.

Why?

Users intend to use AWS p5en and p6 nodes together.

How?

See changes described above.

copy-pr-bot bot commented Oct 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

👋 Hi fengjica! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@fengjica fengjica force-pushed the dev/rails_mismatch branch 2 times, most recently from f3b6431 to 072697c Compare October 14, 2025 22:49
Collaborator

@yexiang-aws yexiang-aws left a comment

The logic looks good to me. I just have some cosmetic comments.

@ovidiusm
Contributor

/build

@ovidiusm
Contributor

/ok to test 807bbbe

@ovidiusm
Contributor

/build

This change:
 - passes the number of rail endpoints to other nodes via serialization,
 - adds all remote endpoints for each local rail to the addresses,
 - picks local rails and remote endpoints round-robin for data transfer.

Signed-off-by: Feng Ji <[email protected]>
@ovidiusm
Contributor

/build

@ovidiusm
Contributor

/ok to test 99d2627

@ovidiusm
Contributor

/build

@ovidiusm
Contributor

/ok to test e656843

@ovidiusm ovidiusm merged commit 2587801 into ai-dynamo:main Oct 17, 2025
21 checks passed