@schythanyaku

@schythanyaku schythanyaku commented Oct 14, 2025

NVIDIA: SAUCE: MEDIATEK: pinctrl: mediatek: Add EINT Driver for CX7 hot plug on DGX Spark

This driver manages the PCIe link for NVIDIA ConnectX-7 (CX7)
hot-plug/unplug on DGX Spark. The PCIe link must be disabled when
the CX7 cable is unplugged and enabled when the cable is
plugged in.

It also creates a sysfs entry to emulate cable plug in/out
behavior as below:

plug in - echo 1 > /sys/devices/platform/MTKP0001:00/cx7_dbg/plugin
plug out - echo 0 > /sys/devices/platform/MTKP0001:00/cx7_dbg/plugin
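
A small wrapper around the two commands above could look like the following. This is only a sketch: the function name `cx7_emulate` and the `CX7_PLUGIN_NODE` override are hypothetical and not part of the driver; only the sysfs path and the values 1 and 0 come from this description.

```shell
# Hypothetical helper; only the sysfs path and the 1/0 values come from the PR text.
# CX7_PLUGIN_NODE can be overridden, e.g. to point at a scratch file for a dry run.
CX7_PLUGIN_NODE="${CX7_PLUGIN_NODE:-/sys/devices/platform/MTKP0001:00/cx7_dbg/plugin}"

cx7_emulate() {
    case "$1" in
        in)  value=1 ;;   # emulate cable plug-in
        out) value=0 ;;   # emulate cable plug-out
        *)   echo "usage: cx7_emulate in|out" >&2; return 2 ;;
    esac
    # Writing to the real sysfs node normally requires root.
    printf '%s\n' "$value" > "$CX7_PLUGIN_NODE"
}
```

On a DGX Spark with the driver loaded, `cx7_emulate in` and `cx7_emulate out` would then stand in for the two `echo` commands.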

The driver also emits uevents to notify user-space applications when a
cable is plugged in or removed. The sequence is as follows:

  • cable plug-in:
    Report plug-in uevent (driver)
    Power on CX7
    Enable PCIe link (application)
    Rescan CX7 devices (application)

  • cable removal:
    Report removal uevent (driver)
    Remove CX7 devices (application)
    Disable PCIe link (application)
    Power off CX7
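
The application-side steps in the two sequences above can be sketched as a dry run that prints the commands in order instead of executing them. The PCI `rescan` and `remove` sysfs files are the standard kernel interface; the function name, the example PCI address, and the link enable/disable placeholders are assumptions, since this description does not say how the application toggles the link or what the uevent looks like.

```shell
# Dry-run sketch: prints the application-side steps in order instead of running them.
# Assumptions (not from the PR): the function name, the example BDF, and the
# placeholder link steps. Only the PCI sysfs rescan/remove files are standard.
cx7_app_steps() {
    action="$1"                 # "plug-in" or "removal"
    bdf="${2:-0000:01:00.0}"    # hypothetical PCI address of a CX7 function
    case "$action" in
        plug-in)
            echo "# enable PCIe link (mechanism not described in the PR)"
            echo "echo 1 > /sys/bus/pci/rescan"
            ;;
        removal)
            echo "echo 1 > /sys/bus/pci/devices/$bdf/remove"
            echo "# disable PCIe link (mechanism not described in the PR)"
            ;;
        *)
            echo "unknown action: $action" >&2
            return 2
            ;;
    esac
}
```

The point of the ordering is that devices are removed before the link goes down and rescanned only after the link is back up.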

Signed-off-by: Vaibhav Vyas [email protected]
Signed-off-by: Jerry.Guo [email protected]
Signed-off-by: Yenchia Chen [email protected]
Signed-off-by: Shubhi Garg [email protected]
Signed-off-by: Abhishek Sahu [email protected]
Signed-off-by: Surabhi Chythanya Kumar [email protected]

@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from a26a8cc to 8657477 Compare October 14, 2025 03:06
@schythanyaku schythanyaku changed the title added cx-7 driver changes [nvidia-6.14-next] Add Mediatek EINT driver for NVIDIA CX7 hotplug management Oct 14, 2025
@nirmoy
Collaborator

nirmoy commented Oct 14, 2025

The PR contains the patch description and sign-offs, but the patch itself (8657477) is missing the sign-off and the author is root. Please fix that. ./scripts/checkpatch.pl <patch> will help detect such issues.
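
As a complement to checkpatch, the two specific problems named here (author `root`, no sign-off) can be caught with a quick filter before pushing. This is a sketch; `check_commit_meta` is a hypothetical helper, not an existing kernel script, and it only looks at the author name and the `Signed-off-by:` trailer.

```shell
# Reads the author name on line 1 followed by the commit message, e.g. from:
#   git log -1 --format='%an%n%B' <commit> | check_commit_meta
# Prints one line per problem and returns non-zero if any is found.
check_commit_meta() {
    awk '
        NR == 1 {
            if ($0 == "root") { print "author is root"; bad = 1 }
            next
        }
        /^Signed-off-by: .+ <.+>$/ { signed = 1 }
        END {
            if (!signed) { print "missing Signed-off-by"; bad = 1 }
            exit bad + 0
        }
    '
}
```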

@khfeng
Collaborator

khfeng commented Oct 14, 2025

Can you please point out where the user-space application is located?

I feel the entire process should be contained in the kernel.

@clsotog
Collaborator

clsotog commented Oct 14, 2025

Like Nirmoy said, please fix the commit.
What's the difference between this PR and these PRs: #186 and #175?
When we sent those PRs to Canonical, they had concerns. Have we handled all of their concerns?

@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from 8657477 to 65fc0d0 Compare October 14, 2025 19:43
@nvidia-bfigg nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.14-next branch from a487a07 to be01c90 Compare October 16, 2025 15:01
@schythanyaku
Author

> Can you please point out where the user-space application is located?
>
> I feel the entire process should be contained in the kernel.

The user-space application will be included in the next PR I submit.

> Like Nirmoy said, please fix the commit. What's the difference between this PR and these PRs: #186 and #175? When we sent those PRs to Canonical, they had concerns. Have we handled all of their concerns?

I’ve fixed the commit. Those are previous PRs targeting older Linux kernel versions. This is the most current and complete version that needs to be merged.
We’ve addressed all of Canonical’s concerns and are preparing to upstream this driver in a subsequent Linux kernel release.

@clsotog
Collaborator

clsotog commented Oct 17, 2025

I do not see the commit fixed, and the PR needs a rebase.

@clsotog
Collaborator

clsotog commented Oct 21, 2025

Please fix the commit (it does not even have a Signed-off-by tag) and rebase.

@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch 2 times, most recently from 774bd6a to ff54b4a Compare October 22, 2025 23:52
@schythanyaku
Author

> Please fix the commit (it does not even have a Signed-off-by tag) and rebase.

Thanks for the feedback. I have rebased and updated the commit.

@nvmochs
Collaborator

nvmochs commented Oct 23, 2025

d07e29d firmware: arm_ffa: Add support for {un,}registration of framework notifications

I’m pretty sure you picked this from this commit in linux-nvidia-6.11
af420f8 firmware: arm_ffa: Add support for {un,}registration of framework notifications

That needs to be indicated, i.e. you need to pick with -x -s and then fix up the tag to indicate where it was picked.

e.g.
(cherry picked from commit af420f8 linux-nvidia-6.11)

Note: You’ll want to keep the original provenance as well, i.e. there will end up being 2 cherry-picked tags (1 from the upstream pick in 6.11 and 1 for this pick from 6.11).
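
The tag fix-up described above can be sketched as a message filter. `git cherry-pick -x -s` and `git commit --amend -F -` are standard git; `tag_pick_source` is a hypothetical helper written for this illustration. It appends the source branch to the last "(cherry picked from commit ...)" line only, so the earlier upstream provenance line stays as it was.

```shell
# Typical flow (commit ID and branch name taken from the comment above):
#   git cherry-pick -x -s af420f8
#   git log -1 --format=%B | tag_pick_source linux-nvidia-6.11 | git commit --amend -F -
# tag_pick_source is a hypothetical filter: it appends the source branch to the
# LAST "(cherry picked from commit ...)" line and leaves earlier ones untouched.
tag_pick_source() {
    awk -v src="$1" '
        { line[NR] = $0 }
        /^\(cherry picked from commit [0-9a-f]+\)$/ { last = NR }
        END {
            for (i = 1; i <= NR; i++) {
                if (i == last) sub(/\)$/, " " src ")", line[i])
                print line[i]
            }
        }
    '
}
```

Because `-x` adds its line before `-s` adds the new sign-off, the most recent pick is always the last matching line in the message.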

All that said, I’m very confused by this. There are large sections of code missing compared to the original commit. If that is intended, then that needs to be spelled out in the PR and this also becomes a backport instead of just a cherry pick and requires notes in the commit message.


7d0eaad NVIDIA: SAUCE: PCI: Remove duplication in calling pci_acs_ctrl_enabled()

This looks like you picked it from somewhere? linux-nvidia-6.11? It needs a pick tag so I know where it came from.


88564c7 iommu: Store either domain or handle in group->pasid_array

Why are you picking this? And it’s removing arm_ffa content?


7133aad Revert "NVIDIA: SAUCE: PCI: Remove duplication in calling pci_acs_ctrl_enabled()"

You’re reverting the patch that you just added?


ff54b4a NVIDIA: SAUCE: MEDIATEK: pinctrl: mediatek: Add eint hotplug driver This driver is used to manage PCIe link for NVIDIA ConnectX-7 (CX7) hot-plug/unplug on DGX Spark. We need to disable PCIe link when CX7 cable plug out happens and enable pcie link when CX7 cable plug in happens.

This commit message is way too long.

I suspect the intention is for this PR to only contain this commit and the others were due to the branch not being properly rebased?

Is this driver upstream? Has it been posted to LKML?

@khfeng
Collaborator

khfeng commented Oct 23, 2025

And please add the userspace application too.

@schythanyaku
Author

> ff54b4a NVIDIA: SAUCE: MEDIATEK: pinctrl: mediatek: Add eint hotplug driver This driver is used to manage PCIe link for NVIDIA ConnectX-7 (CX7) hot-plug/unplug on DGX Spark. We need to disable PCIe link when CX7 cable plug out happens and enable pcie link when CX7 cable plug in happens.
>
> This commit message is way too long.

I see. Should a commit message include only the sign-off and a brief summary of the changes? Could you clarify what elements should be present in a commit message?

> I suspect the intention is for this PR to only contain this commit and the others were due to the branch not being properly rebased?

Yes. This is the only commit that is intended. The others were due to rebasing the PR.

> Is this driver upstream? Has it been posted to LKML?

We will be upstreaming this driver in a subsequent Linux kernel release. I'm not familiar with LKML. Canonical is aware of this.

@schythanyaku
Author

> And please add the userspace application too.

I will include the user-space application in a separate follow-up PR.

@clsotog
Collaborator

clsotog commented Oct 23, 2025

> > Please fix the commit (it does not even have a Signed-off-by tag) and rebase.
>
> Thanks for the feedback. I have rebased and updated the commit.

Why do I see about 5 commits? It should be just one commit. Can you check your rebase?
Which userspace tool are you making a PR for? I thought we just needed the MR from Jamie for dgx-spark-mlnx-hotplug, and that is already merged in BaseOS. Can you elaborate a little more on the userspace part?
I think the commit is still missing the Signed-off-by with your name.

@nvmochs nvmochs self-requested a review October 23, 2025 14:03
@nvmochs
Collaborator

nvmochs commented Oct 23, 2025

> > ff54b4a NVIDIA: SAUCE: MEDIATEK: pinctrl: mediatek: Add eint hotplug driver This driver is used to manage PCIe link for NVIDIA ConnectX-7 (CX7) hot-plug/unplug on DGX Spark. We need to disable PCIe link when CX7 cable plug out happens and enable pcie link when CX7 cable plug in happens.
> >
> > This commit message is way too long.
>
> I see. Should a commit message include only the sign-off and a brief summary of the changes? Could you clarify what elements should be present in a commit message?

The commit title/message in the previous PRs for this patch (#186 and #175) can be used as an example.

> > I suspect the intention is for this PR to only contain this commit and the others were due to the branch not being properly rebased?
>
> Yes. This is the only commit that is intended. The others were due to rebasing the PR.

Please rebase to the current HEAD of the branch and only include this single patch. Feel free to ping me on slack if you run into issues with that.
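
One way to get back to a single commit on top of the target branch is sketched below. It rests on assumptions: the remote is called `origin`, the target is the `24.04_linux-nvidia-6.14-next` branch that appears in the force-push events of this thread, and `<driver-commit>` stands for the ID of the driver commit. `commits_ahead` is a hypothetical helper for checking the result.

```shell
# Sketch only; "origin" and the base branch name are assumptions.
#   git fetch origin
#   git checkout -B 6.14_eint_cx7_hp_driver_13Oct origin/24.04_linux-nvidia-6.14-next
#   git cherry-pick <driver-commit>
#   commits_ahead origin/24.04_linux-nvidia-6.14-next   # should print 1
commits_ahead() {
    # Count commits reachable from the topic ref ($2, default HEAD)
    # but not from the base ref ($1).
    git rev-list --count "$1..${2:-HEAD}"
}
```

If the count is anything other than 1, the branch still carries commits that do not belong to this PR.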

> > Is this driver upstream? Has it been posted to LKML?
>
> We will be upstreaming this driver in a subsequent Linux kernel release. I'm not familiar with LKML. Canonical is aware of this.

LKML is the Linux Kernel Mailing List.

@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from ff54b4a to 5150b00 Compare October 23, 2025 20:48
@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from 5150b00 to e56b71d Compare October 23, 2025 21:03
@schythanyaku
Author

schythanyaku commented Oct 23, 2025

> > > Please fix the commit (it does not even have a Signed-off-by tag) and rebase.
> >
> > Thanks for the feedback. I have rebased and updated the commit.
>
> Why do I see about 5 commits? It should be just one commit. Can you check your rebase? Which userspace tool are you making a PR for? I thought we just needed the MR from Jamie for dgx-spark-mlnx-hotplug, and that is already merged in BaseOS. Can you elaborate a little more on the userspace part? I think the commit is still missing the Signed-off-by with your name.

Thanks for pointing this out. I initially misunderstood the status of the udev userspace changes: I believed they had not been merged yet. However, the primary udev update for the hot-plug driver is already incorporated into the base OS.
What I was referring to is a userspace test script (requested by Canonical) that automates testing of the hot-plug driver without manual intervention. @clsotog helped clarify that this script is included in the merged dgx-spark-mlnx-hotplug package.
Given this, I don't believe a separate pull request is necessary for the userspace changes, as they have already been addressed.
I have updated the PR to fix the rebase and the commit.
Could you please take another look when you get a chance? Thank you!

Collaborator

@nvmochs nvmochs left a comment

Thanks for addressing my comments and fixing up the source branch.

I have compared this commit with the version from PR 175 that I already ack'd and confirmed no differences exist. No further issues from me.

Acked-by: Matthew R. Ochs <[email protected]>

@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from e56b71d to b43ea9a Compare October 24, 2025 02:33
@schythanyaku
Author

Updated the commit message. Canonical had provided feedback requesting confirmation of a plan and timeframe to migrate to the upstream driver. I’ve updated the commit message and the PR comment accordingly: they now include that migration plan and timeframe confirmation. I also added instructions for an automated way of testing the driver, which Canonical also requested, and included some more details about the driver.

@nvmochs
Collaborator

nvmochs commented Oct 24, 2025

> Updated the commit message. Canonical had provided feedback requesting confirmation of a plan and timeframe to migrate to the upstream driver. I’ve updated the commit message and the PR comment accordingly: they now include that migration plan and timeframe confirmation. I also added instructions for an automated way of testing the driver, which Canonical also requested, and included some more details about the driver.

Unless Canonical specifically wanted them included in the commit message, I think the details you added are better left for the launchpad bug.

@clsotog
Collaborator

clsotog commented Oct 24, 2025

@schythanyaku
Yeah, I think I agree with Matt. I created the Launchpad bug for the previous PR, so I think we can use the same one.
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.11/+bug/2120474
If you have issues accessing Launchpad, let me know and I can update it.
You can remove those extra lines from the commit so I can ack it.

…ot plug on DGX Spark

This driver is used to manage PCIe link for NVIDIA ConnectX-7 (CX7)
hot-plug/unplug on DGX Spark. We need to disable PCIe link when
CX7 cable plug out happens and enable PCIe link when
CX7 cable plug in happens.

It also creates a sysfs entry to emulate cable plug in/out
behavior as below:

plug in - echo 1 > /sys/devices/platform/MTKP0001:00/cx7_dbg/plugin
plug out - echo 0 > /sys/devices/platform/MTKP0001:00/cx7_dbg/plugin

We also implement uevent to notify user-space applications when a
cable is plugged in or removed. Below are the details of our process:

* cable plug-in:
    Report plug-in uevent (driver)
    Power on CX7
    Enable PCIe link (application)
    Rescan CX7 devices (application)

* cable removal:
    Report removal uevent (driver)
    Remove CX7 devices (application)
    Disable PCIe link (application)
    Power off CX7

Signed-off-by: Vaibhav Vyas <[email protected]>
Signed-off-by: Jerry.Guo <[email protected]>
Signed-off-by: Yenchia Chen <[email protected]>
Signed-off-by: Shubhi Garg <[email protected]>
Signed-off-by: Abhishek Sahu <[email protected]>
Signed-off-by: Surabhi Chythanya Kumar <[email protected]>
@schythanyaku schythanyaku force-pushed the 6.14_eint_cx7_hp_driver_13Oct branch from b43ea9a to 9d2ddd7 Compare October 24, 2025 22:31
@schythanyaku
Author

> @schythanyaku Yeah, I think I agree with Matt. I created the Launchpad bug for the previous PR, so I think we can use the same one. https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.11/+bug/2120474 If you have issues accessing Launchpad, let me know and I can update it. You can remove those extra lines from the commit so I can ack it.

Thanks Matt and Carol for the feedback. I have updated the commit and the PR comment to remove those extra lines, and added the additional details to the provided Launchpad bug instead.

@clsotog
Collaborator

clsotog commented Oct 24, 2025

> > @schythanyaku Yeah, I think I agree with Matt. I created the Launchpad bug for the previous PR, so I think we can use the same one. https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.11/+bug/2120474 If you have issues accessing Launchpad, let me know and I can update it. You can remove those extra lines from the commit so I can ack it.
>
> Thanks Matt and Carol for the feedback. I have updated the commit and the PR comment to remove those extra lines, and added the additional details to the provided Launchpad bug instead.

I see the Launchpad bug has been updated.

@clsotog clsotog self-requested a review October 24, 2025 22:56
Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <[email protected]>

@ianm-nv
Collaborator

ianm-nv commented Oct 28, 2025

Applied to noble:nvidia-6.14-next

@nvmochs
Collaborator

nvmochs commented Oct 28, 2025

@nvmochs nvmochs closed this Oct 28, 2025
@nvmochs
Collaborator

nvmochs commented Oct 28, 2025

@schythanyaku - Please also submit a PR for this against linux-nvidia-6.17.

@schythanyaku
Author

> @schythanyaku - Please also submit a PR for this against linux-nvidia-6.17.

Got it. Thanks. Created a PR for linux-nvidia-6.17: #228
