
@KobaKoNvidia
Collaborator

@KobaKoNvidia KobaKoNvidia commented Aug 29, 2025

Summary

This PR addresses a critical compatibility issue between IOMMU passthrough mode and the NVIDIA UVM driver on ARM64 platforms, which causes NeMo framework and PyTorch CUDA initialization failures.

Problem

During NeMo framework testing on ARM64 platforms, tests consistently fail on machines configured with iommu.passthrough=1. The failure is triggered by torch.cuda.set_device(1) calls and produces kernel warnings and IOMMU-related errors.

Note: iommu.passthrough=1 was previously recommended by NVIDIA in official presentation slides as a performance optimization parameter, making this a high-impact issue for users following NVIDIA's best practices.

Root Cause

The issue occurs when:

  • ARM64 system is configured with kernel parameter iommu.passthrough=1
  • NVIDIA Coherent GPU Memory Mode (CDMM) is enabled
  • PyTorch attempts to initialize CUDA devices

The kernel warning indicates a conflict in the ARM SMMU v3 driver when UVM attempts to bind GPU VA space:

jgunthorpe and others added 19 commits August 26, 2025 01:51
Instead of passing a naked __le64 * around to represent an STE, wrap it in a
"struct arm_smmu_ste" with an array of the correct size. This makes it
much clearer which functions will comprise the "STE API".

Reviewed-by: Moritz Fischer <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 57b8904)
Signed-off-by: Koba Ko <[email protected]>
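A minimal userspace sketch of the idea behind this commit: wrapping the raw pointer in a struct gives the STE a fixed size that the compiler can check. The names `struct ste_sketch`, `STE_DWORDS`, `STE_0_V`, `ste_set_valid()`, and `ste_is_valid()` are hypothetical stand-ins. The sketch assumes an STE of eight 64-bit words (64 bytes) with the valid bit in bit 0 of word 0, and it ignores the little-endian storage the kernel uses.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Assumption: an STE is eight 64-bit words (64 bytes). */
#define STE_DWORDS 8
/* Hypothetical: valid bit assumed to be bit 0 of word 0. */
#define STE_0_V ((uint64_t)1 << 0)

/* Typed wrapper: the array size travels with the type. */
struct ste_sketch {
	uint64_t data[STE_DWORDS];
};

/* The "STE API": every function takes the typed struct, not a raw pointer. */
static void ste_init(struct ste_sketch *ste)
{
	memset(ste, 0, sizeof(*ste));
}

static void ste_set_valid(struct ste_sketch *ste)
{
	ste->data[0] |= STE_0_V;
}

static int ste_is_valid(const struct ste_sketch *ste)
{
	return (ste->data[0] & STE_0_V) != 0;
}

/* Compile-time check that the wrapper has exactly the expected size. */
_Static_assert(sizeof(struct ste_sketch) == 64, "an STE is 64 bytes");
```

Passing a bare `uint64_t *` to `ste_set_valid()` now draws an incompatible-pointer-type diagnostic, which is the point of the wrapper.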
arm_smmu_s1_cfg (and by extension arm_smmu_domain) owns both a CD table
and the CD inserted into that table's non-pasid CD entry. This limits
arm_smmu_domain's ability to represent non-pasid domains, where multiple
domains need to be inserted into a common CD table. Rather than describing
an STE entry (which may have multiple domains installed into it with
PASID), a domain should describe a single CD entry instead. This is
precisely the role of arm_smmu_ctx_desc. A subsequent commit will also
move the CD table outside of arm_smmu_domain.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.1.I67ab103c18d882aedc8a08985af1fba70bca084e@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 987a878)
Signed-off-by: Koba Ko <[email protected]>
Remove struct arm_smmu_s1_cfg. This is really just a CD table with a
bit of extra information. Move other attributes of the CD table that
were held there into the existing CD table structure, struct
arm_smmu_ctx_desc_cfg, and replace all usages of arm_smmu_s1_cfg with
arm_smmu_ctx_desc_cfg.

For clarity, use the name "cd_table" for the variables pointing to
arm_smmu_ctx_desc_cfg in the new code instead of cdcfg. A later patch
will make this fully consistent.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.2.I1ef1ed19d7786c8176a0d05820c869e650c8d68f@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 1f85888)
Signed-off-by: Koba Ko <[email protected]>
This is slightly cleaner: arm_smmu_ctx_desc_cfg is initialized in a
single function instead of having pieces set ahead of time by its caller.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.3.I875254464d044a8ce8b3a2ad6beb655a4a006456@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit e3aad74)
Signed-off-by: Koba Ko <[email protected]>
A domain can be attached to multiple masters with different
master->stall_enabled values. The stall bit of a CD entry should follow
master->stall_enabled and has an inverse relationship with the
STE.S1STALLD bit.

The stall_enabled bit does not depend on any property of the domain, so
move it out of the arm_smmu_domain struct.  Move it to the CD table
struct so that it can fully describe how CD entries should be written to
it.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.4.I5aa89c849228794a64146cfe86df21fb71629384@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 1228cc5)
Signed-off-by: Koba Ko <[email protected]>
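The reasoning in this commit can be illustrated with a small sketch. Because the stall setting lives in the per-master CD table rather than in the domain, two masters that share one domain can still get CD entries with different stall bits. All names here (`cd_table_sketch`, `domain_sketch`, `build_cd()`, `CD_0_S`) are hypothetical, and the bit position used for the stall flag is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical position of the stall bit in word 0 of a CD. */
#define CD_0_S ((uint64_t)1 << 44)

struct domain_sketch {
	uint64_t ttbr;        /* page-table base: a property of the domain */
};

struct cd_table_sketch {
	bool stall_enabled;   /* follows master->stall_enabled, not the domain */
};

/* A CD combines domain properties with properties of the table it lands in. */
static void build_cd(const struct cd_table_sketch *table,
		     const struct domain_sketch *domain, uint64_t cd[2])
{
	cd[0] = table->stall_enabled ? CD_0_S : 0;
	cd[1] = domain->ttbr;
}
```

The same domain produces the same translation word for both masters; only the table-derived stall bit differs.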
Update arm_smmu_write_ctx_desc and downstream functions to operate on
a master instead of an smmu domain. We expect arm_smmu_write_ctx_desc()
to only be called to write a CD entry into a CD table owned by the
master. Under the hood, arm_smmu_write_ctx_desc still fetches the CD
table from the domain that is attached to the master, but a subsequent
commit will move that table's ownership to the master.

Note that this change isn't a nop refactor since SVA will call
arm_smmu_write_ctx_desc in a loop for every master the domain is
attached to despite the fact that they all share the same CD table. This
loop may look weird but becomes necessary when the CD table becomes
per-master in a subsequent commit.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.5.I219054a6cf538df5bb22f4ada2d9933155d6058c@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 2450314)
Signed-off-by: Koba Ko <[email protected]>
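As a rough sketch of the loop described above, assuming each master already owns its own CD table (as the next commit arranges): SVA writes the same CD value at the same SSID into every attached master's table. The types, the fixed table size, and `sva_write_cd_all()` are hypothetical. The real driver walks a locked device list and issues invalidations, which are omitted here.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define MAX_SSIDS 16   /* hypothetical table size */

struct master_sketch {
	uint64_t cd_table[MAX_SSIDS];   /* one CD word per SSID, simplified */
};

struct domain_sketch {
	struct master_sketch *masters[4];   /* masters attached to the domain */
	size_t nr_masters;
};

/*
 * Write one CD into every attached master's own table.
 * Returns the number of tables written, or -1 for an out-of-range SSID.
 */
static int sva_write_cd_all(struct domain_sketch *domain, unsigned int ssid,
			    uint64_t cd)
{
	if (ssid >= MAX_SSIDS)
		return -1;
	for (size_t i = 0; i < domain->nr_masters; i++)
		domain->masters[i]->cd_table[ssid] = cd;
	return (int)domain->nr_masters;
}
```

While the table is still shared, the loop rewrites the same memory several times, which is why the commit message calls it "weird"; once tables are per-master, each iteration hits a distinct table.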
With this change, each master will now own its own CD table instead of
sharing one with other masters attached to the same domain. Attaching a
stage 1 domain installs CD entries into the master's CD table. SVA
writes its CD entries into each master's CD table if the domain is
shared across masters.

Also add the device to the devices list before writing the CD to the
table so that SVA will know that the CD needs to be re-written to this
device's CD table as well if it decides to update the CD's ASID
concurrently with this function.

Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.6.Ice063dcf87d1b777a72e008d9e3406d2bcf6d876@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 10e4968)
Signed-off-by: Koba Ko <[email protected]>
Remove unused master parameter now that the CD table is allocated
elsewhere.

Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.7.Iff18df41564b9df82bf40b3ec7af26b87f08ef6e@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 5e14313)
Signed-off-by: Koba Ko <[email protected]>
Update the comment to reflect the fact that the STE is not always
installed. arm_smmu_domain_finalise_s1 intentionally calls
arm_smmu_write_ctx_desc while the STE is not installed.

Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.8.I7a8beb615e2520ad395d96df94b9ab9708ee0d9c@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 6032f58)
Signed-off-by: Koba Ko <[email protected]>
cdcfg is a confusing name, especially given other variables with the cfg
suffix in this driver. cd_table more clearly describes what is being
operated on.

Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Link: https://lore.kernel.org/r/20230915211705.v8.9.I5ee79793b444ddb933e8bc1eb7b77e728d7f8350@changeid
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 475918e)
Signed-off-by: Koba Ko <[email protected]>
The only caller is arm_smmu_install_ste_for_dev() which never has a NULL
master. Remove the confusing if.

Reviewed-by: Moritz Fischer <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 12a48fe)
Signed-off-by: Koba Ko <[email protected]>
This allows a driver to set a global static to an IDENTITY domain and
the core code will automatically use it whenever an IDENTITY domain
is requested.

By making it always available it means the IDENTITY can be used in error
handling paths to force the iommu driver into a known state. Devices
implementing global static identity domains should avoid failing their
attach_dev ops.

To make global static domains simpler allow drivers to omit their free
function and update the iommufd selftest.

Convert rockchip to use the new mechanism.

Tested-by: Steven Price <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit df31b29)
Signed-off-by: Koba Ko <[email protected]>
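A minimal model of the mechanism described in this commit, assuming a simplified ops structure. If the driver provides a static identity domain, the core hands it out instead of calling the allocator, and a missing free callback is treated as "nothing to free". `ops_sketch`, `core_get_identity()`, `core_free_domain()`, and `example_identity` are hypothetical names, not the kernel's API.

```c
#include <stddef.h>
#include <assert.h>

enum domain_type { DOMAIN_IDENTITY, DOMAIN_PAGING };

struct domain_sketch {
	enum domain_type type;
	void (*free)(struct domain_sketch *d);   /* may be NULL for static domains */
};

struct ops_sketch {
	struct domain_sketch *identity_domain;   /* optional global static domain */
	struct domain_sketch *(*domain_alloc)(enum domain_type type);
};

/* Core side: prefer the driver's static IDENTITY domain when it has one. */
static struct domain_sketch *core_get_identity(const struct ops_sketch *ops)
{
	if (ops->identity_domain)
		return ops->identity_domain;
	return ops->domain_alloc ? ops->domain_alloc(DOMAIN_IDENTITY) : NULL;
}

/* Core side: a missing free callback means there is nothing to free. */
static void core_free_domain(struct domain_sketch *d)
{
	if (d && d->free)
		d->free(d);
}

/* Driver side: one global static IDENTITY domain with no free op. */
static struct domain_sketch example_identity = { DOMAIN_IDENTITY, NULL };
```

Because the static domain always exists, an error path can fall back to it without an allocation that might fail.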
This is used when the iommu driver is taking control of the dma_ops,
currently only on S390 and power spapr. It is designed to preserve the
original ops->detach_dev() semantics that S390 was built around.

Provide an opaque domain type and a 'default_domain' ops value that allows
the driver to trivially force any single domain as the default domain.

Update iommufd selftest to use this instead of set_platform_dma_ops

Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit 1c68cbc)
Signed-off-by: Koba Ko <[email protected]>
Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through IOMMUFD. These domains have a few different properties
compared to kernel-owned domains:

 - They may be PAGING domains, but created with special parameters.
   For instance aperture size changes/number of levels, different
   IOPTE formats, or other things necessary to make a vIOMMU work

 - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
   to make the cgroup sandbox stronger

 - Device-specialty domains, such as NESTED domains can be created by
   IOMMUFD.

The new op clearly says the domain is being created by IOMMUFD, that the
domain is intended for userspace use, and it provides a way to pass user
flags or a driver specific uAPI structure to customize the created domain
to exactly what the vIOMMU userspace driver requires.

iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional PAGING
domain.

This new op for now is only supposed to be used by IOMMUFD, hence no
wrapper for it. IOMMUFD would call the callback directly. As for domain
free, IOMMUFD would use iommu_domain_free().

Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 909f4ab)
Signed-off-by: Koba Ko <[email protected]>
Make IOMMUFD use iommu_domain_alloc_user() by default for iommu_domain
creation. IOMMUFD needs to support iommu_domain allocation with parameters
from userspace in nested support, and a driver is expected to implement
everything under this op.

If the iommu driver doesn't provide a domain_alloc_user callback, then
IOMMUFD falls back to using iommu_domain_alloc() with an UNMANAGED type if
possible.

Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 7975b72)
Signed-off-by: Koba Ko <[email protected]>
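The allocation policy described in this commit can be sketched as follows. This is a simplified model, not IOMMUFD code: `hwpt_alloc_sketch()`, the ops structure, the example callbacks, and `HWPT_FLAG_EXAMPLE` are hypothetical. Rejecting non-zero flags when only the legacy allocator exists is an assumption about how a fallback that has no flags argument should behave.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define HWPT_FLAG_EXAMPLE (1u << 0)   /* hypothetical user flag */

struct domain_sketch {
	int via_user_op;   /* 1 if built by domain_alloc_user */
	uint32_t flags;
};

struct ops_sketch {
	/* Preferred path: the driver builds the domain from user flags. */
	int (*domain_alloc_user)(uint32_t flags, struct domain_sketch *out);
	/* Legacy path: plain UNMANAGED allocation, no flags argument. */
	int (*domain_alloc_unmanaged)(struct domain_sketch *out);
};

static int hwpt_alloc_sketch(const struct ops_sketch *ops, uint32_t flags,
			     struct domain_sketch *out)
{
	if (ops->domain_alloc_user)
		return ops->domain_alloc_user(flags, out);
	/* Assumption: the legacy path cannot honour user flags, so refuse them. */
	if (flags)
		return -EOPNOTSUPP;
	if (ops->domain_alloc_unmanaged)
		return ops->domain_alloc_unmanaged(out);
	return -EOPNOTSUPP;
}

/* Example driver callbacks for the two paths. */
static int example_alloc_user(uint32_t flags, struct domain_sketch *out)
{
	out->via_user_op = 1;
	out->flags = flags;
	return 0;
}

static int example_alloc_unmanaged(struct domain_sketch *out)
{
	out->via_user_op = 0;
	out->flags = 0;
	return 0;
}
```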
Extend iommufd_hw_pagetable_alloc() to accept user flags; the uAPI will
provide the flags.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 89d6387)
Signed-off-by: Koba Ko <[email protected]>
Extend IOMMU_HWPT_ALLOC to allocate domains to be used as parent (stage-2)
in nested translation.

Add IOMMU_HWPT_ALLOC_NEST_PARENT to the uAPI.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 4ff5421)
Signed-off-by: Koba Ko <[email protected]>
Add mock_domain_alloc_user() and a new test case for
IOMMU_HWPT_ALLOC_NEST_PARENT.

Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 4086636)
Signed-off-by: Koba Ko <[email protected]>
Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.

Unlike other drivers, VT-d uses only a single page table format, so it only
needs to check if the HW can support nesting.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit c97d1b2)
Signed-off-by: Koba Ko <[email protected]>
Collaborator

@arighi arighi left a comment


This is a pretty large patch set (152 commits), but if it clearly fixes a problem, then I guess we don't have much choice...

I'm a bit worried about the risk of future conflicts from applying this big patch set, even though the change is mostly contained in the iommu code, so hopefully it won't be too painful to maintain.

A couple of comments:

  1. Is there a simple test case that we can use to validate that this patch set fixes the problem (you may have already mentioned it, but I can't find it)?
  2. For the top commit (arm64: configs: Add kernel configuration for 6.6.63 with 64k pages), what about using the same convention we used at Canonical to quickly identify our commits, such as adding an NVIDIA: prefix?

@nvmochs
Collaborator

nvmochs commented Aug 29, 2025

@KobaKoNvidia

Did you intend to have more information in the PR summary? (It looks like maybe some of the content was lost)

Can you provide information on how this was tested?

Note: iommu.passthrough=1 was previously recommended by NVIDIA in official presentation slides as a performance optimization parameter, making this a high-impact issue for users following NVIDIA's best practices.

While the Grace performance tuning guide does state that some applications may benefit from this setting, the generic Linux OS guide points out that a patch from 6.11 is required to enable it. Hence this PR, which backports the subsystem from 6.11 to 6.6 to obtain that patch.


I don't have a concern with the individual patches that have been picked, as it appears they all picked cleanly. I did manually review the ones that did not match upstream exactly and confirmed they were only flagged for context differences (likely due to some out-of-order picks or other unrelated patches being absent).

However, I have the same overall concern as @arighi. Before taking this outright, I think we should strongly consider the long-term implications vs. benefit of supporting this feature.

@khfeng
Collaborator

khfeng commented Sep 17, 2025

I have a simple script to find upstream fixes and here's the output:

Fix found: b85b4f30846bb169c114e99ceee17cc119f02a4b iommu: Fix return code in iommu_group_alloc_default_domain()
 for 1c68cbc64fe6ac01dc242ba562344303031a76fb iommu: Add IOMMU_DOMAIN_PLATFORM                                                                                                                                                                           
Fix found: 30a3f2f3e4bd6335b727c83c08a982d969752bc1 iommu: Fix two issues in iommu_copy_struct_from_user()                  
 for e9d36c07bb787840e4813fb09a929a17d522a69f iommu: Add iommu_copy_struct_from_user helper                                 
Fix found: 99deffc409b69000ac4877486e69ec6516becd53 iommu/exynos: Fix suspend/resume with IDENTITY domain                   
 for b3d14960e629f9ee8c82f8feb211ae43e1cb3246 iommu/exynos: Implement an IDENTITY domain                                                                                                                                                                 
Fix found: da55da5a42d4247d7a48b843fa5fcd9a4a10f4fe iommu/arm-smmu-v3: Make the kunit into a module                         
 for 56e1a4cc2588a7cb9664457a62fd7a77e005aa01 iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry                                                                                                                                                 
Fix found: cb83f4b965a66d85e9a03621ef3b22c044f4a033 gpu: host1x: Do not assume that a NULL domain means no DMA IOMMU                                                                                                                                     
 for c8cc2655cc6c7ff832827ad5bc1a8f3df165706d iommu/tegra-smmu: Implement an IDENTITY domain                                
Fix found: d2d00e15808c37ec476a5c040ee2cdd23854ef18 powerpc: iommu: Bring back table group release_ownership() call                                                                                                                                      
 for 2ad56efa80dba89162106c06ebc00b611325e584 powerpc/iommu: Setup a default domain and remove set_platform_dma_ops                                                                                                                                      
Fix found: 950aeefb34923fe3c28ade35fe05f24e2c5b1d55 iommufd/device: Fix hwpt at err_unresv in iommufd_device_do_replace()                                                                                                                                
 for 89db31635c87a7856e205c7ebf9f562e4bb206fe iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable                 
Fix found: b58b133e680b20d219940e0fdb6f6132c2b60f38 iommu: Handle iommu faults for a bad iopf setup                         
 for 3dfa64aecbafc288216b2790438d395add192c30 iommu: Make iommu_report_device_fault() return void                           
Fix found: 34e2dccbb30baf7e5502bae382722aacbbfddc5b iommu: Flow ERR_PTR out from __iommu_domain_alloc()                     
 for 482feb5c649261cd2a7ad02e4ca63c159d6ec795 iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()                                                                                                                                    
Fix found: 7be423336eccc872249d37900c19c1d24f171353 iommu: Set owner token to SVA domain                                    
 for a9c362db39207c4934c9125e56ed730c5297c37c iommu: Validate that devices match domains                                    
Fix found: b5f9e63278d6f32789478acf1ed41d21d92b36cf iommufd: Correct IOMMU_HWPT_ALLOC_NEST_PARENT description               
 for 4ff542163397073f86eda484318d61980ff1031d iommufd: Support allocating nested parent domain                              
Fix found: 0f91d0795741c12cee200667648669a91b568735 iommu/amd: Enhance def_domain_type to handle untrusted device                                                                                                                                        
 for 59ddce4418da483c932bc7a08b88d6ba14020e83 iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()                                                                                                                             
Fix found: 73183ad6ea51029d04b098286dcee98d715015f1 iommufd: Check the domain owner of the parent before creating a nesting domain                                                                                                                       
 for bd529dbb661d62bd9f03e44c9fc837d98a190499 iommufd: Add a nested HW pagetable object                                     
Fix found: 818625570558cd91082c9bafd6f2b59b73241a69 iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes                                                                                                                               
 for a9af47e382a4d517685cb13c780272e7f300ebc5 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP                            
Fix found: 0493e739ccc60a3e0870847f1a12d6d79b86a1fc iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V              
 for 7da51af9125c624318c8099de13c5ddefd47e9e8 iommu/arm-smmu-v3: Make STE programming independent of the callers                                                                                                                                         
Fix found: b5c29fba72a6c950655d1cb0f6aa16b60dc83be7 iommu: Make iommu_sva_domain_alloc() static                             
 for 80af5a45202422db957549a241e00bf4d4e0ce89 iommu: Add ops->domain_alloc_sva()                                            
Fix found: c9b4a3185fcb2bca0ab8cd098a4df85b2951c44b iommu/omap: Add minimal fwnode support                                  
 for 17de3f5fdd35676b0e3d41c7c9bf4e3032eb3673 iommu: Retire bus ops                                                         
Fix found: 685ca577b408ffd9c5a4057a2acc0cd3e6978b36 iommu/arm-smmu-v3: Fix smmu_domain->nr_ats_masters decrement                                                                                                                                         
 for 7497f4211f4fbdcec5fc5bb4df7f6ccd345966e8 iommu/arm-smmu-v3: Make changing domains be hitless for ATS                   
Fix found: 7a41dcb52f9de6079621fc31c3b84c7fc290934b iommu/amd: Set the pgsize_bitmap correctly                              
 for 134288158a415cd863b1c32c7dcddc0a1dc32aab iommu/amd: Add domain_alloc_user based domain allocation                      
Fix found: d3867e7148318e12b5d69b64950622f5ed06fe86 iommu/arm-smmu-v3: Avoid uninitialized asid in case of error            
 for 04905c17f64890311e6b5a5065d8c220602712e5 iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()                
Fix found: 2780025e01e2e1c92f83ee7da91d9727c2e58a3e iommufd/iova_bitmap: Handle recording beyond the mapped pages                                                                                                                                        
 for 421a511a293fe1c73b37f6147c6676c4ee6efa04 iommu/amd: Access/Dirty bit support in IOPTEs                                 
Fix found: cf1e515c9a40caa8bddb920970d3257bb01c1421 iommufd/selftest: Make dirty_ops static                                 
 for 266ce58989ba05e2a24460fdbf402d766c2e3870 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING                        
Fix found: 2780025e01e2e1c92f83ee7da91d9727c2e58a3e iommufd/iova_bitmap: Handle recording beyond the mapped pages
 for f35f22cc760eb2c7034bf53251399685d611e03f iommu/vt-d: Access/Dirty bit support for SS domains                           
Fix found: b46064a18810bad3aea089a79993ca5ea7a3d2b2 iommu: Handle race with default domain setup                            
 for 98ac73f99bc44fba8a14252ffb0bad02459f7008 iommu: Require a default_domain for all iommu drivers                         
Fix found: 9759ae2cee7cd42b95f1c48aa3749bd02b5ddb08 iommu: Fix potential memory leak in iopf_queue_remove_device()                                                                                                                                       
 for 19911232713573a2ebea84a25bd4d71d024ed86b iommu: Track iopf group instead of last fault                                 
Fix found: a2cdecdf9d234455fdfc8f539bbf5818711bc29d iommu/vt-d: Enhance capability check for nested parent domain allocation                                                                                                                             
 for c97d1b20d3835178bcd0e3a86c20ce4e36b6d80c iommu/vt-d: Add domain_alloc_user op                                          
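The script itself is not shown in this thread. As a hypothetical sketch of one way such a check could work: scan each upstream commit message for a `Fixes: <sha>` trailer (like the `Fixes: c97d1b2 (...)` line in the VT-d fix picked later in this PR) and report it when that abbreviated SHA is a prefix of a commit already in the backport. `fixes_backported_commit()` and its inputs are assumptions; a real tool would read the messages from `git log`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/*
 * Return the index of the backported commit in `picked` (full SHAs)
 * that `msg` fixes, or -1. Looks for lines starting with "Fixes: "
 * followed by a hex SHA, and treats that SHA as an abbreviation.
 */
static int fixes_backported_commit(const char *msg,
				   const char *const picked[], size_t n)
{
	const char *line = msg;

	while (line && *line) {
		if (strncmp(line, "Fixes: ", 7) == 0) {
			const char *sha = line + 7;
			size_t len = 0;

			while (isxdigit((unsigned char)sha[len]))
				len++;
			/* Ignore implausibly short abbreviations. */
			if (len >= 7) {
				for (size_t i = 0; i < n; i++)
					if (strncmp(picked[i], sha, len) == 0)
						return (int)i;
			}
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```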

@khfeng
Collaborator

khfeng commented Sep 17, 2025

Is this 6.6 kernel ARM64 only? Or should AMD64 be covered as well?

yiliu1765 and others added 2 commits November 17, 2025 14:14
This adds the scalable mode check before allocating the nested parent domain,
as checking the nested capability is not enough. The user may turn off scalable
mode, which also means no nested support even if the hardware supports it.

Fixes: c97d1b2 ("iommu/vt-d: Add domain_alloc_user op")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit a2cdecd)
Signed-off-by: Koba Ko <[email protected]>
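The check added here can be summarized in a small model: a nested parent domain is only allowed when the hardware reports nesting support and scalable mode is actually enabled. The capability struct, the flag value `ALLOC_NEST_PARENT`, and `vtd_check_alloc_flags()` are hypothetical simplifications of the driver's logic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

#define ALLOC_NEST_PARENT (1u << 0)   /* hypothetical flag value */

struct vtd_caps_sketch {
	bool hw_nesting;      /* hardware advertises nested translation */
	bool scalable_mode;   /* scalable mode enabled (the user may turn it off) */
};

/* Returns 0 if a domain with these flags may be allocated, else -EOPNOTSUPP. */
static int vtd_check_alloc_flags(const struct vtd_caps_sketch *caps,
				 uint32_t flags)
{
	if (flags & ~ALLOC_NEST_PARENT)
		return -EOPNOTSUPP;   /* unknown flags */
	if ((flags & ALLOC_NEST_PARENT) &&
	    !(caps->hw_nesting && caps->scalable_mode))
		return -EOPNOTSUPP;   /* a nested parent needs both */
	return 0;
}
```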
The IOMMU_HWPT_ALLOC_NEST_PARENT flag is used to allocate a HWPT. Though
a HWPT holds a domain in the core structure, it is still quite confusing
to describe it using "domain" in the uAPI kdoc. Correct it to "HWPT".

Fixes: 4ff5421 ("iommufd: Support allocating nested parent domain")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit b5f9e63)
Signed-off-by: Koba Ko <[email protected]>
jgunthorpe and others added 29 commits December 3, 2025 16:46
A cleared entry is all 0's. Make arm_smmu_clear_cd() do this sequence.

If we are clearing an entry and for some reason it is not already
allocated in the CD table then something has gone wrong.

Remove case (5) from arm_smmu_write_ctx_desc().

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Moritz Fischer <[email protected]>
Reviewed-by: Mostafa Saleh <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit af8f0b8)
Signed-off-by: Koba Ko <[email protected]>
Only the attach callers can perform an allocation for the CD table entry;
the other callers must not do so, because they do not have the correct
locking and they cannot sleep. Split up the functions so this is clear.

arm_smmu_get_cd_ptr() will return a pointer to a CD table entry without
doing any kind of allocation.

arm_smmu_alloc_cd_ptr() will allocate the table and any required
leaf.

A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
called in the wrong context.

Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit b2f4c0f)
Signed-off-by: Koba Ko <[email protected]>
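A minimal sketch of this split, assuming a two-level CD table (an L1 array of pointers to lazily allocated leaves). The lookup never allocates and returns NULL for a missing leaf, while only the allocating variant, meant for the sleepable attach path, creates leaves. Sizes and names (`cd_table_2lvl`, `cd_get_ptr()`, `cd_alloc_ptr()`) are hypothetical; locking and lockdep assertions are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

#define CDS_PER_LEAF 32   /* hypothetical leaf size */
#define NUM_LEAVES   8    /* hypothetical L1 size */

struct cd_table_2lvl {
	uint64_t *leaves[NUM_LEAVES];   /* NULL until allocated */
};

/* Lookup only: never allocates, so it is safe where sleeping is not allowed. */
static uint64_t *cd_get_ptr(struct cd_table_2lvl *t, unsigned int ssid)
{
	unsigned int leaf = ssid / CDS_PER_LEAF;

	if (leaf >= NUM_LEAVES || !t->leaves[leaf])
		return NULL;
	return &t->leaves[leaf][ssid % CDS_PER_LEAF];
}

/* Allocating variant: only for attach-style callers that may sleep. */
static uint64_t *cd_alloc_ptr(struct cd_table_2lvl *t, unsigned int ssid)
{
	unsigned int leaf = ssid / CDS_PER_LEAF;

	if (leaf >= NUM_LEAVES)
		return NULL;
	if (!t->leaves[leaf]) {
		/* Zeroed leaf: every CD in it starts out cleared. */
		t->leaves[leaf] = calloc(CDS_PER_LEAF, sizeof(uint64_t));
		if (!t->leaves[leaf])
			return NULL;
	}
	return cd_get_ptr(t, ssid);
}
```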
To avoid arm_smmu_attach_dev() having to undo the changes to the
smmu_domain->devices list, acquire the cdptr earlier so we don't need to
handle that error.

Now there is a clear break in arm_smmu_attach_dev() where all the
prep-work has been done non-disruptively and we commit to making the HW
change, which cannot fail.

This completes transforming arm_smmu_attach_dev() so that it does not
disturb the HW if it fails.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Mostafa Saleh <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 13abe4f)
Signed-off-by: Koba Ko <[email protected]>
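The prepare/commit split described in this commit can be sketched under simplified assumptions. Every step that can fail (here, obtaining the CD slot) runs before any visible state changes, so a failed attach leaves both the device list and the "hardware" entry untouched. The structures, `prepare_cd_slot()`, and `attach_sketch()` are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

struct hw_state { uint64_t installed; };   /* stands in for the STE/CD */
struct dev_list { int count; };            /* stands in for smmu_domain->devices */

/* Hypothetical fallible prep step, e.g. obtaining the CD pointer. */
static int prepare_cd_slot(bool can_alloc, uint64_t **slot, uint64_t *storage)
{
	if (!can_alloc)
		return -ENOMEM;
	*slot = storage;
	return 0;
}

static int attach_sketch(struct hw_state *hw, struct dev_list *list,
			 uint64_t new_cfg, bool can_alloc)
{
	static uint64_t cd_storage;
	uint64_t *slot;
	int ret;

	/* Phase 1: everything that can fail, with no visible side effects. */
	ret = prepare_cd_slot(can_alloc, &slot, &cd_storage);
	if (ret)
		return ret;   /* nothing to undo */

	/* Phase 2: commit; none of these steps can fail. */
	list->count++;
	*slot = new_cfg;
	hw->installed = new_cfg;
	return 0;
}
```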
Pull all the calculations for building the CD table entry for an mm_struct
into arm_smmu_make_sva_cd().

Call it in the two places installing the SVA CD table entry.

Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
the function.

Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
locking assertions to arm_smmu_alloc_cd_ptr() since
arm_smmu_update_ctx_desc_devices() was the last problematic caller.

Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
the same value.

The behavior of quiet_cd changes slightly: the old implementation edited
the CD in place to set CTXDESC_CD_0_TCR_EPD0 assuming it was a SVA CD
entry. This version generates a full CD entry with a 0 TTB0 and relies on
arm_smmu_write_cd_entry() to install it hitlessly.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 7b87c93)
Signed-off-by: Koba Ko <[email protected]>
Half the code was living in arm_smmu_domain_finalise_s1(), just move it
here and take the values directly from the pgtbl_ops instead of storing
copies.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Mostafa Saleh <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 04905c1)
Signed-off-by: Koba Ko <[email protected]>
Add tests for some of the more common STE update operations that we expect
to see, as well as some artificial STE updates to test the edges of
arm_smmu_write_entry. These also serve as a record of which common
operation is expected to be hitless, and how many syncs they require.

arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
to any other arbitrary STE/CD configuration. The update requires a
sequence of write+sync operations with some invariants that must be held
true after each sync. arm_smmu_write_entry lends itself well to
unit-testing since the function's interaction with the STE/CD is already
abstracted by input callbacks that we can hook to introspect into the
sequence of operations. We can use these hooks to guarantee that
invariants are held throughout the entire update operation.
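To make the write+sync sequence concrete, here is a minimal userspace sketch modeled on the algorithm described above; it is not the driver's code. The 4-qword entry, the `TOY_V` valid bit, the `TOY_USE_Q2` flag and the `toy_get_used()` "used bits" callback are all hypothetical, and `nr_syncs` plays the role of the test hook that counts syncs. It assumes GCC/Clang for `__builtin_popcount`/`__builtin_ctz`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_QWORDS 4
#define TOY_V      (1ULL << 0)  /* hypothetical valid bit in qword 0 */
#define TOY_USE_Q2 (1ULL << 1)  /* hypothetical "HW reads qword 2" flag */

/* Hypothetical "used bits" callback: which bits the HW reads for entry e. */
static void toy_get_used(const uint64_t *e, uint64_t *used)
{
    memset(used, 0, NUM_QWORDS * sizeof(*used));
    used[0] = TOY_V;
    if (!(e[0] & TOY_V))
        return;                 /* invalid entry: only V is looked at */
    used[0] = ~0ULL;
    used[1] = ~0ULL;
    if (e[0] & TOY_USE_Q2)
        used[2] = ~0ULL;
}

static int nr_syncs;            /* test hook: counts sync operations */

/* Copy qwords [start, start + len) from target; sync once if anything changed. */
static void entry_set(uint64_t *entry, const uint64_t *target,
                      unsigned start, unsigned len)
{
    int changed = 0;

    for (unsigned i = start; i < start + len; i++) {
        if (entry[i] != target[i]) {
            entry[i] = target[i];
            changed = 1;
        }
    }
    if (changed)
        nr_syncs++;
}

static void toy_write_entry(uint64_t *entry, const uint64_t *target)
{
    uint64_t cur_used[NUM_QWORDS], target_used[NUM_QWORDS];
    uint64_t unused_update[NUM_QWORDS];
    unsigned diff = 0;

    toy_get_used(entry, cur_used);
    toy_get_used(target, target_used);
    for (unsigned i = 0; i < NUM_QWORDS; i++) {
        /* Targets must not set bits their own used mask ignores. */
        assert(!(target[i] & ~target_used[i]));
        /* Bits the HW ignores right now can be changed freely. */
        unused_update[i] = (entry[i] & cur_used[i]) | (target[i] & ~cur_used[i]);
        /* After that, does qword i still have a used bit to change? */
        if ((unused_update[i] & target_used[i]) != target[i])
            diff |= 1u << i;
    }

    if (__builtin_popcount(diff) == 1) {
        /* Hitless: one critical qword switches the whole config at once. */
        unsigned crit = (unsigned)__builtin_ctz(diff);

        unused_update[crit] = entry[crit];
        entry_set(entry, unused_update, 0, NUM_QWORDS);
        entry_set(entry, target, crit, 1);
        entry_set(entry, target, 0, NUM_QWORDS);
    } else if (diff) {
        /* Breaking: clear V, write qwords 1..N-1, then set qword 0. */
        unused_update[0] = 0;
        entry_set(entry, unused_update, 0, 1);
        entry_set(entry, target, 1, NUM_QWORDS - 1);
        entry_set(entry, target, 0, 1);
    } else {
        /* No used bit needs changing: a single pass suffices. */
        entry_set(entry, target, 0, NUM_QWORDS);
    }
}
```

With this toy layout, changing only qword 1 of a valid entry is hitless with one sync, enabling qword 2 is hitless with two syncs, and changing used bits in both qword 0 and qword 1 forces the breaking V=0 path with three syncs. Those counts are exactly what such unit tests record.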

Link: https://lore.kernel.org/r/[email protected]
Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 56e1a4c)
Signed-off-by: Koba Ko <[email protected]>
The static checker complains that the ASID may be used uninitialized.
This only happens on error, and the value would be ignored anyway.

A simple fix is to initialize the local variable to zero: this path is
only reached on the first attach to a domain, where the CD is already
initialized to zero.
This avoids bloating the function with an error path.

Closes: https://lore.kernel.org/linux-iommu/[email protected]/T/#u
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Mostafa Saleh <[email protected]>
Fixes: 04905c1 ("iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()")
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit d3867e7)
Signed-off-by: Koba Ko <[email protected]>
This allows the driver to receive the mm and always a device during
allocation. Later patches need this to properly set up the notifier when
the domain is first allocated.

Remove ops->domain_alloc() as SVA was the only remaining purpose.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 678d79b)
Signed-off-by: Koba Ko <[email protected]>
Add arm_smmu_set_pasid()/arm_smmu_remove_pasid() which are to be used by
callers that already constructed the arm_smmu_cd they wish to program.

These functions will encapsulate the shared logic to setup a CD entry that
will be shared by SVA and S1 domain cases.

Prior fixes had already moved most of this logic up into
__arm_smmu_sva_bind(); move it to its final home.

Following patches will relieve some of the remaining SVA restrictions:

 - The RID domain is a S1 domain and has already setup the STE to point to
   the CD table
 - The programmed PASID is the mm_get_enqcmd_pasid()
 - Nothing changes while SVA is running (sva_enable)

SVA invalidation will still iterate over the S1 domain's master list,
later patches will resolve that.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 85f2fb6)
Signed-off-by: Koba Ko <[email protected]>
The next patch will need to store the same master twice (with different
SSIDs), so allocate memory for each list element.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit ad10dce)
Signed-off-by: Koba Ko <[email protected]>
The core code allows the domain to be changed on the fly without a forced
stop in BLOCKED/IDENTITY. In this flow the driver should just continually
maintain the ATS with no change while the STE is updated.

ATS relies on a linked list smmu_domain->devices to keep track of which
masters have the domain programmed, but this list is also used by
arm_smmu_share_asid(), unrelated to ATS.

Create two new functions to encapsulate this combined logic:
 arm_smmu_attach_prepare()
 <caller generates and sets the STE>
 arm_smmu_attach_commit()

The two functions can sequence both enabling and disabling ATS around
the STE store. Have every update of the STE use this sequence.

Installing a S1/S2 domain always enables the ATS if the PCIe device
supports it.

The enable flow is now ordered differently to allow it to be hitless:

 1) Add the master to the new smmu_domain->devices list
 2) Program the STE
 3) Enable ATS at PCIe
 4) Remove the master from the old smmu_domain

This flow ensures that invalidations to either domain will generate an ATC
invalidation to the device while the STE is being switched. Thus we don't
need to turn off the ATS anymore for correctness.

The disable flow is the reverse:
 1) Disable ATS at PCIe
 2) Program the STE
 3) Invalidate the ATC
 4) Remove the master from the old smmu_domain
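The two orderings above can be sketched as a toy prepare/commit pair that records each step, so the sequence can be checked. This is a simplified userspace model, not the driver: `toy_attach_prepare()`, `toy_attach_commit()`, the step log and the counters standing in for `smmu_domain->devices` are hypothetical, and the caller's STE write is represented by a recorded `PROGRAM_STE` step between the two calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step log so the ordering can be checked. */
enum toy_step { ADD_TO_NEW, DISABLE_ATS, PROGRAM_STE, ENABLE_ATS,
                INVALIDATE_ATC, REMOVE_FROM_OLD };

struct toy_trace { enum toy_step steps[8]; int n; };

static void record(struct toy_trace *t, enum toy_step s)
{
    t->steps[t->n++] = s;
}

struct toy_domain { int nr_masters; };   /* stands in for smmu_domain->devices */
struct toy_master { bool ats_enabled; };  /* current ATS state at PCIe */

struct toy_attach_state {
    struct toy_master *master;
    struct toy_domain *old_domain;  /* NULL if the old domain keeps no list */
    struct toy_domain *new_domain;  /* NULL for IDENTITY/BLOCKED */
    bool ats_enabled;               /* ATS state wanted after the switch */
};

static void toy_attach_prepare(struct toy_trace *t, struct toy_attach_state *st)
{
    /* Join the new list first so both domains send ATC invalidations. */
    if (st->new_domain) {
        st->new_domain->nr_masters++;
        record(t, ADD_TO_NEW);
    }
    /* ATS is being turned off: do it at PCIe before the STE changes. */
    if (!st->ats_enabled && st->master->ats_enabled)
        record(t, DISABLE_ATS);
}

static void toy_attach_commit(struct toy_trace *t, struct toy_attach_state *st)
{
    struct toy_master *m = st->master;

    if (st->ats_enabled && !m->ats_enabled)
        record(t, ENABLE_ATS);
    else if (m->ats_enabled)        /* translation changed, or ATS went off */
        record(t, INVALIDATE_ATC);
    m->ats_enabled = st->ats_enabled;

    if (st->old_domain) {
        st->old_domain->nr_masters--;
        record(t, REMOVE_FROM_OLD);
    }
}
```

In the enable case both counters are non-zero at the moment the STE is written, which is the property that keeps ATC invalidations flowing during the switch; in the disable case ATS is off at PCIe before the STE changes and the ATC is flushed afterwards.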

Move the nr_ats_masters adjustments to be close to the list
manipulations. It is a count of the number of ATS enabled masters
currently in the list. It is adjusted strictly before and after the STE/CD
are revised, under the list's spin_lock.

This is part of the bigger picture to allow changing the RID domain while
a PASID is in use. If a SVA PASID is relying on ATS to function then
changing the RID domain cannot just temporarily toggle ATS off without
also wrecking the SVA PASID. The new infrastructure here is organized so
that the PASID attach/detach flows will make use of it as well in
following patches.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 7497f42)
Signed-off-by: Koba Ko <[email protected]>
Prepare to allow a S1 domain to be attached to a PASID as well. Keep track
of the SSID the domain is using on each master in the
arm_smmu_master_domain.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 64efb3d)
Signed-off-by: Koba Ko <[email protected]>
We no longer need a master->sva_enable to control what attaches are
allowed. Instead we can tell if the attach is legal based on the current
configuration of the master.

Keep track of the number of valid CD entries for SSIDs in the cd_table
and if the cd_table has been installed in the STE directly so we know what
the configuration is.

The attach logic is then made into:
 - SVA bind, check if the CD is installed
 - RID attach of S2, block if SSIDs are used
 - RID attach of IDENTITY/BLOCKING, block if SSIDs are used

arm_smmu_set_pasid() already checks whether it is possible to set up a CD
entry; as of this patch that means the RID path has already set an STE
pointing at the CD table.
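The three attach rules listed above amount to a small decision table over the tracked state. The sketch below is a hypothetical userspace model of that table (`struct toy_cd_table`, `toy_can_sva_bind()` and `toy_can_attach_rid()` are illustrative names, not driver functions), assuming "SSIDs are used" means at least one valid CD entry for a non-zero SSID.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain kinds for the RID (SSID 0) attach. */
enum toy_domain_type { TOY_S1, TOY_S2, TOY_IDENTITY, TOY_BLOCKED };

/* Hypothetical per-master tracking, as the commit message describes. */
struct toy_cd_table {
    unsigned used_ssids;   /* valid CD entries for non-zero SSIDs */
    bool in_ste;           /* the STE currently points at this CD table */
};

/* SVA bind: only legal once the RID path installed the CD table. */
static bool toy_can_sva_bind(const struct toy_cd_table *cd)
{
    return cd->in_ste;
}

/* RID attach: anything that stops using the CD table must not strand PASIDs. */
static bool toy_can_attach_rid(const struct toy_cd_table *cd,
                               enum toy_domain_type type)
{
    switch (type) {
    case TOY_S1:
        return true;                  /* keeps the STE pointing at the CD table */
    case TOY_S2:
    case TOY_IDENTITY:
    case TOY_BLOCKED:
        return cd->used_ssids == 0;   /* blocked while SSIDs are in use */
    }
    return false;
}
```

The IDENTITY/BLOCKED rows reflect this patch only; the later S1DSS patch in this series allows those attaches while PASIDs remain in use.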

Tested-by: Nicolin Chen <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit be7c90d)
Signed-off-by: Koba Ko <[email protected]>
Allow creating and managing arm_smmu_master_domain entries with a non-zero SSID
through the arm_smmu_attach_*() family of functions. This triggers ATC
invalidation for the correct SSID in PASID cases and tracks the
per-attachment SSID in the struct arm_smmu_master_domain.

Generalize arm_smmu_attach_remove() to be able to remove SSIDs as well,
ensuring the ATC for the PASID is flushed properly.
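Because the same master can now sit on a domain's list more than once at different SSIDs, each list element has to carry its SSID, and removal has to match on both fields. The sketch below models that with a fixed array; `struct toy_master_domain`, `toy_attach()`, `toy_detach()` and `toy_atc_invalidate_all()` are hypothetical stand-ins, not the driver's list code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ELMS 8

/* Hypothetical stand-in for struct arm_smmu_master_domain. */
struct toy_master_domain {
    int master_id;   /* which master */
    unsigned ssid;   /* SSID the domain is attached at (0 = RID) */
};

/* Hypothetical stand-in for smmu_domain->devices. */
struct toy_domain {
    struct toy_master_domain devices[TOY_MAX_ELMS];
    int nr;
};

/* One element per attachment, so one master may appear several times. */
static bool toy_attach(struct toy_domain *d, int master_id, unsigned ssid)
{
    if (d->nr == TOY_MAX_ELMS)
        return false;
    d->devices[d->nr++] = (struct toy_master_domain){ master_id, ssid };
    return true;
}

/* Detach must match both the master and the SSID. */
static bool toy_detach(struct toy_domain *d, int master_id, unsigned ssid)
{
    for (int i = 0; i < d->nr; i++) {
        if (d->devices[i].master_id == master_id &&
            d->devices[i].ssid == ssid) {
            d->devices[i] = d->devices[--d->nr];  /* order is irrelevant */
            return true;
        }
    }
    return false;
}

/* A domain-wide flush issues one ATC invalidation per element, at its SSID. */
static int toy_atc_invalidate_all(const struct toy_domain *d, unsigned *ssids_out)
{
    for (int i = 0; i < d->nr; i++)
        ssids_out[i] = d->devices[i].ssid;
    return d->nr;
}
```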

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 1d5f34f)
Signed-off-by: Koba Ko <[email protected]>
Currently the SVA domain is a naked struct iommu_domain, allocate a struct
arm_smmu_domain instead.

This is necessary to be able to use the struct arm_smmu_master_domain
mechanism.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit d7b2d2b)
Signed-off-by: Koba Ko <[email protected]>
Fill in the smmu_domain->devices list in the new struct arm_smmu_domain
that SVA allocates. Keep track of every SSID and master that is using the
domain, reusing the logic for the RID attach.

This is the first step to making the SVA invalidation follow the same
design as S1/S2 invalidation. At present nothing will read this list.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 49db2ed)
Signed-off-by: Koba Ko <[email protected]>
This removes all the notifier de-duplication logic in the driver and
relies on the core code to de-duplicate and allocate only one SVA domain
per mm per smmu instance. This naturally gives a 1:1 relationship between
SVA domain and mmu notifier.
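The 1:1 relationship can be pictured as a find-or-allocate lookup keyed by (mm, SMMU instance), with one notifier registered only when a new domain is created. This is a hypothetical userspace model of the de-duplication the commit attributes to the core code, not the core's actual implementation; `toy_sva_bind()`, the fixed-size table and `toy_notifiers` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DOMAINS 8

/* Hypothetical SVA domain: one per (mm, SMMU instance). */
struct toy_sva_domain {
    const void *mm;     /* address space being shared */
    int smmu_id;        /* which SMMU instance */
    int users;          /* number of binds sharing this domain */
    bool in_use;
};

static struct toy_sva_domain toy_domains[TOY_MAX_DOMAINS];
static int toy_notifiers;   /* one mmu notifier registered per domain */

/* Reuse the domain for this (mm, smmu) pair, or allocate a new one. */
static struct toy_sva_domain *toy_sva_bind(const void *mm, int smmu_id)
{
    struct toy_sva_domain *free_slot = NULL;

    for (int i = 0; i < TOY_MAX_DOMAINS; i++) {
        struct toy_sva_domain *d = &toy_domains[i];

        if (d->in_use && d->mm == mm && d->smmu_id == smmu_id) {
            d->users++;
            return d;
        }
        if (!d->in_use && !free_slot)
            free_slot = d;
    }
    if (!free_slot)
        return NULL;
    *free_slot = (struct toy_sva_domain){ mm, smmu_id, 1, true };
    toy_notifiers++;            /* notifier count tracks domain count */
    return free_slot;
}
```

Binding the same mm twice on one SMMU returns the same domain, while the same mm on a second SMMU instance gets its own domain, which is why a single global ASID table no longer fits.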

It is a significant simplification of the flow, as we end up with a single
struct arm_smmu_domain for each MM and the invalidation can then be
shifted to properly use the masters list like S1/S2 do.

Remove all of the previous mmu_notifier, bond, shared cd, and cd refcount
logic entirely.

The logic here is tightly wound together with the unused BTM
support. Since the BTM logic requires holding all the iommu_domains in a
global ASID xarray it conflicts with the design to have a single SVA
domain per PASID, as multiple SMMU instances will need to have different
domains.

Following patches resolve this by making the ASID xarray per-instance
instead of global. However, converting the BTM code over to this
methodology requires many changes.

Thus, since ARM_SMMU_FEAT_BTM is never enabled, remove the parts of the
BTM support for ASID sharing that interact with SVA as well.

A followup series is already working on fully enabling the BTM support,
which requires iommufd's VIOMMU feature to bring in KVM's VMID as
well. It will come with an already written patch to bring back the ASID
sharing using a per-instance ASID xarray.

https://lore.kernel.org/linux-iommu/[email protected]/
https://lore.kernel.org/linux-iommu/[email protected]/

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Michael Shavit <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit d38c28d)
Signed-off-by: Koba Ko <[email protected]>
The HW supports this; use the S1DSS bits to configure the behavior
of SSID=0, which is the RID's translation.

If SSIDs are currently being used in the CD table then just update the
S1DSS bits in the STE, remove the master_domain and leave ATS alone.

For iommufd the driver design has a small problem that all the unused CD
table entries are set with V=0 which will generate an event if VFIO
userspace tries to use the CD entry. This patch extends this problem to
include the RID as well if PASID is being used.

For BLOCKED with used PASIDs the
F_STREAM_DISABLED (STRTAB_STE_1_S1DSS_TERMINATE) event is generated on
untagged traffic and a substream CD table entry with V=0 (removed pasid)
will generate C_BAD_CD. Arguably there is no advantage to using S1DSS over
the CD entry 0 with V=0.

As we don't yet support PASID in iommufd this is a problem to resolve
later, possibly by using EPD0 for unused CD table entries instead of V=0,
and not using S1DSS for BLOCKED.
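The choice described above can be summarized as: keep the STE pointing at the CD table only when PASIDs need it, and let S1DSS decide what untagged (SSID=0) traffic does. The sketch below is a hypothetical model of that mapping; the numeric encodings follow my reading of the SMMUv3 STE format (TERMINATE=0, BYPASS=1, SSID0=2) and should be checked against the specification, and `toy_pick_ste()` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed S1DSS encodings; names mirror the driver's STRTAB_STE_1_S1DSS_*. */
enum toy_s1dss {
    TOY_S1DSS_TERMINATE = 0,  /* untagged traffic aborts (F_STREAM_DISABLED) */
    TOY_S1DSS_BYPASS = 1,     /* untagged traffic bypasses stage 1 */
    TOY_S1DSS_SSID0 = 2,      /* untagged traffic uses CD entry 0 */
};

enum toy_rid_domain { TOY_RID_S1, TOY_RID_IDENTITY, TOY_RID_BLOCKED };

struct toy_ste_choice {
    bool cd_table;        /* STE points at the CD table */
    enum toy_s1dss s1dss; /* only meaningful when cd_table is true */
};

static struct toy_ste_choice toy_pick_ste(enum toy_rid_domain rid, bool pasids_in_use)
{
    switch (rid) {
    case TOY_RID_IDENTITY:
        /* Without PASIDs a plain bypass STE is enough. */
        return (struct toy_ste_choice){ pasids_in_use, TOY_S1DSS_BYPASS };
    case TOY_RID_BLOCKED:
        /* Without PASIDs a plain abort STE is enough. */
        return (struct toy_ste_choice){ pasids_in_use, TOY_S1DSS_TERMINATE };
    case TOY_RID_S1:
    default:
        /* The RID translation lives in CD entry 0. */
        return (struct toy_ste_choice){ true, TOY_S1DSS_SSID0 };
    }
}
```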

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit ce26ea9)
Signed-off-by: Koba Ko <[email protected]>
S1DSS brings in quite a few new transition pairs that are
interesting. Test to/from S1DSS_BYPASS <-> S1DSS_SSID0, and
BYPASS <-> S1DSS_SSID0.

Test a contrived non-hitless flow to make sure that the logic works.

Tested-by: Nicolin Chen <[email protected]>
Signed-off-by: Michael Shavit <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 3b5302c)
Signed-off-by: Koba Ko <[email protected]>
If the STE doesn't point to the CD table we can upgrade it by
reprogramming the STE with the appropriate S1DSS. We may also need to turn
on ATS at the same time.

Keep track if the installed STE is pointing at the cd_table and the ATS
state to trigger this path.
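A minimal sketch of the trigger condition, assuming the two tracked facts are "the STE points at the CD table" and "the STE has ATS on", and that adding a PASID only ever turns ATS on, never off. The struct and function names are hypothetical, and the condition is my reading of the text rather than the driver's exact test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the commit says is tracked per master. */
struct toy_master_state {
    bool cd_table_in_ste;   /* installed STE points at the CD table */
    bool ste_ats_enabled;   /* installed STE has ATS on */
};

/* Does installing a PASID require rewriting the STE first? */
static bool toy_pasid_needs_ste_update(const struct toy_master_state *s,
                                       bool want_ats)
{
    return !s->cd_table_in_ste || (want_ats && !s->ste_ats_enabled);
}

static void toy_pasid_prepare_ste(struct toy_master_state *s, bool want_ats,
                                  int *ste_writes)
{
    if (!toy_pasid_needs_ste_update(s, want_ats))
        return;                    /* STE already fine: only the CD entry changes */
    (*ste_writes)++;               /* rewrite STE: CD table + S1DSS for the RID */
    s->cd_table_in_ste = true;
    s->ste_ats_enabled = s->ste_ats_enabled || want_ats;
}
```

Once the STE has been upgraded, further PASID attaches only touch CD entries, so the (hitless) STE rewrite happens at most once per configuration change.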

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 8ee9175)
Signed-off-by: Koba Ko <[email protected]>
The SVA cleanup made the SSID logic entirely general so all we need to do
is call it with the correct cd table entry for a S1 domain.

This is slightly tricky because of the ASID and how the locking works; the
simple fix is to just update the ASID once we hold the right locks.

Tested-by: Nicolin Chen <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit f3b273b)
Signed-off-by: Koba Ko <[email protected]>
Add a comprehensive kernel configuration file for Linux 6.6.63 targeting
ARM64 architecture with 64k page size. This configuration enables essential
features for NVIDIA ARM64 platforms including:

- 64k page size configuration (CONFIG_ARM64_64K_PAGES)
- Full ARM64 architecture support with NUMA balancing
- BPF subsystem with JIT compilation
- Control groups (cgroups) support for resource management
- Memory controller with swap support
- CPU isolation and RCU subsystem configuration
- Process accounting and task statistics
- Standard kernel debugging and security features

The configuration is built with GCC 12.2.0 and includes the build salt
"6.6.y.bsk.z-64k-arm64" to identify this specific kernel build variant.

This configuration serves as the baseline for NVIDIA ARM64 kernel builds
on the linux-nvidia-6.6 branch.

Signed-off-by: Koba Ko <[email protected]>
This function returns NULL on errors, not ERR_PTR.

Fixes: 1c68cbc ("iommu: Add IOMMU_DOMAIN_PLATFORM")
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit b85b4f3)
Signed-off-by: Koba Ko <[email protected]>
In the review for iommu_copy_struct_to_user() helper, Matt pointed out that
a NULL pointer should be rejected prior to dereferencing it:
https://lore.kernel.org/all/[email protected]

And Alok pointed out a typo at the same time:
https://lore.kernel.org/all/[email protected]

Since both issues were copied from iommu_copy_struct_from_user(), fix them
first in the current header.
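The ordering fix is simply "check the pointer before reading any field through it". The sketch below is a userspace model of such a helper; `struct toy_user_data` and `toy_copy_struct_from_user()` are hypothetical, the user pointer is modeled as a plain pointer, and the tail handling is simplified (it zero-fills a short source but does not check trailing bytes of a longer one, unlike the kernel's `copy_struct_from_user()`).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iommu_user_data. */
struct toy_user_data {
    unsigned int type;
    const void *uptr;
    size_t len;
};

/* Reject NULL before reading any field of src, then check type and length. */
static int toy_copy_struct_from_user(void *dst, size_t dst_len,
                                     const struct toy_user_data *src,
                                     unsigned int type, size_t min_len)
{
    if (!src)
        return -EINVAL;
    if (src->type != type || src->len < min_len)
        return -EINVAL;
    /* Copy what the caller provided and zero the rest. */
    size_t n = src->len < dst_len ? src->len : dst_len;
    memcpy(dst, src->uptr, n);
    memset((char *)dst + n, 0, dst_len - n);
    return 0;
}
```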

Fixes: e9d36c0 ("iommu: Add iommu_copy_struct_from_user helper")
Cc: [email protected]
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Alok Tiwari <[email protected]>
Reviewed-by: Matthew R. Ochs <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit 30a3f2f)
Signed-off-by: Koba Ko <[email protected]>
Commit bcb81ac ("iommu: Get DT/ACPI parsing into the proper probe
path") changed the sequence of probing the SYSMMU controller devices and
calls to arm_iommu_attach_device(), which results in resuming the SYSMMU
controller earlier, while it is still set to IDENTITY mapping. This change
revealed a bug in the IDENTITY handling of the exynos-iommu driver. When the
SYSMMU controller is set to IDENTITY mapping, data->domain is NULL, so
adjust the checks in the suspend & resume callbacks to handle this case
correctly.

Fixes: b3d1496 ("iommu/exynos: Implement an IDENTITY domain")
Signed-off-by: Marek Szyprowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit 99deffc)
Signed-off-by: Koba Ko <[email protected]>
It turns out kconfig has problems ensuring the SMMU module and the KUNIT
module are consistently y/m to allow linking. It will permit KUNIT to be a
module while SMMU is built in.

Also, Fedora apparently enables kunit on production kernels.

So, put the entire kunit in its own module using the
VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT machinery. This keeps it out of
vmlinux on Fedora and makes the kconfig work in the normal way. There is
no cost if kunit is disabled.

Fixes: 56e1a4c ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry")
Reported-by: Thorsten Leemhuis <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
Tested-by: Thorsten Leemhuis <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit da55da5)
Signed-off-by: Koba Ko <[email protected]>
Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.

This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.

Now that the domain is properly labeled, the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.
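The regression and the fix reduce to one predicate: "should this client get the special shared domain?". The sketch below is a hypothetical userspace model (the enum, struct and both functions are illustrative, not the DRM or IOMMU API) showing why a check for NULL alone stops matching once the default domain is labeled IDENTITY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the domain types involved. */
enum toy_domain_type { TOY_DOMAIN_IDENTITY, TOY_DOMAIN_DMA };
struct toy_iommu_domain { enum toy_domain_type type; };

/* Old check: only a NULL default domain meant "not the DMA domain". */
static bool toy_wants_shared_domain_old(const struct toy_iommu_domain *dom)
{
    return dom == NULL;
}

/* Fixed check: an IDENTITY default domain means the same thing. */
static bool toy_wants_shared_domain(const struct toy_iommu_domain *dom)
{
    return dom == NULL || dom->type == TOY_DOMAIN_IDENTITY;
}
```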

This is the same issue and basic fix as seen in
commit fae6e66 ("drm/tegra: Do not assume that a NULL domain means no
DMA IOMMU").

Fixes: c8cc265 ("iommu/tegra-smmu: Implement an IDENTITY domain")
Reported-by: Diogo Ivo <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Tested-by: Diogo Ivo <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit cb83f4b)
Signed-off-by: Koba Ko <[email protected]>
The commit 2ad56ef ("powerpc/iommu: Setup a default domain and
remove set_platform_dma_ops") refactored the code removing the
set_platform_dma_ops(). It missed out the table group
release_ownership() call which would have got called otherwise
during the guest shutdown via vfio_group_detach_container(). On
PPC64, this particular call actually sets up the 32-bit TCE table,
and enables the 64-bit DMA bypass etc. Now after guest shutdown,
the subsequent host driver (e.g. megaraid-sas) probe after unbinding
from vfio-pci fails like this:

megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
megaraid_sas 0031:01:00.0: Failed to set DMA mask
megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539

This patch brings back the table_group release_ownership() call when
switching back to the PLATFORM domain from BLOCKED, while also separating
the domain_ops for both.

Fixes: 2ad56ef ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Signed-off-by: Shivaprasad G Bhat <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/170628173462.3742.18330000394415935845.stgit@ltcd48-lp2.aus.stglab.ibm.com
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit d2d00e1)
Signed-off-by: Koba Ko <[email protected]>
The rewind routine should remove the reserved iovas added to the new hwpt.

Fixes: 89db316 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable")
Cc: [email protected]
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 950aeef)
Signed-off-by: Koba Ko <[email protected]>