Skip to content

Conversation

@RaviHothi
Copy link

This patch enhances the q6dsp-lpass-ports driver by enabling support for a wider range of PCM sampling rates, from 8000 Hz to 192000 Hz, improving compatibility with diverse audio hardware. It also adds support for 32-bit PCM formats (SNDRV_PCM_FMTBIT_S32_LE) alongside existing 16-bit and 24-bit formats, allowing better handling of high-resolution audio streams.

Copy link

@Srinivas-Kandagatla Srinivas-Kandagatla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the patch looks good to be however it needs to be split into two patches rather than mixing both I2S format changes and rate changes.

.formats = SNDRV_PCM_FMTBIT_S16_LE |
SNDRV_PCM_FMTBIT_S24_LE,
SNDRV_PCM_FMTBIT_S24_LE |
SNDRV_PCM_FMTBIT_S32_LE,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fomat changes should go in a different patch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Split the changes into two separate commits.

Extend the q6dsp-lpass-ports driver to support a wider range of sampling
rates, from 8000 Hz to 192000 Hz. This improves compatibility with
diverse audio hardware and enables better flexibility for audio stream
configurations.

Signed-off-by: Ravi Hothi <ravi.hothi@oss.qualcomm.com>
Introduce support for 32-bit PCM format (SNDRV_PCM_FMTBIT_S32_LE) in
addition to existing 16-bit and 24-bit formats. This allows handling
high-resolution audio streams and improves audio quality for supported
hardware.

Signed-off-by: Ravi Hothi <ravi.hothi@oss.qualcomm.com>
Copy link

@Srinivas-Kandagatla Srinivas-Kandagatla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link

@Srinivas-Kandagatla Srinivas-Kandagatla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

nandamajay pushed a commit that referenced this pull request Dec 1, 2025
On completion of i915_vma_pin_ww(), a synchronous variant of
dma_fence_work_commit() is called.  When pinning a VMA to GGTT address
space on a Cherry View family processor, or on a Broxton generation SoC
with VTD enabled, i.e., when stop_machine() is then called from
intel_ggtt_bind_vma(), that can potentially lead to lock inversion among
reservation_ww and cpu_hotplug locks.

[86.861179] ======================================================
[86.861193] WARNING: possible circular locking dependency detected
[86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 Tainted: G     U
[86.861226] ------------------------------------------------------
[86.861238] i915_module_loa/1432 is trying to acquire lock:
[86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50
[86.861290]
but task is already holding lock:
[86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.862233]
which lock already depends on the new lock.
[86.862251]
the existing dependency chain (in reverse order) is:
[86.862265]
-> #5 (reservation_ww_class_mutex){+.+.}-{3:3}:
[86.862292]        dma_resv_lockdep+0x19a/0x390
[86.862315]        do_one_initcall+0x60/0x3f0
[86.862334]        kernel_init_freeable+0x3cd/0x680
[86.862353]        kernel_init+0x1b/0x200
[86.862369]        ret_from_fork+0x47/0x70
[86.862383]        ret_from_fork_asm+0x1a/0x30
[86.862399]
-> #4 (reservation_ww_class_acquire){+.+.}-{0:0}:
[86.862425]        dma_resv_lockdep+0x178/0x390
[86.862440]        do_one_initcall+0x60/0x3f0
[86.862454]        kernel_init_freeable+0x3cd/0x680
[86.862470]        kernel_init+0x1b/0x200
[86.862482]        ret_from_fork+0x47/0x70
[86.862495]        ret_from_fork_asm+0x1a/0x30
[86.862509]
-> #3 (&mm->mmap_lock){++++}-{3:3}:
[86.862531]        down_read_killable+0x46/0x1e0
[86.862546]        lock_mm_and_find_vma+0xa2/0x280
[86.862561]        do_user_addr_fault+0x266/0x8e0
[86.862578]        exc_page_fault+0x8a/0x2f0
[86.862593]        asm_exc_page_fault+0x27/0x30
[86.862607]        filldir64+0xeb/0x180
[86.862620]        kernfs_fop_readdir+0x118/0x480
[86.862635]        iterate_dir+0xcf/0x2b0
[86.862648]        __x64_sys_getdents64+0x84/0x140
[86.862661]        x64_sys_call+0x1058/0x2660
[86.862675]        do_syscall_64+0x91/0xe90
[86.862689]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.862703]
-> #2 (&root->kernfs_rwsem){++++}-{3:3}:
[86.862725]        down_write+0x3e/0xf0
[86.862738]        kernfs_add_one+0x30/0x3c0
[86.862751]        kernfs_create_dir_ns+0x53/0xb0
[86.862765]        internal_create_group+0x134/0x4c0
[86.862779]        sysfs_create_group+0x13/0x20
[86.862792]        topology_add_dev+0x1d/0x30
[86.862806]        cpuhp_invoke_callback+0x4b5/0x850
[86.862822]        cpuhp_issue_call+0xbf/0x1f0
[86.862836]        __cpuhp_setup_state_cpuslocked+0x111/0x320
[86.862852]        __cpuhp_setup_state+0xb0/0x220
[86.862866]        topology_sysfs_init+0x30/0x50
[86.862879]        do_one_initcall+0x60/0x3f0
[86.862893]        kernel_init_freeable+0x3cd/0x680
[86.862908]        kernel_init+0x1b/0x200
[86.862921]        ret_from_fork+0x47/0x70
[86.862934]        ret_from_fork_asm+0x1a/0x30
[86.862947]
-> #1 (cpuhp_state_mutex){+.+.}-{3:3}:
[86.862969]        __mutex_lock+0xaa/0xed0
[86.862982]        mutex_lock_nested+0x1b/0x30
[86.862995]        __cpuhp_setup_state_cpuslocked+0x67/0x320
[86.863012]        __cpuhp_setup_state+0xb0/0x220
[86.863026]        page_alloc_init_cpuhp+0x2d/0x60
[86.863041]        mm_core_init+0x22/0x2d0
[86.863054]        start_kernel+0x576/0xbd0
[86.863068]        x86_64_start_reservations+0x18/0x30
[86.863084]        x86_64_start_kernel+0xbf/0x110
[86.863098]        common_startup_64+0x13e/0x141
[86.863114]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[86.863135]        __lock_acquire+0x1635/0x2810
[86.863152]        lock_acquire+0xc4/0x2f0
[86.863166]        cpus_read_lock+0x41/0x100
[86.863180]        stop_machine+0x1c/0x50
[86.863194]        bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.863987]        intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.864735]        __vma_bind+0x55/0x70 [i915]
[86.865510]        fence_work+0x26/0xa0 [i915]
[86.866248]        fence_notify+0xa1/0x140 [i915]
[86.866983]        __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.867719]        i915_sw_fence_commit+0x39/0x60 [i915]
[86.868453]        i915_vma_pin_ww+0x462/0x1360 [i915]
[86.869228]        i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.870001]        initial_plane_vma+0x307/0x840 [i915]
[86.870774]        intel_initial_plane_config+0x33f/0x670 [i915]
[86.871546]        intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.872330]        i915_driver_probe+0x7fa/0xe80 [i915]
[86.873057]        i915_pci_probe+0xe6/0x220 [i915]
[86.873782]        local_pci_probe+0x47/0xb0
[86.873802]        pci_device_probe+0xf3/0x260
[86.873817]        really_probe+0xf1/0x3c0
[86.873833]        __driver_probe_device+0x8c/0x180
[86.873848]        driver_probe_device+0x24/0xd0
[86.873862]        __driver_attach+0x10f/0x220
[86.873876]        bus_for_each_dev+0x7f/0xe0
[86.873892]        driver_attach+0x1e/0x30
[86.873904]        bus_add_driver+0x151/0x290
[86.873917]        driver_register+0x5e/0x130
[86.873931]        __pci_register_driver+0x7d/0x90
[86.873945]        i915_pci_register_driver+0x23/0x30 [i915]
[86.874678]        i915_init+0x37/0x120 [i915]
[86.875347]        do_one_initcall+0x60/0x3f0
[86.875369]        do_init_module+0x97/0x2a0
[86.875385]        load_module+0x2c54/0x2d80
[86.875398]        init_module_from_file+0x96/0xe0
[86.875413]        idempotent_init_module+0x117/0x330
[86.875426]        __x64_sys_finit_module+0x77/0x100
[86.875440]        x64_sys_call+0x24de/0x2660
[86.875454]        do_syscall_64+0x91/0xe90
[86.875470]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.875486]
other info that might help us debug this:
[86.875502] Chain exists of:
  cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[86.875539]  Possible unsafe locking scenario:
[86.875552]        CPU0                    CPU1
[86.875563]        ----                    ----
[86.875573]   lock(reservation_ww_class_mutex);
[86.875588]                                lock(reservation_ww_class_acquire);
[86.875606]                                lock(reservation_ww_class_mutex);
[86.875624]   rlock(cpu_hotplug_lock);
[86.875637]
 *** DEADLOCK ***
[86.875650] 3 locks held by i915_module_loa/1432:
[86.875663]  #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220
[86.875699]  #1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.876512]  #2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.877305]
stack backtrace:
[86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G     U              6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 PREEMPT(voluntary)
[86.877334] Tainted: [U]=USER
[86.877336] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020
[86.877339] Call Trace:
[86.877344]  <TASK>
[86.877353]  dump_stack_lvl+0x91/0xf0
[86.877364]  dump_stack+0x10/0x20
[86.877369]  print_circular_bug+0x285/0x360
[86.877379]  check_noncircular+0x135/0x150
[86.877390]  __lock_acquire+0x1635/0x2810
[86.877403]  lock_acquire+0xc4/0x2f0
[86.877408]  ? stop_machine+0x1c/0x50
[86.877422]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878173]  cpus_read_lock+0x41/0x100
[86.878182]  ? stop_machine+0x1c/0x50
[86.878191]  ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878916]  stop_machine+0x1c/0x50
[86.878927]  bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.879652]  intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.880375]  __vma_bind+0x55/0x70 [i915]
[86.881133]  fence_work+0x26/0xa0 [i915]
[86.881851]  fence_notify+0xa1/0x140 [i915]
[86.882566]  __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.883286]  i915_sw_fence_commit+0x39/0x60 [i915]
[86.884003]  i915_vma_pin_ww+0x462/0x1360 [i915]
[86.884756]  ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915]
[86.885513]  i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.886281]  initial_plane_vma+0x307/0x840 [i915]
[86.887049]  intel_initial_plane_config+0x33f/0x670 [i915]
[86.887819]  intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.888587]  i915_driver_probe+0x7fa/0xe80 [i915]
[86.889293]  ? mutex_unlock+0x12/0x20
[86.889301]  ? drm_privacy_screen_get+0x171/0x190
[86.889308]  ? acpi_dev_found+0x66/0x80
[86.889321]  i915_pci_probe+0xe6/0x220 [i915]
[86.890038]  local_pci_probe+0x47/0xb0
[86.890049]  pci_device_probe+0xf3/0x260
[86.890058]  really_probe+0xf1/0x3c0
[86.890067]  __driver_probe_device+0x8c/0x180
[86.890072]  driver_probe_device+0x24/0xd0
[86.890078]  __driver_attach+0x10f/0x220
[86.890083]  ? __pfx___driver_attach+0x10/0x10
[86.890088]  bus_for_each_dev+0x7f/0xe0
[86.890097]  driver_attach+0x1e/0x30
[86.890101]  bus_add_driver+0x151/0x290
[86.890107]  driver_register+0x5e/0x130
[86.890113]  __pci_register_driver+0x7d/0x90
[86.890119]  i915_pci_register_driver+0x23/0x30 [i915]
[86.890833]  i915_init+0x37/0x120 [i915]
[86.891482]  ? __pfx_i915_init+0x10/0x10 [i915]
[86.892135]  do_one_initcall+0x60/0x3f0
[86.892145]  ? __kmalloc_cache_noprof+0x33f/0x470
[86.892157]  do_init_module+0x97/0x2a0
[86.892164]  load_module+0x2c54/0x2d80
[86.892168]  ? __kernel_read+0x15c/0x300
[86.892185]  ? kernel_read_file+0x2b1/0x320
[86.892195]  init_module_from_file+0x96/0xe0
[86.892199]  ? init_module_from_file+0x96/0xe0
[86.892211]  idempotent_init_module+0x117/0x330
[86.892224]  __x64_sys_finit_module+0x77/0x100
[86.892230]  x64_sys_call+0x24de/0x2660
[86.892236]  do_syscall_64+0x91/0xe90
[86.892243]  ? irqentry_exit+0x77/0xb0
[86.892249]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[86.892256]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.892261] RIP: 0033:0x7303e1b2725d
[86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d
[86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c
[86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80
[86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0
[86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710
[86.892304]  </TASK>

Call asynchronous variant of dma_fence_work_commit() in that case.

v3: Provide more verbose in-line comment (Andi),
  - mention target environments in commit message.

Fixes: 7d1c261 ("drm/i915: Take reservation lock around i915_vma_pin.")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985
Cc: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Acked-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20251023082925.351307-6-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 648ef13)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy)  e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  seq_read_iter+0x125/0x490
  ? __alloc_frozen_pages_noprof+0x18f/0x350
  seq_read+0x12c/0x170
  full_proxy_read+0x51/0x80
  vfs_read+0xbc/0x390
  ? __handle_mm_fault+0xa46/0xef0
  ? do_syscall_64+0x71/0x900
  ksys_read+0x73/0xf0
  do_syscall_64+0x71/0x900
  ? count_memcg_events+0xc2/0x190
  ? handle_mm_fault+0x1d7/0x2d0
  ? do_user_addr_fault+0x21a/0x690
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x6c/0x74
 RIP: 0033:0x7f44d4031687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
 RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
  </TASK>
 Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
  snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
  platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
 CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554

Fix this by checking pipe_ctx->stream_res.tg before dereferencing.

Fixes: 07926ba ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f19bbec)
Cc: stable@vger.kernel.org
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
Michael Chan says:

====================
bnxt_en: Bug fixes

Patches 1, 3, and 4 are bug fixes related to the FW log tracing driver
coredump feature recently added in 6.13.  Patch #1 adds the necessary
call to shutdown the FW logging DMA during PCI shutdown.  Patch #3 fixes
a possible null pointer derefernce when using early versions of the FW
with this feature.  Patch #4 adds the coredump header information
unconditionally to make it more robust.

Patch #2 fixes a possible memory leak during PTP shutdown.  Patch #5
eliminates a dmesg warning when doing devlink reload.
====================

Link: https://patch.msgid.link/20251104005700.542174-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
 into HEAD

KVM/riscv fixes for 6.18, take #2

- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm654 fixes for 6.18, take #2

* Core fixes

  - Fix trapping regression when no in-kernel irqchip is present
    (20251021094358.1963807-1-sascha.bischoff@arm.com)

  - Check host-provided, untrusted ranges and offsets in pKVM
    (20251016164541.3771235-1-vdonnefort@google.com)
    (20251017075710.2605118-1-sebastianene@google.com)

  - Fix regression restoring the ID_PFR1_EL1 register
    (20251030122707.2033690-1-maz@kernel.org

  - Fix vgic ITS locking issues when LPIs are not directly injected
    (20251107184847.1784820-1-oupton@kernel.org)

* Test fixes

  - Correct target CPU programming in vgic_lpi_stress selftest
    (20251020145946.48288-1-mdittgen@amazon.de)

  - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
    (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org)
    (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org)

* Misc

  - Update Oliver's email address
    (20251107012830.1708225-1-oupton@kernel.org)
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
When freeing indexed arrays, the corresponding free function should
be called for each entry of the indexed array. For example, for
for 'struct tc_act_attrs' 'tc_act_attrs_free(...)' needs to be called
for each entry.

Previously, memory leaks were reported when enabling the ASAN
analyzer.

=================================================================
==874==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
    #2 0x55c98db048af in main  ./linux/tools/net/ynl/samples/tc-filter-add.c:71

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
    #2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74

Direct leak of 10 byte(s) in 2 object(s) allocated from:
    #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622

SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s).

The following diff illustrates the changes introduced compared to the
previous version of the code.

 void tc_flower_attrs_free(struct tc_flower_attrs *obj)
 {
+	unsigned int i;
+
 	free(obj->indev);
+	for (i = 0; i < obj->_count.act; i++)
+		tc_act_attrs_free(&obj->act[i]);
 	free(obj->act);
 	free(obj->key_eth_dst);
 	free(obj->key_eth_dst_mask);

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251106151529.453026-3-zahari.doychev@linux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nandamajay pushed a commit that referenced this pull request Dec 1, 2025
Handle skb allocation failures in RX path, to avoid NULL pointer
dereference and RX stalls under memory pressure. If the refill fails
with -ENOMEM, complete napi polling and wake up later to retry via timer.
Also explicitly re-enable RX DMA after oom, so the dmac doesn't remain
stopped in this situation.

Previously, memory pressure could lead to skb allocation failures and
subsequent Oops like:

	Oops: Kernel access of bad area, sig: 11 [#2]
	Hardware name: SonyPS3 Cell Broadband Engine 0x701000 PS3
	NIP [c0003d0000065900] gelic_net_poll+0x6c/0x2d0 [ps3_gelic] (unreliable)
	LR [c0003d00000659c4] gelic_net_poll+0x130/0x2d0 [ps3_gelic]
	Call Trace:
	  gelic_net_poll+0x130/0x2d0 [ps3_gelic] (unreliable)
	  __napi_poll+0x44/0x168
	  net_rx_action+0x178/0x290

Steps to reproduce the issue:
	1. Start a continuous network traffic, like scp of a 20GB file
	2. Inject failslab errors using the kernel fault injection:
	    echo -1 > /sys/kernel/debug/failslab/times
	    echo 30 > /sys/kernel/debug/failslab/interval
	    echo 100 > /sys/kernel/debug/failslab/probability
	3. After some time, traces start to appear, kernel Oopses
	   and the system stops

Step 2 is not always necessary, as it is usually already triggered by
the transfer of a big enough file.

Fixes: 02c1889 ("ps3: gigabit ethernet driver for PS3, take3")
Signed-off-by: Florian Fuchs <fuchsfl@gmail.com>
Link: https://patch.msgid.link/20251113181000.3914980-1-fuchsfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
A race condition was found in sg_proc_debug_helper(). It was observed on
a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring
/proc/scsi/sg/debug every second. A very large elapsed time would
sometimes appear. This is caused by two race conditions.

We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
architecture. A patched kernel was built, and the race condition could
not be observed anymore after the application of this patch. A
reproducer C program utilising the scsi_debug module was also built by
Changhui Zhong and can be viewed here:

https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c

The first race happens between the reading of hp->duration in
sg_proc_debug_helper() and request completion in sg_rq_end_io().  The
hp->duration member variable may hold either of two types of
information:

 #1 - The start time of the request. This value is present while
      the request is not yet finished.

 #2 - The total execution time of the request (end_time - start_time).

If sg_proc_debug_helper() executes *after* the value of hp->duration was
changed from #1 to #2, but *before* srp->done is set to 1 in
sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the
elapsed time (value type #2) is subtracted from a timestamp, which
cannot yield a valid elapsed time (which is a type #2 value as well).

To fix this issue, the value of hp->duration must change under the
protection of the sfp->rq_list_lock in sg_rq_end_io().  Since
sg_proc_debug_helper() takes this read lock, the change to srp->done and
srp->header.duration will happen atomically from the perspective of
sg_proc_debug_helper() and the race condition is thus eliminated.

The second race condition happens between sg_proc_debug_helper() and
sg_new_write(). Even though hp->duration is set to the current time
stamp in sg_add_request() under the write lock's protection, it gets
overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
to copy struct sg_io_hdr from userspace into kernel space. hp->duration
is set to the start time again in sg_common_write(). If
sg_proc_debug_helper() is called between these two calls, an arbitrary
value set by userspace (usually zero) is used to compute the elapsed
time.

To fix this issue, hp->duration must be set to the current timestamp
again after get_sg_io_hdr() returns successfully. A small race window
still exists between get_sg_io_hdr() and setting hp->duration, but this
window is only a few instructions wide and does not result in observable
issues in practice, as confirmed by testing.

Additionally, we fix the format specifier from %d to %u for printing
unsigned int values in sg_proc_debug_helper().

Signed-off-by: Michal Rábek <mrabek@redhat.com>
Suggested-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
When a page is freed it coalesces with a buddy into a higher order page
while possible.  When the buddy page migrate type differs, it is expected
to be updated to match the one of the page being freed.

However, only the first pageblock of the buddy page is updated, while the
rest of the pageblocks are left unchanged.

That causes warnings in later expand() and other code paths (like below),
since an inconsistency between migration type of the list containing the
page and the page-owned pageblocks migration types is introduced.

[  308.986589] ------------[ cut here ]------------
[  308.987227] page type is 0, passed migratetype is 1 (nr=256)
[  308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270
[  308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[  308.987439] Unloaded tainted modules: hmac_s390(E):2
[  308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G            E       6.18.0-gcc-bpf-debug torvalds#431 PREEMPT
[  308.987657] Tainted: [E]=UNSIGNED_MODULE
[  308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[  308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270)
[  308.987676]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[  308.987688]            0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300
[  308.987692]            0000000000000008 0000034998d57290 000002be00000100 0000023e00000008
[  308.987696]            0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0
[  308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2	larl	%r2,000003499883abd4
                          00000349976fa5f6: c0e5ffe3f4b5	brasl	%r14,0000034997378f60
                         #00000349976fa5fc: af000000		mc	0,0
                         >00000349976fa600: a7f4ff4c		brc	15,00000349976fa498
                          00000349976fa604: b9040026		lgr	%r2,%r6
                          00000349976fa608: c0300088317f	larl	%r3,0000034998800906
                          00000349976fa60e: c0e5fffdb6e1	brasl	%r14,00000349976b13d0
                          00000349976fa614: af000000		mc	0,0
[  308.987734] Call Trace:
[  308.987738]  [<00000349976fa600>] expand+0x240/0x270
[  308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270)
[  308.987749]  [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940
[  308.987754]  [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[  308.987759]  [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[  308.987763]  [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[  308.987768]  [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[  308.987774]  [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[  308.987781]  [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[  308.987786]  [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[  308.987791]  [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[  308.987799]  [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[  308.987804]  [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[  308.987809]  [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[  308.987813]  [<000003499734d70e>] do_exception+0x1de/0x540
[  308.987822]  [<0000034998387390>] __do_pgm_check+0x130/0x220
[  308.987830]  [<000003499839a934>] pgm_check_handler+0x114/0x160
[  308.987838] 3 locks held by mempig_verify/5224:
[  308.987842]  #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[  308.987859]  #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[  308.987871]  #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[  308.987886] Last Breaking-Event-Address:
[  308.987890]  [<0000034997379096>] __warn_printk+0x136/0x140
[  308.987897] irq event stamp: 52330356
[  308.987901] hardirqs last  enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[  308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[  308.987913] softirqs last  enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[  308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[  308.987929] ---[ end trace 0000000000000000 ]---
[  308.987936] ------------[ cut here ]------------
[  308.987940] page type is 0, passed migratetype is 1 (nr=256)
[  308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0
[  308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[  308.988070] Unloaded tainted modules: hmac_s390(E):2
[  308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G        W   E       6.18.0-gcc-bpf-debug torvalds#431 PREEMPT
[  308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[  308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0)
[  308.988118]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[  308.988133]            0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300
[  308.988139]            0000000000000001 0000000000000008 000002be00000100 000002be803ac000
[  308.988144]            0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728
[  308.988153] Krnl Code: 00000349976f9e22: c020008a06d9	larl	%r2,000003499883abd4
                          00000349976f9e28: c0e5ffe3f89c	brasl	%r14,0000034997378f60
                         #00000349976f9e2e: af000000		mc	0,0
                         >00000349976f9e32: a7f4ff4e		brc	15,00000349976f9cce
                          00000349976f9e36: b904002b		lgr	%r2,%r11
                          00000349976f9e3a: c030008a06e7	larl	%r3,000003499883ac08
                          00000349976f9e40: c0e5fffdbac8	brasl	%r14,00000349976b13d0
                          00000349976f9e46: af000000		mc	0,0
[  308.988184] Call Trace:
[  308.988188]  [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0
[  308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0)
[  308.988202]  [<00000349976ff946>] rmqueue_bulk+0x706/0x940
[  308.988208]  [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[  308.988214]  [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[  308.988221]  [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[  308.988227]  [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[  308.988233]  [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[  308.988240]  [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[  308.988247]  [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[  308.988253]  [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[  308.988260]  [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[  308.988267]  [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[  308.988273]  [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[  308.988279]  [<000003499734d70e>] do_exception+0x1de/0x540
[  308.988286]  [<0000034998387390>] __do_pgm_check+0x130/0x220
[  308.988293]  [<000003499839a934>] pgm_check_handler+0x114/0x160
[  308.988300] 3 locks held by mempig_verify/5224:
[  308.988305]  #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[  308.988322]  #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[  308.988334]  #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[  308.988346] Last Breaking-Event-Address:
[  308.988350]  [<0000034997379096>] __warn_printk+0x136/0x140
[  308.988356] irq event stamp: 52330356
[  308.988360] hardirqs last  enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[  308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[  308.988373] softirqs last  enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[  308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[  308.988388] ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com
Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com
Fixes: e6cf9e1 ("mm: page_alloc: fix up block types when merging compatible blocks")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-mm/87wmalyktd.fsf@linux.ibm.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
When running the Rust maple tree kunit tests with lockdep, you may trigger
a warning that looks like this:

	lib/maple_tree.c:780 suspicious rcu_dereference_check() usage!

	other info that might help us debug this:

	rcu_scheduler_active = 2, debug_locks = 1
	no locks held by kunit_try_catch/344.

	stack backtrace:
	CPU: 3 UID: 0 PID: 344 Comm: kunit_try_catch Tainted: G                 N  6.19.0-rc1+ #2 NONE
	Tainted: [N]=TEST
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
	Call Trace:
	 <TASK>
	 dump_stack_lvl+0x71/0x90
	 lockdep_rcu_suspicious+0x150/0x190
	 mas_start+0x104/0x150
	 mas_find+0x179/0x240
	 _RINvNtCs5QSdWC790r4_4core3ptr13drop_in_placeINtNtCs1cdwasc6FUb_6kernel10maple_tree9MapleTreeINtNtNtBL_5alloc4kbox3BoxlNtNtB1x_9allocator7KmallocEEECsgxAQYCfdR72_25doctests_kernel_generated+0xaf/0x130
	 rust_doctest_kernel_maple_tree_rs_0+0x600/0x6b0
	 ? lock_release+0xeb/0x2a0
	 ? kunit_try_catch_run+0x210/0x210
	 kunit_try_run_case+0x74/0x160
	 ? kunit_try_catch_run+0x210/0x210
	 kunit_generic_run_threadfn_adapter+0x12/0x30
	 kthread+0x21c/0x230
	 ? __do_trace_sched_kthread_stop_ret+0x40/0x40
	 ret_from_fork+0x16c/0x270
	 ? __do_trace_sched_kthread_stop_ret+0x40/0x40
	 ret_from_fork_asm+0x11/0x20
	 </TASK>

This is because the destructor of maple tree calls mas_find() without
taking rcu_read_lock() or the spinlock.  Doing that is actually ok in this
case since the destructor has exclusive access to the entire maple tree,
but it triggers a lockdep warning.  To fix that, take the rcu read lock.

In the future, it's possible that memory reclaim could gain a feature
where it reallocates entries in maple trees even if no user-code is
touching it.  If that feature is added, then this use of rcu read lock
would become load-bearing, so I did not make it conditional on lockdep.

We have to repeatedly take and release rcu because the destructor of T
might perform operations that sleep.

Link: https://lkml.kernel.org/r/20251217-maple-drop-rcu-v1-1-702af063573f@google.com
Fixes: da939ef ("rust: maple_tree: add MapleTree")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reported-by: Andreas Hindborg <a.hindborg@kernel.org>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/564215108
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Cc: Andrew Ballance <andrewjballance@gmail.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
… to macb_open()

In the non-RT kernel, local_bh_disable() merely disables preemption,
whereas it maps to an actual spin lock in the RT kernel. Consequently,
when attempting to refill RX buffers via netdev_alloc_skb() in
macb_mac_link_up(), a deadlock scenario arises as follows:

   WARNING: possible circular locking dependency detected
   6.18.0-08691-g2061f18ad76e torvalds#39 Not tainted
   ------------------------------------------------------
   kworker/0:0/8 is trying to acquire lock:
   ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c

   but task is already holding lock:
   ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit
   +0x148/0xb7c

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x148/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #2 (_xmit_ETHER#2){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          sch_direct_xmit+0x11c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
          lock_release+0x250/0x348
          __local_bh_enable_ip+0x7c/0x240
          __netdev_alloc_skb+0x1b4/0x1d8
          gem_rx_refill+0xdc/0x240
          gem_init_rings+0xb4/0x108
          macb_mac_link_up+0x9c/0x2b4
          phylink_resolve+0x170/0x614
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #0 (&bp->lock){+.+.}-{3:3}:
          __lock_acquire+0x15a8/0x2084
          lock_acquire+0x1cc/0x350
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x808/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&queue->tx_ptr_lock);
                                  lock(_xmit_ETHER#2);
                                  lock(&queue->tx_ptr_lock);
     lock(&bp->lock);

    *** DEADLOCK ***

   Call trace:
    show_stack+0x18/0x24 (C)
    dump_stack_lvl+0xa0/0xf0
    dump_stack+0x18/0x24
    print_circular_bug+0x28c/0x370
    check_noncircular+0x198/0x1ac
    __lock_acquire+0x15a8/0x2084
    lock_acquire+0x1cc/0x350
    rt_spin_lock+0x50/0x1f0
    macb_start_xmit+0x808/0xb7c
    dev_hard_start_xmit+0x94/0x284
    sch_direct_xmit+0x8c/0x37c
    __dev_queue_xmit+0x708/0x1120
    neigh_resolve_output+0x148/0x28c
    ip6_finish_output2+0x2c0/0xb2c
    __ip6_finish_output+0x114/0x308
    ip6_output+0xc4/0x4a4
    mld_sendpack+0x220/0x68c
    mld_ifc_work+0x2a8/0x4f4
    process_one_work+0x20c/0x5f8
    worker_thread+0x1b0/0x35c
    kthread+0x144/0x200
    ret_from_fork+0x10/0x20

Notably, invoking the mog_init_rings() callback upon link establishment
is unnecessary. Instead, we can exclusively call mog_init_rings() within
the ndo_open() callback. This adjustment resolves the deadlock issue.
Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
when opening the network interface via at91ether_open(), moving
mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
check.

Fixes: 633e98a ("net: macb: use resolved link config in mac_link_up()")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179

but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
       vfs_write+0x27e/0xb30 fs/read_write.c:684
       ksys_write+0x145/0x250 fs/read_write.c:738
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs/super.h:19 [inline]
       sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
       mnt_want_write+0x41/0x90 fs/namespace.c:499
       open_last_lookups fs/namei.c:4529 [inline]
       path_openat+0xadd/0x3dd0 fs/namei.c:4784
       do_filp_open+0x1fa/0x410 fs/namei.c:4814
       io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
       __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
       io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
       io_queue_sqe io_uring/io_uring.c:2042 [inline]
       io_submit_sqe io_uring/io_uring.c:2320 [inline]
       io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
       __do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
       __se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
       lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
       io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
       io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
       io_uring_task_cancel include/linux/io_uring.h:24 [inline]
       begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
       load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
       search_binary_handler fs/exec.c:1669 [inline]
       exec_binprm fs/exec.c:1701 [inline]
       bprm_execve+0x92e/0x1400 fs/exec.c:1753
       do_execveat_common+0x510/0x6a0 fs/exec.c:1859
       do_execve fs/exec.c:1933 [inline]
       __do_sys_execve fs/exec.c:2009 [inline]
       __se_sys_execve fs/exec.c:2004 [inline]
       __x64_sys_execve+0x94/0xb0 fs/exec.c:2004
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  &ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sig->cred_guard_mutex);
                               lock(sb_writers#3);
                               lock(&sig->cred_guard_mutex);
  lock(&ctx->uring_lock);

 *** DEADLOCK ***

1 lock held by syz.0.9999/12287:
 #0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
 #0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
 check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
 lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
 __mutex_lock_common kernel/locking/mutex.c:614 [inline]
 __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
 io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
 io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
 io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
 io_uring_task_cancel include/linux/io_uring.h:24 [inline]
 begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
 load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
 search_binary_handler fs/exec.c:1669 [inline]
 exec_binprm fs/exec.c:1701 [inline]
 bprm_execve+0x92e/0x1400 fs/exec.c:1753
 do_execveat_common+0x510/0x6a0 fs/exec.c:1859
 do_execve fs/exec.c:1933 [inline]
 __do_sys_execve fs/exec.c:2009 [inline]
 __se_sys_execve fs/exec.c:2004 [inline]
 __x64_sys_execve+0x94/0xb0 fs/exec.c:2004
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
 </TASK>

Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.

Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
After rename exchanging (either with the rename exchange operation or
regular renames in multiple non-atomic steps) two inodes and at least
one of them is a directory, we can end up with a log tree that contains
only of the inodes and after a power failure that can result in an attempt
to delete the other inode when it should not because it was not deleted
before the power failure. In some case that delete attempt fails when
the target inode is a directory that contains a subvolume inside it, since
the log replay code is not prepared to deal with directory entries that
point to root items (only inode items).

1) We have directories "dir1" (inode A) and "dir2" (inode B) under the
   same parent directory;

2) We have a file (inode C) under directory "dir1" (inode A);

3) We have a subvolume inside directory "dir2" (inode B);

4) All these inodes were persisted in a past transaction and we are
   currently at transaction N;

5) We rename the file (inode C), so at btrfs_log_new_name() we update
   inode C's last_unlink_trans to N;

6) We get a rename exchange for "dir1" (inode A) and "dir2" (inode B),
   so after the exchange "dir1" is inode B and "dir2" is inode A.
   During the rename exchange we call btrfs_log_new_name() for inodes
   A and B, but because they are directories, we don't update their
   last_unlink_trans to N;

7) An fsync against the file (inode C) is done, and because its inode
   has a last_unlink_trans with a value of N we log its parent directory
   (inode A) (through btrfs_log_all_parents(), called from
   btrfs_log_inode_parent()).

8) So we end up with inode B not logged, which now has the old name
   of inode A. At copy_inode_items_to_log(), when logging inode A, we
   did not check if we had any conflicting inode to log because inode
   A has a generation lower than the current transaction (created in
   a past transaction);

9) After a power failure, when replaying the log tree, since we find that
   inode A has a new name that conflicts with the name of inode B in the
   fs tree, we attempt to delete inode B... this is wrong since that
   directory was never deleted before the power failure, and because there
   is a subvolume inside that directory, attempting to delete it will fail
   since replay_dir_deletes() and btrfs_unlink_inode() are not prepared
   to deal with dir items that point to roots instead of inodes.

   When that happens the mount fails and we get a stack trace like the
   following:

   [87.2314] BTRFS info (device dm-0): start tree-log replay
   [87.2318] BTRFS critical (device dm-0): failed to delete reference to subvol, root 5 inode 256 parent 259
   [87.2332] ------------[ cut here ]------------
   [87.2338] BTRFS: Transaction aborted (error -2)
   [87.2346] WARNING: CPU: 1 PID: 638968 at fs/btrfs/inode.c:4345 __btrfs_unlink_inode+0x416/0x440 [btrfs]
   [87.2368] Modules linked in: btrfs loop dm_thin_pool (...)
   [87.2470] CPU: 1 UID: 0 PID: 638968 Comm: mount Tainted: G        W           6.18.0-rc7-btrfs-next-218+ #2 PREEMPT(full)
   [87.2489] Tainted: [W]=WARN
   [87.2494] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
   [87.2514] RIP: 0010:__btrfs_unlink_inode+0x416/0x440 [btrfs]
   [87.2538] Code: c0 89 04 24 (...)
   [87.2568] RSP: 0018:ffffc0e741f4b9b8 EFLAGS: 00010286
   [87.2574] RAX: 0000000000000000 RBX: ffff9d3ec8a6cf60 RCX: 0000000000000000
   [87.2582] RDX: 0000000000000002 RSI: ffffffff84ab45a1 RDI: 00000000ffffffff
   [87.2591] RBP: ffff9d3ec8a6ef20 R08: 0000000000000000 R09: ffffc0e741f4b840
   [87.2599] R10: ffff9d45dc1fffa8 R11: 0000000000000003 R12: ffff9d3ee26d77e0
   [87.2608] R13: ffffc0e741f4ba98 R14: ffff9d4458040800 R15: ffff9d44b6b7ca10
   [87.2618] FS:  00007f7b9603a840(0000) GS:ffff9d4658982000(0000) knlGS:0000000000000000
   [87.2629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [87.2637] CR2: 00007ffc9ec33b98 CR3: 000000011273e003 CR4: 0000000000370ef0
   [87.2648] Call Trace:
   [87.2651]  <TASK>
   [87.2654]  btrfs_unlink_inode+0x15/0x40 [btrfs]
   [87.2661]  unlink_inode_for_log_replay+0x27/0xf0 [btrfs]
   [87.2669]  check_item_in_log+0x1ea/0x2c0 [btrfs]
   [87.2676]  replay_dir_deletes+0x16b/0x380 [btrfs]
   [87.2684]  fixup_inode_link_count+0x34b/0x370 [btrfs]
   [87.2696]  fixup_inode_link_counts+0x41/0x160 [btrfs]
   [87.2703]  btrfs_recover_log_trees+0x1ff/0x7c0 [btrfs]
   [87.2711]  ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
   [87.2719]  open_ctree+0x10bb/0x15f0 [btrfs]
   [87.2726]  btrfs_get_tree.cold+0xb/0x16c [btrfs]
   [87.2734]  ? fscontext_read+0x15c/0x180
   [87.2740]  ? rw_verify_area+0x50/0x180
   [87.2746]  vfs_get_tree+0x25/0xd0
   [87.2750]  vfs_cmd_create+0x59/0xe0
   [87.2755]  __do_sys_fsconfig+0x4f6/0x6b0
   [87.2760]  do_syscall_64+0x50/0x1220
   [87.2764]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [87.2770] RIP: 0033:0x7f7b9625f4aa
   [87.2775] Code: 73 01 c3 48 (...)
   [87.2803] RSP: 002b:00007ffc9ec35b08 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
   [87.2817] RAX: ffffffffffffffda RBX: 0000558bfa91ac20 RCX: 00007f7b9625f4aa
   [87.2829] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
   [87.2842] RBP: 0000558bfa91b120 R08: 0000000000000000 R09: 0000000000000000
   [87.2854] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
   [87.2864] R13: 00007f7b963f1580 R14: 00007f7b963f326c R15: 00007f7b963d8a23
   [87.2877]  </TASK>
   [87.2882] ---[ end trace 0000000000000000 ]---
   [87.2891] BTRFS: error (device dm-0 state A) in __btrfs_unlink_inode:4345: errno=-2 No such entry
   [87.2904] BTRFS: error (device dm-0 state EAO) in do_abort_log_replay:191: errno=-2 No such entry
   [87.2915] BTRFS critical (device dm-0 state EAO): log tree (for root 5) leaf currently being processed (slot 7 key (258 12 257)):
   [87.2929] BTRFS info (device dm-0 state EAO): leaf 30736384 gen 10 total ptrs 7 free space 15712 owner 18446744073709551610
   [87.2929] BTRFS info (device dm-0 state EAO): refs 3 lock_owner 0 current 638968
   [87.2929]      item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
   [87.2929]              inode generation 9 transid 10 size 0 nbytes 0
   [87.2929]              block group 0 mode 40755 links 1 uid 0 gid 0
   [87.2929]              rdev 0 sequence 7 flags 0x0
   [87.2929]              atime 1765464494.678070921
   [87.2929]              ctime 1765464494.686606513
   [87.2929]              mtime 1765464494.686606513
   [87.2929]              otime 1765464494.678070921
   [87.2929]      item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
   [87.2929]              index 4 name_len 4
   [87.2929]      item 2 key (257 DIR_LOG_INDEX 2) itemoff 16101 itemsize 8
   [87.2929]              dir log end 2
   [87.2929]      item 3 key (257 DIR_LOG_INDEX 3) itemoff 16093 itemsize 8
   [87.2929]              dir log end 18446744073709551615
   [87.2930]      item 4 key (257 DIR_INDEX 3) itemoff 16060 itemsize 33
   [87.2930]              location key (258 1 0) type 1
   [87.2930]              transid 10 data_len 0 name_len 3
   [87.2930]      item 5 key (258 INODE_ITEM 0) itemoff 15900 itemsize 160
   [87.2930]              inode generation 9 transid 10 size 0 nbytes 0
   [87.2930]              block group 0 mode 100644 links 1 uid 0 gid 0
   [87.2930]              rdev 0 sequence 2 flags 0x0
   [87.2930]              atime 1765464494.678456467
   [87.2930]              ctime 1765464494.686606513
   [87.2930]              mtime 1765464494.678456467
   [87.2930]              otime 1765464494.678456467
   [87.2930]      item 6 key (258 INODE_REF 257) itemoff 15887 itemsize 13
   [87.2930]              index 3 name_len 3
   [87.2930] BTRFS critical (device dm-0 state EAO): log replay failed in unlink_inode_for_log_replay:1045 for root 5, stage 3, with error -2: failed to unlink inode 256 parent dir 259 name subvol root 5
   [87.2963] BTRFS: error (device dm-0 state EAO) in btrfs_recover_log_trees:7743: errno=-2 No such entry
   [87.2981] BTRFS: error (device dm-0 state EAO) in btrfs_replay_log:2083: errno=-2 No such entry (Failed to recover log tr

So fix this by changing copy_inode_items_to_log() to always detect if
there are conflicting inodes for the ref/extref of the inode being logged
even if the inode was created in a past transaction.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
…te in qfq_reset

`qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class
itself is active.

Two qfq_class objects may point to the same leaf_qdisc. This happens
when:

1. one QFQ qdisc is attached to the dev as the root qdisc, and

2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get()
/ qdisc_put()) and is pending to be destroyed, as in function
tc_new_tfilter.

When packets are enqueued through the root QFQ qdisc, the shared
leaf_qdisc->q.qlen increases. At the same time, the second QFQ
qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters
qfq_reset() with its own q->q.qlen == 0, but its class's leaf
qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate
an inactive aggregate and trigger a null-deref in qfq_deactivate_agg:

[    0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    0.903571] #PF: supervisor write access in kernel mode
[    0.903860] #PF: error_code(0x0002) - not-present page
[    0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0
[    0.904502] Oops: Oops: 0002 [#1] SMP NOPTI
[    0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE
[    0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[    0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[    0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0

Code starting with the faulting instruction
===========================================
   0:	0f 84 4d 01 00 00    	je     0x153
   6:	48 89 70 18          	mov    %rsi,0x18(%rax)
   a:	8b 4b 10             	mov    0x10(%rbx),%ecx
   d:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  14:	48 8b 78 08          	mov    0x8(%rax),%rdi
  18:	48 d3 e2             	shl    %cl,%rdx
  1b:	48 21 f2             	and    %rsi,%rdx
  1e:	48 2b 13             	sub    (%rbx),%rdx
  21:	48 8b 30             	mov    (%rax),%rsi
  24:	48 d3 ea             	shr    %cl,%rdx
  27:	8b 4b 18             	mov    0x18(%rbx),%ecx
	...
[    0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[    0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[    0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[    0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[    0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[    0.909179] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[    0.909572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[    0.910247] PKRU: 55555554
[    0.910391] Call Trace:
[    0.910527]  <TASK>
[    0.910638]  qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485)
[    0.910826]  qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036)
[    0.911040]  __qdisc_destroy (net/sched/sch_generic.c:1076)
[    0.911236]  tc_new_tfilter (net/sched/cls_api.c:2447)
[    0.911447]  rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
[    0.911663]  ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861)
[    0.911894]  netlink_rcv_skb (net/netlink/af_netlink.c:2550)
[    0.912100]  netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
[    0.912296]  ? __alloc_skb (net/core/skbuff.c:706)
[    0.912484]  netlink_sendmsg (net/netlink/af_netlink.c:1894)
[    0.912682]  sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1))
[    0.912880]  vfs_write (fs/read_write.c:593 fs/read_write.c:686)
[    0.913077]  ksys_write (fs/read_write.c:738)
[    0.913252]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[    0.913438]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
[    0.913687] RIP: 0033:0x424c34
[    0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9

Code starting with the faulting instruction
===========================================
   0:	89 02                	mov    %eax,(%rdx)
   2:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
   9:	eb bd                	jmp    0xffffffffffffffc8
   b:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
  12:	00 00 00
  15:	90                   	nop
  16:	f3 0f 1e fa          	endbr64
  1a:	80 3d 2d 44 09 00 00 	cmpb   $0x0,0x9442d(%rip)        # 0x9444e
  21:	74 13                	je     0x36
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall
  2a:	09                   	.byte 0x9
[    0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34
[    0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003
[    0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000
[    0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28
[    0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001
[    0.917039]  </TASK>
[    0.917158] Modules linked in:
[    0.917316] CR2: 0000000000000000
[    0.917484] ---[ end trace 0000000000000000 ]---
[    0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[    0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0

Code starting with the faulting instruction
===========================================
   0:	0f 84 4d 01 00 00    	je     0x153
   6:	48 89 70 18          	mov    %rsi,0x18(%rax)
   a:	8b 4b 10             	mov    0x10(%rbx),%ecx
   d:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  14:	48 8b 78 08          	mov    0x8(%rax),%rdi
  18:	48 d3 e2             	shl    %cl,%rdx
  1b:	48 21 f2             	and    %rsi,%rdx
  1e:	48 2b 13             	sub    (%rbx),%rdx
  21:	48 8b 30             	mov    (%rax),%rsi
  24:	48 d3 ea             	shr    %cl,%rdx
  27:	8b 4b 18             	mov    0x18(%rbx),%ecx
	...
[    0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[    0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[    0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[    0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[    0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[    0.921014] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[    0.921424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[    0.922097] PKRU: 55555554
[    0.922240] Kernel panic - not syncing: Fatal exception
[    0.922590] Kernel Offset: disabled

Fixes: 0545a30 ("pkt_sched: QFQ - quick fair queue scheduler")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
The GPIO controller is configured as non-sleeping but it uses generic
pinctrl helpers which use a mutex for synchronization.

This can cause the following lockdep splat with shared GPIOs enabled on
boards which have multiple devices using the same GPIO:

BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:591
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name:
kworker/u16:0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
6 locks held by kworker/u16:0/12:
  #0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0},
at: process_one_work+0x18c/0x604
  #1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at:
process_one_work+0x1b4/0x604
  #2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at:
__device_attach+0x38/0x1b0
  #3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
  #4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at:
gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy]
  #5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
irq event stamp: 81450
hardirqs last  enabled at (81449): [<ffff8000813acba4>]
_raw_spin_unlock_irqrestore+0x74/0x78
hardirqs last disabled at (81450): [<ffff8000813abfb8>]
_raw_spin_lock_irqsave+0x84/0x88
softirqs last  enabled at (79616): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
softirqs last disabled at (79614): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted
6.19.0-rc4-next-20260105+ #11975 PREEMPT
Hardware name: Hardkernel ODROID-M1 (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  __might_resched+0x144/0x248
  __might_sleep+0x48/0x98
  __mutex_lock+0x5c/0x894
  mutex_lock_nested+0x24/0x30
  pinctrl_get_device_gpio_range+0x44/0x128
  pinctrl_gpio_direction+0x3c/0xe0
  pinctrl_gpio_direction_output+0x14/0x20
  rockchip_gpio_direction_output+0xb8/0x19c
  gpiochip_direction_output+0x38/0x94
  gpiod_direction_output_raw_commit+0x1d8/0x360
  gpiod_direction_output_nonotify+0x7c/0x230
  gpiod_direction_output+0x34/0xf8
  gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy]
  gpiochip_direction_output+0x38/0x94
  gpiod_direction_output_raw_commit+0x1d8/0x360
  gpiod_direction_output_nonotify+0x7c/0x230
  gpiod_configure_flags+0xbc/0x480
  gpiod_find_and_request+0x1a0/0x574
  gpiod_get_index+0x58/0x84
  devm_gpiod_get_index+0x20/0xb4
  devm_gpiod_get_optional+0x18/0x30
  rockchip_pcie_probe+0x98/0x380
  platform_probe+0x5c/0xac
  really_probe+0xbc/0x298

Fixes: 936ee26 ("gpio/rockchip: add driver for rockchip gpio")
Cc: stable@vger.kernel.org
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/d035fc29-3b03-4cd6-b8ec-001f93540bc6@samsung.com/
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20260106090011.21603-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
…ked_inode()

In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree()
while holding a path with a read locked leaf from a subvolume tree, and
btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can
trigger reclaim.

This can create a circular lock dependency which lockdep warns about with
the following splat:

   [6.1433] ======================================================
   [6.1574] WARNING: possible circular locking dependency detected
   [6.1583] 6.18.0+ #4 Tainted: G     U
   [6.1591] ------------------------------------------------------
   [6.1599] kswapd0/117 is trying to acquire lock:
   [6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1625]
            but task is already holding lock:
   [6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
   [6.1646]
            which lock already depends on the new lock.

   [6.1657]
            the existing dependency chain (in reverse order) is:
   [6.1667]
            -> #2 (fs_reclaim){+.+.}-{0:0}:
   [6.1677]        fs_reclaim_acquire+0x9d/0xd0
   [6.1685]        __kmalloc_cache_noprof+0x59/0x750
   [6.1694]        btrfs_init_file_extent_tree+0x90/0x100
   [6.1702]        btrfs_read_locked_inode+0xc3/0x6b0
   [6.1710]        btrfs_iget+0xbb/0xf0
   [6.1716]        btrfs_lookup_dentry+0x3c5/0x8e0
   [6.1724]        btrfs_lookup+0x12/0x30
   [6.1731]        lookup_open.isra.0+0x1aa/0x6a0
   [6.1739]        path_openat+0x5f7/0xc60
   [6.1746]        do_filp_open+0xd6/0x180
   [6.1753]        do_sys_openat2+0x8b/0xe0
   [6.1760]        __x64_sys_openat+0x54/0xa0
   [6.1768]        do_syscall_64+0x97/0x3e0
   [6.1776]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [6.1784]
            -> #1 (btrfs-tree-00){++++}-{3:3}:
   [6.1794]        lock_release+0x127/0x2a0
   [6.1801]        up_read+0x1b/0x30
   [6.1808]        btrfs_search_slot+0x8e0/0xff0
   [6.1817]        btrfs_lookup_inode+0x52/0xd0
   [6.1825]        __btrfs_update_delayed_inode+0x73/0x520
   [6.1833]        btrfs_commit_inode_delayed_inode+0x11a/0x120
   [6.1842]        btrfs_log_inode+0x608/0x1aa0
   [6.1849]        btrfs_log_inode_parent+0x249/0xf80
   [6.1857]        btrfs_log_dentry_safe+0x3e/0x60
   [6.1865]        btrfs_sync_file+0x431/0x690
   [6.1872]        do_fsync+0x39/0x80
   [6.1879]        __x64_sys_fsync+0x13/0x20
   [6.1887]        do_syscall_64+0x97/0x3e0
   [6.1894]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [6.1903]
            -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
   [6.1913]        __lock_acquire+0x15e9/0x2820
   [6.1920]        lock_acquire+0xc9/0x2d0
   [6.1927]        __mutex_lock+0xcc/0x10a0
   [6.1934]        __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1944]        btrfs_evict_inode+0x20b/0x4b0
   [6.1952]        evict+0x15a/0x2f0
   [6.1958]        prune_icache_sb+0x91/0xd0
   [6.1966]        super_cache_scan+0x150/0x1d0
   [6.1974]        do_shrink_slab+0x155/0x6f0
   [6.1981]        shrink_slab+0x48e/0x890
   [6.1988]        shrink_one+0x11a/0x1f0
   [6.1995]        shrink_node+0xbfd/0x1320
   [6.1002]        balance_pgdat+0x67f/0xc60
   [6.1321]        kswapd+0x1dc/0x3e0
   [6.1643]        kthread+0xff/0x240
   [6.1965]        ret_from_fork+0x223/0x280
   [6.1287]        ret_from_fork_asm+0x1a/0x30
   [6.1616]
            other info that might help us debug this:

   [6.1561] Chain exists of:
              &delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim

   [6.1503]  Possible unsafe locking scenario:

   [6.1110]        CPU0                    CPU1
   [6.1411]        ----                    ----
   [6.1707]   lock(fs_reclaim);
   [6.1998]                                lock(btrfs-tree-00);
   [6.1291]                                lock(fs_reclaim);
   [6.1581]   lock(&delayed_node->mutex);
   [6.1874]
             *** DEADLOCK ***

   [6.1716] 2 locks held by kswapd0/117:
   [6.1999]  #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
   [6.1294]  #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0
   [6.1596]
            stack backtrace:
   [6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G     U 6.18.0+ #4 PREEMPT(lazy)
   [6.1185] Tainted: [U]=USER
   [6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
   [6.1187] Call Trace:
   [6.1187]  <TASK>
   [6.1189]  dump_stack_lvl+0x6e/0xa0
   [6.1192]  print_circular_bug.cold+0x17a/0x1c0
   [6.1194]  check_noncircular+0x175/0x190
   [6.1197]  __lock_acquire+0x15e9/0x2820
   [6.1200]  lock_acquire+0xc9/0x2d0
   [6.1201]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1204]  __mutex_lock+0xcc/0x10a0
   [6.1206]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1208]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1211]  ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1213]  __btrfs_release_delayed_node.part.0+0x39/0x2f0
   [6.1215]  btrfs_evict_inode+0x20b/0x4b0
   [6.1217]  ? lock_acquire+0xc9/0x2d0
   [6.1220]  evict+0x15a/0x2f0
   [6.1222]  prune_icache_sb+0x91/0xd0
   [6.1224]  super_cache_scan+0x150/0x1d0
   [6.1226]  do_shrink_slab+0x155/0x6f0
   [6.1228]  shrink_slab+0x48e/0x890
   [6.1229]  ? shrink_slab+0x2d2/0x890
   [6.1231]  shrink_one+0x11a/0x1f0
   [6.1234]  shrink_node+0xbfd/0x1320
   [6.1236]  ? shrink_node+0xa2d/0x1320
   [6.1236]  ? shrink_node+0xbd3/0x1320
   [6.1239]  ? balance_pgdat+0x67f/0xc60
   [6.1239]  balance_pgdat+0x67f/0xc60
   [6.1241]  ? finish_task_switch.isra.0+0xc4/0x2a0
   [6.1246]  kswapd+0x1dc/0x3e0
   [6.1247]  ? __pfx_autoremove_wake_function+0x10/0x10
   [6.1249]  ? __pfx_kswapd+0x10/0x10
   [6.1250]  kthread+0xff/0x240
   [6.1251]  ? __pfx_kthread+0x10/0x10
   [6.1253]  ret_from_fork+0x223/0x280
   [6.1255]  ? __pfx_kthread+0x10/0x10
   [6.1257]  ret_from_fork_asm+0x1a/0x30
   [6.1260]  </TASK>

This is because:

1) The fsync task is holding an inode's delayed node mutex (for a
   directory) while calling __btrfs_update_delayed_inode() and that needs
   to do a search on the subvolume's btree (therefore read lock some
   extent buffers);

2) The lookup task, at btrfs_lookup(), triggered reclaim with the
   GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while
   holding a read lock on a subvolume leaf;

3) The reclaim triggered kswapd which is doing inode eviction for the
   directory inode the fsync task is using as an argument to
   btrfs_commit_inode_delayed_inode() - but in that call chain we are
   trying to read lock the same leaf that the lookup task is holding
   while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL
   allocation.

Fix this by calling btrfs_init_file_extent_tree() after we don't need the
path anymore and release it in btrfs_read_locked_inode().

Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/
Fixes: 8679d26 ("btrfs: initialize inode::file_extent_tree after i_mode has been set")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
When forward-porting Rust Binder to 6.18, I neglected to take commit
fb56fdf ("mm/list_lru: split the lock to per-cgroup scope") into
account, and apparently I did not end up running the shrinker callback
when I sanity tested the driver before submission. This leads to crashes
like the following:

	============================================
	WARNING: possible recursive locking detected
	6.18.0-mainline-maybe-dirty #1 Tainted: G          IO
	--------------------------------------------
	kswapd0/68 is trying to acquire lock:
	ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: lock_list_lru_of_memcg+0x128/0x230

	but task is already holding lock:
	ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20

	other info that might help us debug this:
	 Possible unsafe locking scenario:

	       CPU0
	       ----
	  lock(&l->lock);
	  lock(&l->lock);

	 *** DEADLOCK ***

	 May be due to missing lock nesting notation

	3 locks held by kswapd0/68:
	 #0: ffffffff90d2e260 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x597/0x1160
	 #1: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20
	 #2: ffffffff90cf3680 (rcu_read_lock){....}-{1:2}, at: lock_list_lru_of_memcg+0x2d/0x230

To fix this, remove the spin_lock() call from rust_shrink_free_page().

Cc: stable <stable@kernel.org>
Fixes: eafedbc ("rust_binder: add Rust Binder driver")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251202-binder-shrink-unspin-v1-1-263efb9ad625@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
During device unmapping (triggered by module unload or explicit unmap),
a refcount underflow occurs causing a use-after-free warning:

  [14747.574913] ------------[ cut here ]------------
  [14747.574916] refcount_t: underflow; use-after-free.
  [14747.574917] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x55/0x90, CPU#9: kworker/9:1/378
  [14747.574924] Modules linked in: rnbd_client(-) rtrs_client rnbd_server rtrs_server rtrs_core ...
  [14747.574998] CPU: 9 UID: 0 PID: 378 Comm: kworker/9:1 Tainted: G           O     N  6.19.0-rc3lblk-fnext+ torvalds#42 PREEMPT(voluntary)
  [14747.575005] Workqueue: rnbd_clt_wq unmap_device_work [rnbd_client]
  [14747.575010] RIP: 0010:refcount_warn_saturate+0x55/0x90
  [14747.575037]  Call Trace:
  [14747.575038]   <TASK>
  [14747.575038]   rnbd_clt_unmap_device+0x170/0x1d0 [rnbd_client]
  [14747.575044]   process_one_work+0x211/0x600
  [14747.575052]   worker_thread+0x184/0x330
  [14747.575055]   ? __pfx_worker_thread+0x10/0x10
  [14747.575058]   kthread+0x10d/0x250
  [14747.575062]   ? __pfx_kthread+0x10/0x10
  [14747.575066]   ret_from_fork+0x319/0x390
  [14747.575069]   ? __pfx_kthread+0x10/0x10
  [14747.575072]   ret_from_fork_asm+0x1a/0x30
  [14747.575083]   </TASK>
  [14747.575096] ---[ end trace 0000000000000000 ]---

Befor this patch :-

The bug is a double kobject_put() on dev->kobj during device cleanup.

Kobject Lifecycle:
  kobject_init_and_add()  sets kobj.kref = 1  (initialization)
  kobject_put()           sets kobj.kref = 0  (should be called once)

* Before this patch:

rnbd_clt_unmap_device()
  rnbd_destroy_sysfs()
    kobject_del(&dev->kobj)                   [remove from sysfs]
    kobject_put(&dev->kobj)                   PUT #1 (WRONG!)
      kref: 1 to 0
      rnbd_dev_release()
        kfree(dev)                            [DEVICE FREED!]

  rnbd_destroy_gen_disk()                     [use-after-free!]

  rnbd_clt_put_dev()
    refcount_dec_and_test(&dev->refcount)
    kobject_put(&dev->kobj)                   PUT #2 (UNDERFLOW!)
      kref: 0 to -1                           [WARNING!]

The first kobject_put() in rnbd_destroy_sysfs() prematurely frees the
device via rnbd_dev_release(), then the second kobject_put() in
rnbd_clt_put_dev() causes refcount underflow.

* After this patch :-

Remove kobject_put() from rnbd_destroy_sysfs(). This function should
only remove sysfs visibility (kobject_del), not manage object lifetime.

Call Graph (FIXED):

rnbd_clt_unmap_device()
  rnbd_destroy_sysfs()
    kobject_del(&dev->kobj)                   [remove from sysfs only]
                                              [kref unchanged: 1]

  rnbd_destroy_gen_disk()                     [device still valid]

  rnbd_clt_put_dev()
    refcount_dec_and_test(&dev->refcount)
    kobject_put(&dev->kobj)                   ONLY PUT (CORRECT!)
      kref: 1 to 0                            [BALANCED]
      rnbd_dev_release()
        kfree(dev)                            [CLEAN DESTRUCTION]

This follows the kernel pattern where sysfs removal (kobject_del) is
separate from object destruction (kobject_put).

Fixes: 581cf83 ("block: rnbd: add .release to rnbd_dev_ktype")
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
Patch series "mm/hugetlb: fixes for PMD table sharing (incl.  using
mmu_gather)", v3.

One functional fix, one performance regression fix, and two related
comment fixes.

I cleaned up my prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point. 
While doing that I identified the other things.

The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.

Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().

The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated

There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.

Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.


This patch (of 4):

We switched from (wrongly) using the page count to an independent shared
count.  Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to identify
sharing.

We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.

Page migration, like mbind() or migrate_pages() would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive.  In smaps we would account them as "private" although they are
"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the
pagemap interface.

Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().

Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1]
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2]
Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Uschakow, Stanislav" <suschako@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
A null-ptr-deref was reported in the SCTP transmit path when SCTP-AUTH key
initialization fails:

  ==================================================================
  KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
  CPU: 0 PID: 16 Comm: ksoftirqd/0 Tainted: G W 6.6.0 #2
  RIP: 0010:sctp_packet_bundle_auth net/sctp/output.c:264 [inline]
  RIP: 0010:sctp_packet_append_chunk+0xb36/0x1260 net/sctp/output.c:401
  Call Trace:

  sctp_packet_transmit_chunk+0x31/0x250 net/sctp/output.c:189
  sctp_outq_flush_data+0xa29/0x26d0 net/sctp/outqueue.c:1111
  sctp_outq_flush+0xc80/0x1240 net/sctp/outqueue.c:1217
  sctp_cmd_interpreter.isra.0+0x19a5/0x62c0 net/sctp/sm_sideeffect.c:1787
  sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
  sctp_do_sm+0x1a3/0x670 net/sctp/sm_sideeffect.c:1169
  sctp_assoc_bh_rcv+0x33e/0x640 net/sctp/associola.c:1052
  sctp_inq_push+0x1dd/0x280 net/sctp/inqueue.c:88
  sctp_rcv+0x11ae/0x3100 net/sctp/input.c:243
  sctp6_rcv+0x3d/0x60 net/sctp/ipv6.c:1127

The issue is triggered when sctp_auth_asoc_init_active_key() fails in
sctp_sf_do_5_1C_ack() while processing an INIT_ACK. In this case, the
command sequence is currently:

- SCTP_CMD_PEER_INIT
- SCTP_CMD_TIMER_STOP (T1_INIT)
- SCTP_CMD_TIMER_START (T1_COOKIE)
- SCTP_CMD_NEW_STATE (COOKIE_ECHOED)
- SCTP_CMD_ASSOC_SHKEY
- SCTP_CMD_GEN_COOKIE_ECHO

If SCTP_CMD_ASSOC_SHKEY fails, asoc->shkey remains NULL, while
asoc->peer.auth_capable and asoc->peer.peer_chunks have already been set by
SCTP_CMD_PEER_INIT. This allows a DATA chunk with auth = 1 and shkey = NULL
to be queued by sctp_datamsg_from_user().

Since command interpretation stops on failure, no COOKIE_ECHO should been
sent via SCTP_CMD_GEN_COOKIE_ECHO. However, the T1_COOKIE timer has already
been started, and it may enqueue a COOKIE_ECHO into the outqueue later. As
a result, the DATA chunk can be transmitted together with the COOKIE_ECHO
in sctp_outq_flush_data(), leading to the observed issue.

Similar to the other places where it calls sctp_auth_asoc_init_active_key()
right after sctp_process_init(), this patch moves the SCTP_CMD_ASSOC_SHKEY
immediately after SCTP_CMD_PEER_INIT, before stopping T1_INIT and starting
T1_COOKIE. This ensures that if shared key generation fails, authenticated
DATA cannot be sent. It also allows the T1_INIT timer to retransmit INIT,
giving the client another chance to process INIT_ACK and retry key setup.

Fixes: 730fc3d ("[SCTP]: Implete SCTP-AUTH parameter processing")
Reported-by: Zhen Chen <chenzhen126@huawei.com>
Tested-by: Zhen Chen <chenzhen126@huawei.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/44881224b375aa8853f5e19b4055a1a56d895813.1768324226.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
Jamal Hadi Salim says:

====================
net/sched: teql: Enforce hierarchy placement

GangMin Kim <km.kim1503@gmail.com> managed to create a UAF on qfq by inserting
teql as a child qdisc and exploiting a qlen sync issue.
teql is not intended to be used as a child qdisc. Lets enforce that rule in
patch #1. Although patch #1 fixes the issue, we prevent another potential qlen
exploit in qfq in patch #2 by enforcing the child's active status is not
determined by inspecting the qlen. In patch #3 we add a tdc test case.
====================

Link: https://patch.msgid.link/20260114160243.913069-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
…itives

The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have a
uncertainty margin of 250us (default). The watchdog readout does:

  1) wdnow = read(HPET);
  2) csnow = read(TSC);
  3) wdend = read(HPET);

The valid window for the delta between #1 and #3 is calculated by the
uncertainty margins of the watchdog and the clocksource:

   m = 2 * watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

   m = watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 500us for the TSC/HPET case.

That means the following scenario will trigger the watchdog:

 Watchdog cycle N:

 1)       wdnow[N] = read(HPET);
 2)       csnow[N] = read(TSC);
 3)       wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and

 Watchdog cycle N + 1:

 4)       wdnow[N + 1] = read(HPET);
 5)       csnow[N + 1] = read(TSC);
 6)       wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
 -> cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.

Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.

Fixes: 4ac1dd3 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx
nandamajay pushed a commit that referenced this pull request Jan 27, 2026
When one iio device is a consumer of another, it is possible that
the ->info_exist_lock of both ends up being taken when reading the
value of the consumer device.

Since they currently belong to the same lockdep class (being
initialized in a single location with mutex_init()), that results in a
lockdep warning

         CPU0
         ----
    lock(&iio_dev_opaque->info_exist_lock);
    lock(&iio_dev_opaque->info_exist_lock);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  4 locks held by sensors/414:
   #0: c31fd6dc (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0x44/0x4e4
   #1: c4f5a1c4 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x1c/0xac
   #2: c2827548 (kn->active#34){.+.+}-{0:0}, at: kernfs_seq_start+0x30/0xac
   #3: c1dd2b6 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_read_channel_processed_scale+0x24/0xd8

  stack backtrace:
  CPU: 0 UID: 0 PID: 414 Comm: sensors Not tainted 6.17.11 #5 NONE
  Hardware name: Generic AM33XX (Flattened Device Tree)
  Call trace:
   unwind_backtrace from show_stack+0x10/0x14
   show_stack from dump_stack_lvl+0x44/0x60
   dump_stack_lvl from print_deadlock_bug+0x2b8/0x334
   print_deadlock_bug from __lock_acquire+0x13a4/0x2ab0
   __lock_acquire from lock_acquire+0xd0/0x2c0
   lock_acquire from __mutex_lock+0xa0/0xe8c
   __mutex_lock from mutex_lock_nested+0x1c/0x24
   mutex_lock_nested from iio_read_channel_raw+0x20/0x6c
   iio_read_channel_raw from rescale_read_raw+0x128/0x1c4
   rescale_read_raw from iio_channel_read+0xe4/0xf4
   iio_channel_read from iio_read_channel_processed_scale+0x6c/0xd8
   iio_read_channel_processed_scale from iio_hwmon_read_val+0x68/0xbc
   iio_hwmon_read_val from dev_attr_show+0x18/0x48
   dev_attr_show from sysfs_kf_seq_show+0x80/0x110
   sysfs_kf_seq_show from seq_read_iter+0xdc/0x4e4
   seq_read_iter from vfs_read+0x238/0x2e4
   vfs_read from ksys_read+0x6c/0xec
   ksys_read from ret_fast_syscall+0x0/0x1c

Just as the mlock_key already has its own lockdep class, add a
lock_class_key for the info_exist mutex.

Note that this has in theory been a problem since before IIO first
left staging, but it only occurs when a chain of consumers is in use
and that is not often done.

Fixes: ac917a8 ("staging:iio:core set the iio_dev.info pointer to null on unregister under lock.")
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
Reviewed-by: Peter Rosin <peda@axentia.se>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants