NULL pointer dereference in kmod driver while hot plugging a CPU #2251

Open
loli10K opened this issue Dec 31, 2024 · 7 comments · May be fixed by #2252
Labels
kind/bug Something isn't working

loli10K commented Dec 31, 2024

Describe the bug

The kmod driver doesn't handle CPU hot plugging gracefully: bringing a CPU online while the driver is loaded can trigger a NULL pointer dereference in the kernel. It may not be a common use case (CPUs are rarely hot plugged during a workload), but it did happen to me.

How to reproduce it

It happened once at random while hot plugging a core, and it can be easily reproduced by running the following commands in a loop (replace X with the CPU number):

echo 1 > /sys/devices/system/cpu/cpuX/online
echo 0 > /sys/devices/system/cpu/cpuX/online
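
The loop described above could be wrapped in a small helper like the following. This is only a sketch: `hotplug_loop` and the `CPU_SYSFS_ROOT` override are names made up for this example (the override exists only so the loop can be tried against a scratch directory first), and the iteration count is arbitrary. On an affected driver this is expected to oops the kernel, so it should only be run as root on a throwaway machine.

```shell
#!/bin/sh
# Toggle one CPU offline and back online COUNT times (default 100).
# CPU_SYSFS_ROOT is a hypothetical override for this sketch, not a kernel or
# Falco setting; it defaults to the real sysfs location.
hotplug_loop() {
    cpu=$1
    count=${2:-100}
    root=${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}
    ctl="$root/cpu$cpu/online"

    # A missing or read-only control file means this CPU cannot be hot
    # plugged from here (cpu0 often has no "online" file at all).
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (need root, or CPU is not hot-pluggable)" >&2
        return 1
    fi

    i=0
    while [ "$i" -lt "$count" ]; do
        echo 0 > "$ctl" || return 1   # take the CPU offline
        echo 1 > "$ctl" || return 1   # bring it back online
        i=$((i + 1))
    done
}
```

For example, `hotplug_loop 1 50` toggles cpu1 fifty times and leaves it online at the end.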

Expected behaviour

Falco's kmod driver should handle CPU hot plugging gracefully.

Screenshots

No screenshot, but I'll do you one better: a kernel oops (this one is from my debug kernel, but it also happens on 5.15.0-67-generic):

[   93.904133] BUG: kernel NULL pointer dereference, address: 0000000000000008
[   93.906458] #PF: supervisor read access in kernel mode
[   93.907814] #PF: error_code(0x0000) - not-present page
[   93.909319] PGD 0 P4D 0 
[   93.909996] Oops: 0000 [#1] SMP PTI
[   93.910993] CPU: 2 PID: 23 Comm: cpuhp/2 Tainted: G           OE     5.15.67 #2
[   93.913099] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   93.914593] RIP: 0010:record_event_consumer+0xe4/0xeb0 [falco]
[   93.916146] Code: 1c c5 80 f8 46 82 41 83 3c 24 02 4c 8b 43 08 0f 84 60 02 00 00 41 bf 01 00 00 00 f0 44 0f c1 7b 24 45 85 ff 0f 85 88 02 00 00 <49> 8b 40 08 48 83 c0 01 49 89 40 08 41 8b 10 41 8b 40 04 39 c2 0f
[   93.921189] RSP: 0000:ffffc900001ebc20 EFLAGS: 00010246
[   93.922569] RAX: 0000000000000002 RBX: ffffe8ffffd42ba8 RCX: 0000000000000000
[   93.924449] RDX: 0000000000000000 RSI: 00000000000000f4 RDI: ffffc900001ebd70
[   93.926355] RBP: ffffc900001ebe48 R08: 0000000000000000 R09: 0000000000000009
[   93.928205] R10: 0000000000000014 R11: ffffc900001ebcc0 R12: ffffc900001ebe20
[   93.930317] R13: ffffc9000036b000 R14: 00000000000000f4 R15: 0000000000000000
[   93.932140] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   93.934252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.935600] CR2: 0000000000000008 CR3: 000000000260a001 CR4: 00000000003706e0
[   93.937456] Call Trace:
[   93.938269]  <TASK>
[   93.938806]  ? ___cache_free+0x2df/0x490
[   93.939747]  ? netlink_broadcast_filtered+0x146/0x4a0
[   93.941307]  ? unwind_next_frame+0x482/0x5d0
[   93.942556]  ? ret_from_fork+0x22/0x30
[   93.943514]  ? unwind_next_frame+0x61/0x5d0
[   93.944543]  record_event_all_consumers+0x54/0x80 [falco]
[   93.945852]  ? do_cpu_callback+0x120/0x120 [falco]
[   93.947031]  do_cpu_callback+0xf6/0x120 [falco]
[   93.948246]  scap_cpu_online+0x3c/0x50 [falco]
[   93.949352]  cpuhp_invoke_callback+0x25f/0x3c0
[   93.950517]  ? virtnet_cpu_dead+0x30/0x30 [virtio_net]
[   93.951747]  cpuhp_thread_fun+0x8d/0x140
[   93.952718]  smpboot_thread_fn+0xaf/0x140
[   93.953751]  ? smpboot_register_percpu_thread+0xe0/0xe0
[   93.955682]  kthread+0x127/0x150
[   93.956955]  ? set_kthread_struct+0x50/0x50
[   93.958519]  ret_from_fork+0x22/0x30
[   93.959852]  </TASK>
[   93.960726] Modules linked in: falco(OE) evdev(E) virtio_balloon(E) virtio_console(E) sch_fq_codel(E) msr(E) fuse(E) efi_pstore(E) virtio_rng(E) rng_core(E) autofs4(E) efivars(E) virtio_net(E) net_failover(E) virtio_blk(E) failover(E) qxl(E) drm_ttm_helper(E) ttm(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ghash_clmulni_intel(E) drm(E) cryptd(E) ata_generic(E) virtio_pci(E) virtio(E) virtio_pci_modern_dev(E) virtio_ring(E) libata(E) [last unloaded: falco]
[   93.978637] CR2: 0000000000000008
[   93.979894] ---[ end trace 59aa4e75c88d37e4 ]---
[   93.981589] RIP: 0010:record_event_consumer+0xe4/0xeb0 [falco]
[   93.983756] Code: 1c c5 80 f8 46 82 41 83 3c 24 02 4c 8b 43 08 0f 84 60 02 00 00 41 bf 01 00 00 00 f0 44 0f c1 7b 24 45 85 ff 0f 85 88 02 00 00 <49> 8b 40 08 48 83 c0 01 49 89 40 08 41 8b 10 41 8b 40 04 39 c2 0f
[   93.990679] RSP: 0000:ffffc900001ebc20 EFLAGS: 00010246
[   93.992623] RAX: 0000000000000002 RBX: ffffe8ffffd42ba8 RCX: 0000000000000000
[   93.995245] RDX: 0000000000000000 RSI: 00000000000000f4 RDI: ffffc900001ebd70
[   93.997870] RBP: ffffc900001ebe48 R08: 0000000000000000 R09: 0000000000000009
[   94.000504] R10: 0000000000000014 R11: ffffc900001ebcc0 R12: ffffc900001ebe20
[   94.003128] R13: ffffc9000036b000 R14: 00000000000000f4 R15: 0000000000000000
[   94.005743] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   94.008708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   94.010837] CR2: 0000000000000008 CR3: 000000000260a001 CR4: 00000000003706e0

kgdb

Thread 27 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23]
record_event_consumer (consumer=consumer@entry=0xffffc900003a3000, event_type=event_type@entry=PPME_CPU_HOTPLUG_E, 
    drop_flags=drop_flags@entry=UF_NEVER_DROP, ns=ns@entry=1735650240013207314, 
    event_datap=event_datap@entry=0xffffc900001ebe20, tp_type=tp_type@entry=KMOD_PROG_ATTACHED_MAX)
    at /usr/src/falco-7.3.0+driver/main.c:1847
1847		ring_info->n_evts++;
(gdb) list
1842			ring_info->n_preemptions++;
1843			atomic_dec(&ring->preempt_count);
1844			put_cpu();
1845			return res;
1846		}
1847		ring_info->n_evts++;
1848	
1849		/*
1850		 * Calculate the space currently available in the buffer
1851		 */
(gdb) p ring_info
$1 = (struct ppm_ring_buffer_info *) 0x0 <fixed_percpu_data>
(gdb) 
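
The oops and the kgdb session tell the same story: the faulting address is 0x8, `ring_info` is NULL, and the faulting line is `ring_info->n_evts++`, so the read lands a few bytes past a NULL base. When triaging a saved log, a helper along these lines can pull out the two fields worth comparing. It is a sketch only: `oops_summary` is a made-up name, and it assumes the log uses the same line formats as the oops above.

```shell
#!/bin/sh
# Print the faulting address and the faulting symbol from a kernel oops
# read on stdin. Returns non-zero if either field is missing.
oops_summary() {
    awk '
        # "BUG: kernel NULL pointer dereference, address: <addr>"
        /BUG: kernel NULL pointer dereference, address:/ && addr == "" {
            addr = $NF
        }
        # "RIP: 0010:<symbol>+<off>/<len> [module]"; keep the first match
        /RIP: [0-9a-f]+:/ && rip == "" {
            for (i = 1; i <= NF; i++)
                if ($i == "RIP:") { rip = $(i + 1); break }
        }
        END {
            if (addr == "" || rip == "") exit 1
            sub(/^[0-9a-f]+:/, "", rip)   # drop the code segment prefix
            printf "fault_addr=0x%s rip=%s\n", addr, rip
        }
    '
}
```

Typical use would be `dmesg | oops_summary` or `oops_summary < oops.txt`.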

Environment

  • Falco version:
Falco version: 0.39.2
Libs version:  0.18.2
Plugin API:    3.7.0
Engine:        0.43.0
Driver:
  API version:    8.0.0
  Schema version: 2.0.0
  Default driver: 7.3.0+driver
  • System info:
{
  "machine": "x86_64",
  "nodename": "falco",
  "release": "5.15.0-67-generic",
  "sysname": "Linux",
  "version": "#74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023"
}
  • Cloud provider or hardware configuration:
QEMU/KVM virtual hardware
  • OS:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
  • Kernel:
Linux falco 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Installation method:
deb

Additional context

loli10K added the kind/bug label Dec 31, 2024
Author

loli10K commented Jan 9, 2025

I believe this is a regression: this used to work with an older version of the kernel driver (draios/sysdig#744). It's also shockingly easy to reproduce:

  1. stop falco service
  2. rmmod falco for good measure
  3. echo 0 > /sys/devices/system/cpu/cpu1/online
  4. start falco
  5. echo 1 > /sys/devices/system/cpu/cpu1/online
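
The five steps above can be sketched as a single function. Several things here are assumptions rather than facts from this report: the service is taken to be a systemd unit named `falco` (overridable through the made-up `FALCO_UNIT` variable), and `DRY_RUN=1` is a switch invented for this sketch that prints the commands instead of running them. Run for real, the last step is expected to oops the kernel on an affected driver, so use a throwaway node.

```shell
#!/bin/sh
# Assumption: Falco is managed as a systemd unit called "falco".
FALCO_UNIT=${FALCO_UNIT:-falco}

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_cpu_online <cpu> <0|1>
set_cpu_online() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "echo $2 > /sys/devices/system/cpu/cpu$1/online"
    else
        echo "$2" > "/sys/devices/system/cpu/cpu$1/online"
    fi
}

repro_offline_at_start() {
    cpu=${1:-1}
    run systemctl stop "$FALCO_UNIT"            # 1. stop the service
    run rmmod falco || true                     # 2. module may already be gone
    set_cpu_online "$cpu" 0 || return 1         # 3. CPU offline before start
    run systemctl start "$FALCO_UNIT" || return 1   # 4. start falco
    set_cpu_online "$cpu" 1                     # 5. hot plug the CPU back
}
```

`DRY_RUN=1 repro_offline_at_start 1` prints the five commands for cpu1 without touching the system.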

This behavior also contradicts what's written here:

https://github.com/falcosecurity/libs/blob/7.3.0%2Bdriver/driver/main.c#L475-L482

  /*
   * If a cpu is offline when the consumer is first created, we
   * will never get events for that cpu even if it later comes
   * online via hotplug. We could allocate these rings on-demand
   * later in this function if needed for hotplug, but that
   * requires the consumer to know to call open again, and that is
   * not supported.
   */
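
Since the comment above says that CPUs which are offline when the consumer is created never get a ring, it can help to check which CPUs are offline before starting falco. The sketch below reads the kernel's `/sys/devices/system/cpu/offline` list; `expand_cpu_list`, `offline_cpus`, and the `CPU_SYSFS_ROOT` override are names invented for this example. Note that on some systems this list may also include CPU numbers that are merely possible but not present, so treat the output as a hint.

```shell
#!/bin/sh
# Expand a kernel CPU list such as "1,3-5" into "1 3 4 5".
expand_cpu_list() {
    out=""
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range "lo-hi"
            *)   lo=$part; hi=$part ;;             # a single CPU
        esac
        n=$lo
        while [ "$n" -le "$hi" ]; do
            out="$out${out:+ }$n"
            n=$((n + 1))
        done
    done
    echo "$out"
}

# Print the CPUs the kernel currently reports as offline.
# CPU_SYSFS_ROOT is a hypothetical override used only for dry runs.
offline_cpus() {
    root=${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}
    expand_cpu_list "$(cat "$root/offline")"
}
```

If `offline_cpus` prints anything before falco starts, bringing one of those CPUs online later is the scenario described in this comment.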

Contributor

FedeDP commented Jan 21, 2025

Hi! Thanks for the detailed bug report!
I am going to try to repro this and try to come up with a fix! Thank you very much!
/milestone 0.41.0

Contributor

FedeDP commented Jan 22, 2025

> I believe this is a regression, this used to work with an older version of the kernel driver (draios/sysdig#744) and it's actually shockingly easier to reproduce:

Question, did you by chance upgrade your running kernel in the meantime?

Contributor

FedeDP commented Jan 22, 2025

Moving to libs.

FedeDP transferred this issue from falcosecurity/falco Jan 22, 2025
Contributor

FedeDP commented Jan 22, 2025

/milestone next-driver

poiana added this to the next-driver milestone Jan 22, 2025
Contributor

FedeDP commented Jan 22, 2025

Proposed a fix in #2252.
Thanks to @Andreagit97, who worked with me on this!

Author

loli10K commented Jan 23, 2025

> > I believe this is a regression, this used to work with an older version of the kernel driver (draios/sysdig#744) and it's actually shockingly easier to reproduce:

> Question, did you by chance upgrade your running kernel in the meantime?

I never actually used that ancient version in production. In fact, this is my very first time using Falco. I just got curious and started looking for a possible explanation for this bug, found that old commit mentioning hot plugging, and verified that it was working correctly on a throwaway node.
