
Page Fault in zap_leaf_array_create #16730

Open
Qubitium opened this issue Nov 7, 2024 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


Qubitium commented Nov 7, 2024

System information

Distribution Name | Ubuntu
Distribution Version | 24.04
Kernel Version | 6.6.59-x64v4-xanmod1
Architecture | x86_64
OpenZFS Version | 2.3.99

zfs-2.3.99-941_g91bd12dfe
zfs-kmod-2.3.99-941_g91bd12dfe
[169319.517377] #PF: supervisor read access in kernel mode
[169319.517384] #PF: error_code(0x0000) - not-present page
[169319.517391] PGD 100000067 P4D 100000067 PUD 365e69067 PMD 455f23067 PTE 0
[169319.517401] Oops: 0000 [#1] PREEMPT SMP NOPTI
[169319.517409] CPU: 24 PID: 3376 Comm: txg_sync Tainted: P           OE      6.6.59-x64v4-xanmod1 #0~20241101.g5687e21
[169319.517423] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 3.08 09/18/2024
[169319.517433] RIP: 0010:zap_leaf_array_create+0xe6/0x2b0 [zfs]
[169319.517512] Code: 18 4c 8b 59 18 41 0f b7 4b 22 4c 8d 04 49 48 89 ce 8d 4b fb bb 01 00 00 00 49 c1 e0 02 c4 e2 71 f7 cb 66 89 74 24 16 4c 01 c1 <41> 0f b7 4c 4b 46 66 41 89 4b 22 49 8b 8a e0 00 00 00 48 8b 49 18
[169319.517533] RSP: 0018:ffffc90028ef3980 EFLAGS: 00010202
[169319.517540] RAX: 0000000000000003 RBX: 0000000000000001 RCX: 000000000002c1f4
[169319.517549] RDX: d2f8960000000000 RSI: 0000000000003a7f RDI: ffffc901e529982e
[169319.517558] RBP: 0000000000000003 R08: 000000000002bdf4 R09: 0000000000000008
[169319.517566] R10: ffff889c6920f200 R11: ffffc901e5119000 R12: ffffc90028ef3be8
[169319.517575] R13: 0000000000000008 R14: ffffc90028ef3be8 R15: 0000000000000038
[169319.517583] FS:  0000000000000000(0000) GS:ffff88af9dc00000(0000) knlGS:0000000000000000
[169319.517593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[169319.517600] CR2: ffffc901e517142e CR3: 000000013f5f4000 CR4: 0000000000f50ee0
[169319.517845] PKRU: 55555554
[169319.518078] Call Trace:
[169319.518309]  <TASK>
[169319.518540]  ? __die+0x1a/0x60
[169319.518976]  ? page_fault_oops+0x14b/0x4d0
[169319.519612]  ? zap_leaf_array_create+0xe6/0x2b0 [zfs]
[169319.520289]  ? search_module_extables+0x2e/0x50
[169319.520905]  ? search_bpf_extables+0x56/0x80
[169319.521497]  ? exc_page_fault+0xa2/0xb0
[169319.522095]  ? asm_exc_page_fault+0x22/0x30
[169319.522695]  ? zap_leaf_array_create+0xe6/0x2b0 [zfs]
[169319.523323]  zap_entry_create+0x13e/0x300 [zfs]
[169319.523942]  fzap_update+0x11e/0x270 [zfs]
[169319.524540]  zap_update_uint64_impl+0x88/0x1b0 [zfs]
[169319.525137]  ddt_zap_update+0x74/0xa0 [zfs]
[169319.525753]  ddt_sync_flush_entry+0x1d0/0x430 [zfs]
[169319.526351]  ? spl_kmem_cache_free+0x128/0x1e0 [spl]
[169319.526901]  ddt_sync_flush_log_incremental+0x230/0x310 [zfs]
[169319.527475]  ddt_sync+0x100/0x470 [zfs]
[169319.528051]  ? zio_wait+0x263/0x2a0 [zfs]
[169319.528612]  ? bplist_iterate+0xe1/0x100 [zfs]
[169319.528858]  spa_sync+0x5e4/0x1050 [zfs]
[169319.529178]  ? spa_txg_history_init_io+0xfd/0x110 [zfs]
[169319.529410]  txg_sync_thread+0x1fc/0x390 [zfs]
[169319.529622]  ? txg_register_callbacks+0xa0/0xa0 [zfs]
[169319.529827]  ? spl_taskq_fini+0x90/0x90 [spl]
[169319.529999]  thread_generic_wrapper+0x52/0x60 [spl]
[169319.530166]  kthread+0xdc/0x110
[169319.530324]  ? kthread_complete_and_exit+0x20/0x20
[169319.530480]  ret_from_fork+0x28/0x40
[169319.530635]  ? kthread_complete_and_exit+0x20/0x20
[169319.530796]  ret_from_fork_asm+0x11/0x20
[169319.530946]  </TASK>
[169319.531091] Modules linked in: tls zram zsmalloc nf_conntrack_netlink xt_nat xt_conntrack xfrm_user xt_addrtype rpcsec_gss_krb5 auth_rpcgss nfsv4 overlay nfs lockd grace fscache netfs cfg80211 veth snd_hrtimer nft_masq>
[169319.531119]  sha1_ssse3 input_leds aesni_intel nvidia_uvm(POE) spl(OE) soundcore crypto_simd i2c_piix4 cryptd k10temp ccp gpio_amdpt rapl wmi_bmof mac_hid sch_fq_pie sch_pie msr parport_pc ppdev lp parport nvme_fabrics>
[169319.534357] CR2: ffffc901e517142e
[169319.534588] ---[ end trace 0000000000000000 ]---
[169319.950622] RIP: 0010:zap_leaf_array_create+0xe6/0x2b0 [zfs]
[169319.950965] Code: 18 4c 8b 59 18 41 0f b7 4b 22 4c 8d 04 49 48 89 ce 8d 4b fb bb 01 00 00 00 49 c1 e0 02 c4 e2 71 f7 cb 66 89 74 24 16 4c 01 c1 <41> 0f b7 4c 4b 46 66 41 89 4b 22 49 8b 8a e0 00 00 00 48 8b 49 18
[169319.951700] RSP: 0018:ffffc90028ef3980 EFLAGS: 00010202
[169319.951702] RAX: 0000000000000003 RBX: 0000000000000001 RCX: 000000000002c1f4
[169319.951702] RDX: d2f8960000000000 RSI: 0000000000003a7f RDI: ffffc901e529982e
[169319.951703] RBP: 0000000000000003 R08: 000000000002bdf4 R09: 0000000000000008
[169319.951704] R10: ffff889c6920f200 R11: ffffc901e5119000 R12: ffffc90028ef3be8
[169319.951704] R13: 0000000000000008 R14: ffffc90028ef3be8 R15: 0000000000000038
[169319.951705] FS:  0000000000000000(0000) GS:ffff88af9dc00000(0000) knlGS:0000000000000000
[169319.951705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[169319.951705] CR2: ffffc901e517142e CR3: 000000013f5f4000 CR4: 0000000000f50ee0
[169319.951706] PKRU: 55555554
[169319.951707] note: txg_sync[3376] exited with irqs disabled

robn commented Nov 8, 2024

First, please fill out the whole template when filing bugs. Saves me having to ask for things it already asks for, that is, a basic problem description and info about reproducing.

More specific questions:

  • Was this production or test?
  • What's the workload like?
  • Is your dedup table interesting? How big is it? Mix of uniques and duplicates?
  • Any errors reported? Scrub?
  • Have you used zpool ddtprune? Have you set dedup_table_quota? (example invocations below)
  • Any unusual config/parameters?
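
For readers unfamiliar with those two features, they are exercised roughly as follows; the exact flags and property values should be checked against the zpool-ddtprune(8) and zpoolprops(7) man pages (pool name and percentage are placeholders):

zpool ddtprune -p 10 <pool>               # prune roughly the oldest 10% of unique DDT entries
zpool set dedup_table_quota=auto <pool>   # cap the on-disk DDT size ('auto' lets ZFS pick the limit)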

There's not much to go on here. Best I can tell from the opcode and register dumps, it's trying to access a ZAP chunk well beyond the end of the leaf buffer. It's hard to see why though. Could be an already-corrupted ZAP (broken chunk links), could be more general memory corruption. I'll have more of a think about what information might be useful here.
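
For readers not familiar with ZAP internals, the following is a simplified, illustrative sketch (not the actual OpenZFS source; names, sizes, and the bounds check are invented for the example) of how a chunk index from an entry's chunk list is turned into a pointer inside the leaf buffer, and why a corrupted link would fault like the trace above:

/* Illustrative only: a simplified stand-in for the ZAP leaf chunk layout. */
#include <stdint.h>
#include <stddef.h>

#define LEAF_NUMCHUNKS 512              /* assumed chunks per leaf, for illustration */

struct leaf_chunk {
	uint16_t la_next;               /* index of the next chunk in this entry's list */
	uint8_t  la_array[21];          /* payload bytes stored in this chunk */
};

struct leaf {
	struct leaf_chunk lf_chunks[LEAF_NUMCHUNKS];
};

/*
 * Chunk indices come from on-disk data (the previous chunk's la_next).
 * If that link is corrupted, idx can be far larger than LEAF_NUMCHUNKS,
 * and &l->lf_chunks[idx] then points well past the end of the leaf
 * buffer; dereferencing it gives exactly this kind of not-present page
 * fault. A bounds check like the one below would catch the corruption
 * instead of faulting.
 */
struct leaf_chunk *
leaf_chunk_at(struct leaf *l, uint16_t idx)
{
	if (idx >= LEAF_NUMCHUNKS)
		return NULL;
	return &l->lf_chunks[idx];
}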


Qubitium commented Nov 9, 2024

First, please fill out the whole template when filing bugs. Saves me having to ask for things it already asks for, that is, a basic problem description and info about reproducing.

Sorry. I will make sure to provide as much info as possible next time.

  • Was this production or test?
  • What's the workload like?

This was a production instance mainly running AI model training/testing and CI tasks. The CI jobs produce lots of small files due to the nature of compilation work, while AI model training generates larger files, though under 20 GB per training session.

  • Is your dedup table interesting? How big is it? Mix of uniques and duplicates?

The zpool-reported dedup ratio was astronomically high, around 140x. Personally I don't understand how dedup can be that high, since the CI files (Docker runs) are deleted after each run, and AI model files are highly random, so they should be neither very compressible nor contain many duplicate blocks.

  • Any errors reported? Scrub?

No errors were reported in the logs, and I did not run a scrub. This was a 4-NVMe striped ZFS pool with no raidz protection.
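
For clarity, a stripe-only pool of that shape (no redundancy at all) is what a plain zpool create with top-level devices produces, roughly as follows (pool and device names hypothetical):

zpool create tank /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1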

  • Have you used zpool ddtprune? Have you set dedup_table_quota?
  • Any unusual config/parameters?

Not that I recall. I tried tip and 2.3.0-rc2 and they both crash on load, so I had to erase and rebuild the zpool.

There's not much to go on here. Best I can tell from the opcode and register dumps, it's trying to access a ZAP chunk well beyond the end of the leaf buffer. It's hard to see why though. Could be an already-corrupted ZAP (broken chunk links), could be more general memory corruption. I'll have more of a think about what information might be useful here.

There were 2-3 random crashes (the computer rebooted by itself) preceding the zfs load error here. No power-loss event happened, though the NVMe drives do not have power-loss protection capacitors. Just 2-3 reboots that I also suspect ZFS caused. For now, I have rebuilt the zpool and am running the 2.3.0-rc2 release branch with dedup off, to rule dedup out as the culprit.


robn commented Nov 10, 2024

The zpool reported dedup ratio was astronomically high, like 140x.

Yeah, bit weird. I'd be interested to know what zpool status -D and zdb -DD <pool> showed, though I'm not going to ask you to torch your machine just to find out!
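
For reference, and assuming the pool still existed, those would be run as follows; both report DDT histograms and size estimates (pool name is a placeholder):

zpool status -D <pool>
zdb -DD <pool>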

There will 2-3 random crashes (computer rebooted by itself) preceding the zfs load error here.

I'm not sure what "zfs load" means here. Can you confirm whether this crash occurred during normal operations, or while importing the pool?

Are there any crash logs from the "random crashes" + reboots?


Qubitium commented Nov 11, 2024

I'm not sure what "zfs load" means here. Can you confirm whether this crash occurred during normal operations, or while importing the pool?

Typo: "zfs load" means zfs import here. After one of the reboots, the filesystem would no longer mount, and the stack trace reflects that mount/import error; it was not caused by an error after a successful mount.

Are there any crash logs from the "random crashes" + reboots?

I checked dmesg and the syslogs, and there were no log entries preceding the reboots.

I flush the Linux file cache hourly and have run into previous ZFS issues that affected LXD. At first I thought it was LXD, but the LXD maintainer, based on all the previous data he has seen, which matches my report, strongly suspects that ZFS has some hidden memory-safety problem post kernel 6.6 that is more easily triggered by doing a level-3 flush of the Linux file cache:

echo 3 > /proc/sys/vm/drop_caches
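
For context, the value written selects what gets dropped (standard kernel semantics for /proc/sys/vm/drop_caches):

echo 1 > /proc/sys/vm/drop_caches   # free the page cache only
echo 2 > /proc/sys/vm/drop_caches   # free reclaimable slab objects (dentries and inodes)
echo 3 > /proc/sys/vm/drop_caches   # free both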

canonical/lxd#14178 (comment)
#16324 (comment)
