
fix(mem): align TLSF to 8 bytes on ARMv7-M/E-M/v8-M to avoid LDRD fault #10015

Draft
KentLee86 wants to merge 1 commit into lvgl:master from KentLee86:fix/tlsf-align-armv7m-ldrd

Conversation

@KentLee86

Fixes #4747 (unaligned memory access, previously closed as not planned — see below).

Summary

On 32-bit ARM Cortex-M3 / M4 / M7 / M33 (ARMv7-M, ARMv7E-M, ARMv8-M) the LDRD / STRD instructions require strict 8-byte alignment and fault with a UsageFault (UFSR.UNALIGNED) regardless of CCR.UNALIGN_TRP. GCC emits these for 64-bit struct accesses assuming malloc returns pointers aligned to alignof(max_align_t) — which is 8 on these targets because double / long long are 8-byte aligned.

The built-in TLSF allocator currently uses ALIGN_SIZE_LOG2 = 2 (4 bytes) on all 32-bit builds. In internal SRAM, allocations usually happen to land on 8-byte boundaries and the issue stays hidden, but on external RAM pools (e.g. STM32H7 + FMC SDRAM configured via LV_MEM_ADR) misaligned addresses are hit almost immediately and the CPU hard-faults inside lv_tlsf_malloc (reached via lv_mem_core_builtin.c:147) during the first non-trivial allocation from the draw path.

Change

Detect LDRD-capable M-profile cores via __ARM_ARCH >= 7 (excluding ARMv6-M, which has no LDRD) and promote ALIGN_SIZE_LOG2 to 3 (8 bytes) there. TLSF_64BIT still takes precedence. No behavior change on AVR, RV32, ARMv6-M (Cortex-M0/M0+) or other 32-bit targets.

#if defined (TLSF_64BIT)
    ALIGN_SIZE_LOG2 = 3,
#elif (defined(__ARM_ARCH) && (__ARM_ARCH >= 7) && !defined(__ARM_ARCH_6M__))
    /* ARMv7-M / ARMv7E-M / ARMv8-M need 8-byte alignment for LDRD/STRD. */
    ALIGN_SIZE_LOG2 = 3,
#else
    ALIGN_SIZE_LOG2 = 2,
#endif

How it was diagnosed

Reproducer: [env:lvgl_test] on an STM32H743II board with 32 MB FMC SDRAM, LV_MEM_ADR = 0xC0100000 (a 4 MB pool in the SDRAM region left free after the 800×480 RGB565 LTDC framebuffer). lv_init() succeeds, but lv_timer_handler() hard-faults on the first lv_draw_rect.

GDB attach after fault (J-Link):

CFSR  = 0x01000000
  UFSR = 0x0100   → bit 8 = UNALIGNED
HFSR  = 0x40000000 (FORCED)

Backtrace:
  WWDG_IRQHandler (Default_Handler Infinite_Loop)
  <signal handler called>
  lv_draw_rect             .../lv_draw_rect.c:256
  lv_obj_draw              .../lv_obj.c:733
  ...
  lv_malloc_core           .../lv_mem_core_builtin.c:147
  lv_tlsf_malloc (tlsf=0xC0100000)  .../lv_tlsf.c:1102

Ruled out by direct tests before landing on TLSF:

  • SDRAM / FMC / MPU: mixed-struct probe {u32, u16, u8, void*, u64} at 0xC0100000 reads/writes fine. A 25 MB pattern R/W stress alongside LVGL runs 401 iterations with 0 errors.
  • CCR.UNALIGN_TRP: already 0, clearing it explicitly has no effect (LDRD strict alignment is ISA-level, independent of that bit).
  • MPU: SDRAM region set to Normal / non-cacheable / bufferable — no change.
  • LVGL version: reproduced on both 9.2.2 and 9.3.0.

Workaround until the fix is merged: build LVGL with -mno-unaligned-access so GCC stops emitting LDRD/STRD. This fixes the symptom but not the root cause (TLSF's 4-byte alignment is below alignof(max_align_t) on these targets).

Verification

Same board, same env, only the diff in this PR applied (no -mno-unaligned-access):

  • lv_init() ✓, lv_timer_handler() ✓
  • 8-row dashboard renders continuously, loop iter count grows, no fault
  • 4 MB SDRAM heap accepted, internal AXI SRAM usage drops from 90 % to 15 %

Notes

  • Code style: single #elif block matching the existing #if defined (TLSF_64BIT) style, no reformatting.
  • No new options in lv_conf_template.h; lv_conf_internal_gen.py / Kconfig are not affected.
  • Doc update not needed — allocator ABI is unchanged.
  • Tests: none added; the existing TLSF tests still pass on x86_64 (via TLSF_64BIT), and adding an ARMv7-M HW test would require CI changes. Happy to add one if that's the project's preference.

Related reports

  • unaligned memory access (#4747): same symptom, closed as not planned
  • LVGL forum "Getting hardfault when using Ext. RAM for LV_MEM_ADR" (12766)
  • STMicroelectronics community "stm32h7b0 has UNALIGNED hardfault problem when using LVGL 9.1" (695235)

Marked as Draft for maintainer feedback on the detection macro and whether an additional conditional (e.g. also __ARM_FEATURE_LDRD) is preferred.

On 32-bit ARM Cortex-M3 / M4 / M7 / M33 (ARMv7-M, ARMv7E-M, ARMv8-M) the
LDRD / STRD instructions require strict 8-byte alignment and fault with a
UsageFault (UFSR.UNALIGNED) regardless of CCR.UNALIGN_TRP. GCC emits these
for 64-bit struct accesses assuming malloc returns pointers aligned to
alignof(max_align_t), which is 8 on these targets (double / long long).

The TLSF built-in allocator currently uses 4-byte alignment for 32-bit
builds, so block addresses can end up only 4-byte aligned. In internal
SRAM this usually slips by because allocations tend to land on 8-byte
boundaries anyway, but on external RAM pools (e.g. STM32H7 + FMC SDRAM
configured via LV_MEM_ADR) the mis-aligned addresses are hit quickly and
the CPU hard-faults inside lv_tlsf_malloc / lv_draw_rect on the first
non-trivial allocation.

Detect LDRD-capable ARM M-profile cores via __ARM_ARCH >= 7 (excluding
ARMv6-M, which has no LDRD) and promote ALIGN_SIZE_LOG2 to 3 (8 bytes)
there. No behavior change on AVR / RV32 / ARMv6-M / other 32-bit targets,
and TLSF_64BIT still takes precedence.

Related: lvgl#4747 (unaligned hardfault, closed as not planned), LVGL forum
"Getting hardfault when using Ext. RAM for LV_MEM_ADR" and multiple
STM32H7 + LVGL reports in the ST community.

Verified on STM32H743II + 32 MB SDRAM: LV_MEM_ADR in SDRAM (4 MB) now
runs an 8-row 800x480 dashboard cleanly; previously it HardFaulted on the
first lv_draw_rect.

Signed-off-by: wslee <dldntjr407@gmail.com>
@KentLee86
Author

On second thought, here's how a maintainer can verify the root cause without any external hardware or reproducer project — just by inspecting an existing ARMv7-M LVGL build artifact.

Verify GCC does emit LDRD on malloc-returned pointers

Take any existing LVGL build for Cortex-M3 / M4 / M7 (for example the STM32H7 demo in lv_port_riverdi_101-stm32h7 compiled with the default toolchain):

arm-none-eabi-objdump -d firmware.elf \
  | awk '/^[0-9a-f]+ <lv_draw_rect>/,/^$/' \
  | grep -E 'ldrd|strd' | head

You will see LDRD/STRD instructions inside lv_draw_rect (and many other LVGL functions) whose base register holds a pointer returned by lv_malloc. The compiler emits them because it assumes malloc honors alignof(max_align_t) == 8 on these targets.

Why TLSF's current 4-byte alignment is UB on these targets

Per ARMv7-M Architecture Reference Manual (§A3.2.1), LDRD / STRD with a base other than SP raise UsageFault on any address that is not 8-byte aligned, regardless of SCB->CCR.UNALIGN_TRP. alignof(max_align_t) on ARMv7-M GCC is 8 (driven by double / long long), so any malloc must return at least 8-byte alignment to satisfy C11 §7.22.3:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement…

TLSF with ALIGN_SIZE_LOG2 = 2 violates that on any target where alignof(max_align_t) > 4. The violation only surfaces when actual allocations land on 4-byte-but-not-8-byte boundaries, which is rare in internal SRAM (alloc order happens to be favorable) but common when the pool is placed in external RAM via LV_MEM_ADR.

So the fix in this PR isn't "add a workaround for STM32H7" — it's restoring the alignment contract that the C standard already requires of any general-purpose allocator on these targets. The same latent UB exists on M3 / M4 / M33 but just happens to rarely hit the wrong byte boundary.

Net effect of the change

  • Wastes at most 4 bytes per allocation on 32-bit ARMv7-M/E-M/v8-M targets (worst case).
  • No change to TLSF_64BIT builds, AVR, RV32, ARMv6-M (Cortex-M0/M0+).
  • Closes a class of "sporadic hardfault deep in LVGL when moving heap to external RAM" reports without requiring users to add -mno-unaligned-access or hand-pick an aligned address.

@github-actions
Contributor

Hi 👋, thank you for your PR!

We've run benchmarks in an emulated environment. Here are the results:

ARM Emulated 32b - lv_conf_perf32b

| Scene Name | Avg CPU (%) | Avg FPS | Avg Time (ms) | Render Time (ms) | Flush Time (ms) |
| --- | --- | --- | --- | --- | --- |
| Empty screen | 11 | 33 | 0 | 0 | 0 |
| Moving wallpaper | 2 | 33 | 1 | 1 | 0 |
| Single rectangle | 0 | 50 | 0 | 0 | 0 |
| Multiple rectangles | 0 | 33 (-1) | 0 | 0 | 0 |
| Multiple RGB images | 0 | 39 | 0 | 0 | 0 |
| Multiple ARGB images | 10 (-6) | 41 (+3) | 2 (-2) | 2 (-2) | 0 |
| Rotated ARGB images | 57 (-2) | 44 | 15 | 15 | 0 |
| Multiple labels | 4 (+1) | 35 (+2) | 0 | 0 | 0 |
| Screen sized text | 83 (+2) | 45 | 17 | 17 | 0 |
| Multiple arcs | 39 | 33 | 7 | 7 | 0 |
| Containers | 4 (+1) | 37 (-1) | 0 | 0 | 0 |
| Containers with overlay | 89 (-1) | 21 | 44 | 44 | 0 |
| Containers with opa | 14 | 37 | 1 | 1 | 0 |
| Containers with opa_layer | 19 (+1) | 34 | 5 | 5 | 0 |
| Containers with scrolling | 45 (+1) | 45 | 12 | 12 | 0 |
| Widgets demo | 72 (+1) | 39 (-1) | 16 (-1) | 16 (-1) | 0 |
| All scenes avg. | 28 | 37 | 7 | 7 | 0 |

ARM Emulated 64b - lv_conf_perf64b

| Scene Name | Avg CPU (%) | Avg FPS | Avg Time (ms) | Render Time (ms) | Flush Time (ms) |
| --- | --- | --- | --- | --- | --- |
| Empty screen | 11 | 33 | 0 | 0 | 0 |
| Moving wallpaper | 1 | 33 | 0 | 0 | 0 |
| Single rectangle | 0 | 50 | 0 | 0 | 0 |
| Multiple rectangles | 0 | 35 | 0 | 0 | 0 |
| Multiple RGB images | 0 | 39 | 0 | 0 | 0 |
| Multiple ARGB images | 11 | 42 | 0 | 0 | 0 |
| Rotated ARGB images | 29 | 33 | 9 | 9 | 0 |
| Multiple labels | 2 | 35 | 0 | 0 | 0 |
| Screen sized text | 85 | 46 | 18 | 18 | 0 |
| Multiple arcs | 33 | 33 | 6 | 6 | 0 |
| Containers | 4 | 37 (-1) | 0 | 0 | 0 |
| Containers with overlay | 98 (+1) | 22 | 42 (+1) | 42 (+1) | 0 |
| Containers with opa | 15 | 38 | 0 | 0 | 0 |
| Containers with opa_layer | 8 (+1) | 36 | 2 (+1) | 2 (+1) | 0 |
| Containers with scrolling | 49 (+1) | 49 | 12 | 12 | 0 |
| Widgets demo | 67 | 40 | 15 | 15 | 0 |
| All scenes avg. | 25 | 37 | 6 | 6 | 0 |

Disclaimer: These benchmarks were run in an emulated environment using QEMU with instruction counting mode.
The timing values represent relative performance metrics within this specific virtualized setup and should
not be interpreted as absolute real-world performance measurements. Values are deterministic and useful for
comparing different LVGL features and configurations, but may not correlate directly with performance on
physical hardware. The measurements are intended for comparative analysis only.


🤖 This comment was automatically generated by a bot.

Member

@kisvegabor kisvegabor left a comment


It can explain a lot of weird issues. Thank you for investigating it.

The changes are ok from my side. Let's wait for @AndreCostaaa's opinion too



Development

Successfully merging this pull request may close these issues.

unaligned memory access

3 participants