slouken
Collaborator

@slouken slouken commented Sep 17, 2025

Special thanks to @mechakotik for their investigation and first pass of this PR.

SDL supports the following use cases:

  • Normal operation with fast parameter checks (default):
    SDL_SetHint(SDL_HINT_INVALID_PARAM_CHECKS, "1");
  • Object parameters are checked for use-after-free issues:
    SDL_SetHint(SDL_HINT_INVALID_PARAM_CHECKS, "2");
  • Enable full validation, plus assert on invalid parameters:
    #define SDL_ASSERT_INVALID_PARAMS
  • Disable all parameter validation:
    #define SDL_DISABLE_INVALID_PARAMS

Closes #13213
Closes #13943

@slouken slouken added this to the 3.4.0 milestone Sep 17, 2025
@slouken
Collaborator Author

slouken commented Sep 17, 2025

Okay, thanks for the feedback everyone! I think this is a much improved approach.

@icculus, @sezero, @madebr, @smcv, @AntTheAlchemist, @mechakotik, @bjorn

@slouken slouken force-pushed the param-checks2 branch 2 times, most recently from 3898212 to fab019c Compare September 17, 2025 02:28
@bjorn

bjorn commented Sep 17, 2025

I like the fact that this change reverts the behavior to mostly match SDL2 in terms of parameter checking, by making the slow object validation an opt-in. It's also nice that it's still possible to compile the checks out entirely. I agree with mechakotik's comment that a runtime NULL-check bypass doesn't add any value (in terms of performance, though it might have been useful for debugging purposes).

It seems to me that in case of SDL_DISABLE_INVALID_PARAMS, one could #define CHECK_PARAM(invalid) while (false) as well to avoid the need to wrap it in PARAMETER_CHECKS in almost all places, which especially benefits almost all uses of helper macros like CHECK_TEXTURE_MAGIC and CHECK_RENDERER_MAGIC.
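A minimal sketch of the macro idea described above, not SDL's actual definition. DEMO_DISABLE_INVALID_PARAMS and demo_length are hypothetical names used only for illustration; the assumption is that CHECK_PARAM is always followed by a braced block, so expanding it to while (false) turns that block into dead code without needing an extra #ifdef at each call site.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SDL_DISABLE_INVALID_PARAMS. */
#ifdef DEMO_DISABLE_INVALID_PARAMS
/* Checks compiled out: the condition is never evaluated and the
 * block that follows the macro becomes unreachable. */
#define CHECK_PARAM(invalid) while (false)
#else
/* Default build: the block runs when the parameter is invalid. */
#define CHECK_PARAM(invalid) if (invalid)
#endif

/* Hypothetical API function: returns the string length,
 * or -1 if the parameter check rejects the argument. */
static int demo_length(const char *text)
{
    CHECK_PARAM(!text) {
        return -1;
    }

    int n = 0;
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}
```

In the default build demo_length(NULL) returns -1. With DEMO_DISABLE_INVALID_PARAMS defined the guard disappears and passing NULL becomes the caller's responsibility.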

@slouken
Collaborator Author

slouken commented Sep 17, 2025

It seems to me that in case of SDL_DISABLE_INVALID_PARAMS, one could #define CHECK_PARAM(invalid) while (false) as well to avoid the need to wrap it in PARAMETER_CHECKS in almost all places, which especially benefits almost all uses of helper macros like CHECK_TEXTURE_MAGIC and CHECK_RENDERER_MAGIC.

That's a good point, I'll go ahead and do that, thanks!

@slouken slouken force-pushed the param-checks2 branch 2 times, most recently from 5f78ed2 to ae65c10 Compare September 17, 2025 14:17
@slouken
Collaborator Author

slouken commented Sep 17, 2025

@bjorn, that made the diff much more readable and less intrusive, thanks!

@mechakotik
Contributor

I think it's a good idea to explicitly inline the "fast checks" part of SDL_ObjectValid, e.g.

SDL_INLINE bool SDL_ObjectValid(void *object, SDL_ObjectType type)
{
    if (!object) {
        return false;
    }

    if (!SDL_object_validation) {
        return true;
    }

    return CheckObjectInHashTable(object, type);
}

Also, why would we spend resources maintaining the SDL_objects hash table when validation is disabled? I think it's better to require the hint to be set before SDL initialization and add a bypass to SDL_SetObjectValid (like I did originally). I don't see any use case for enabling or disabling validation mid-execution.
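For readers who want to try the two-tier check outside SDL, here is a self-contained sketch under stated assumptions. All demo_ names are hypothetical, and a small fixed array stands in for the SDL_objects hash table, so this illustrates the shape of the fast path and the slow path rather than SDL's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_OBJECTS 16

/* Registry of live objects (a fixed array here, not a hash table). */
static const void *demo_objects[DEMO_MAX_OBJECTS];

/* Plays the role of the flag that hint value "2" would switch on. */
static bool demo_object_validation = false;

/* Adds (valid == true) or removes (valid == false) an object. */
static void demo_set_object_valid(const void *object, bool valid)
{
    for (size_t i = 0; i < DEMO_MAX_OBJECTS; ++i) {
        if (valid && demo_objects[i] == NULL) {
            demo_objects[i] = object;
            return;
        }
        if (!valid && demo_objects[i] == object) {
            demo_objects[i] = NULL;
            return;
        }
    }
}

/* Slow path: linear scan of the registry. */
static bool demo_check_registry(const void *object)
{
    for (size_t i = 0; i < DEMO_MAX_OBJECTS; ++i) {
        if (demo_objects[i] == object) {
            return true;
        }
    }
    return false;
}

/* Fast path inlined at the call site; the registry lookup
 * is only reached when full validation is enabled. */
static inline bool demo_object_valid(const void *object)
{
    if (!object) {
        return false;
    }
    if (!demo_object_validation) {
        return true;
    }
    return demo_check_registry(object);
}
```

With validation off, any non-NULL pointer passes, including one that was never registered. Turning the flag on later still works because the registry was maintained all along.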

@slouken
Collaborator Author

slouken commented Sep 17, 2025

I think it's a good idea to explicitly inline the "fast checks" part of SDL_ObjectValid

Good idea, I'll go ahead and do that.

Also, why would we spend resources maintaining SDL_objects hash table when validation is disabled?

Because it allows object leak tracking in the default case when full object validation isn't enabled.

I think it's better to require hint to be set before SDL initialization and add bypass to SDL_SetObjectValid (like I did originally). Don't see any usecase for enabling/disabling validation mid-execution.

This allows an application to set it at startup after other code (possibly in a third party library) has been run and potentially triggered this initialization.
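The leak-tracking argument can be illustrated with a small sketch. This is an assumption-laden model, not SDL code: demo_track, demo_untrack and demo_count_leaks are hypothetical helpers, and a fixed array replaces the hash table. The point is only that a registry updated on every create and destroy can report what is still alive at shutdown, whether or not lookups are used for validation.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TRACK_MAX 8

/* Always-maintained list of live objects. */
static const void *demo_tracked[DEMO_TRACK_MAX];

/* Called on object creation; returns false if the table is full. */
static bool demo_track(const void *object)
{
    for (size_t i = 0; i < DEMO_TRACK_MAX; ++i) {
        if (demo_tracked[i] == NULL) {
            demo_tracked[i] = object;
            return true;
        }
    }
    return false;
}

/* Called on object destruction. */
static void demo_untrack(const void *object)
{
    for (size_t i = 0; i < DEMO_TRACK_MAX; ++i) {
        if (demo_tracked[i] == object) {
            demo_tracked[i] = NULL;
            return;
        }
    }
}

/* Called at shutdown: anything still registered was never destroyed. */
static size_t demo_count_leaks(void)
{
    size_t count = 0;
    for (size_t i = 0; i < DEMO_TRACK_MAX; ++i) {
        if (demo_tracked[i] != NULL) {
            ++count;
        }
    }
    return count;
}
```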

@slouken slouken force-pushed the param-checks2 branch 2 times, most recently from 3b139ee to b8500a8 Compare September 17, 2025 14:44
@icculus
Collaborator

icculus commented Sep 17, 2025

Okay, I've warmed up on this in the latest revisions. It's pretty good looking now!

@slouken slouken force-pushed the param-checks2 branch 2 times, most recently from 33ecadd to bbb841b Compare September 17, 2025 23:50
@slouken
Collaborator Author

slouken commented Sep 18, 2025

@madebr, do you see any reason why the Emscripten build would abort in the video_raiseWindow() test?

@madebr
Contributor

madebr commented Sep 18, 2025

@madebr, do you see any reason why the Emscripten build would abort in the video_raiseWindow() test?

I cannot reproduce locally.

Can you rebase your branch on top of current master, and add [sdl-ci-artifacts] to your last commit message?
I just pushed a change such that the Emscripten artifacts will contain the html, js and wasm files of the tests.

@slouken
Collaborator Author

slouken commented Sep 18, 2025

@madebr, do you see any reason why the Emscripten build would abort in the video_raiseWindow() test?

I cannot reproduce locally.

Can you rebase your branch on top of current master, and add [sdl-ci-artifacts] to your last commit message? I just pushed a change such that the Emscripten artifacts will contain the html, js and wasm files of the tests.

Done!

@madebr
Contributor

madebr commented Sep 18, 2025

I can reproduce the error (occasionally) using Chromium.
There's an out-of-bounds access somewhere:

>>> Test 'video_getWindowSurface': Passed
----- Test Case 24.23: 'video_raiseWindow' started
Test Description: 'Checks window focus'

[post-exception status] Exception thrown, see JavaScript console
Uncaught RuntimeError: memory access out of bounds
    at dlmalloc (dlmalloc.c:4576:17)
    at real_malloc (SDL_malloc.c:6334:54)
    at SDL_TrackAllocation (SDL_test_memory.c:133:39)
    at SDLTest_TrackedMalloc (SDL_test_memory.c:238:9)
    at SDL_malloc (SDL_malloc.c:6459:11)
    at testautomation.js:784:22
    at SDL3.makePointerEventCStruct (testautomation.js:1106:434)
    at target.sdlEventHandlerMouseButtonUpGlobal (testautomation.js:1105:160)

Other times, the test exits fine with error code 0, but this error gets printed after exit:

Aborted(Stack overflow! Stack cookie has been overwritten at 0x00000004, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x00000273 0x02135467)

Reproducer steps (you need to rebase one more time on top of current main because of a .wasm.map oversight). This job has all the files:

  • Download the SDL-emscripten.zip archive from GitHub Actions
  • Extract the zip and the embedded SDL3-3.3.0-Emscripten.tar.gz somewhere under /tmp
  • Check out the SDL git repository at the version used for the build
  • Start an HTTP server using the server.py script (modify the paths if needed)
    export SDL_HOME=$HOME/projects/SDL
    export TEST_DIR=/tmp/dist/SDL3-3.3.0-Emscripten/libexec/installed-tests/SDL3
    $SDL_HOME/test/emscripten/server.py -d $TEST_DIR --map $SDL_HOME:/SDL
  • Start Chrome, open the developer tools, and go to this local URL

@slouken
Collaborator Author

slouken commented Sep 18, 2025

Okay, I've rebased and rebuilt. Does the error happen if you run testautomation --filter video_raiseWindow to just do that test?

@madebr
Contributor

madebr commented Sep 18, 2025

Okay, I've rebased and rebuilt. Does the error happen if you run testautomation --filter video_raiseWindow to just do that test?

No, sadly it does not.

I can reproduce the errors locally by adding this code to SDL_{malloc,calloc,free}:

#ifdef __EMSCRIPTEN__
    EM_ASM({
        checkStackCookie();
    });
#endif

Anyhow, I think the error is in SDL and triggered by video_getWindowSurface.
Building SDL with address (and undefined) sanitizers (targeting native Linux) causes a crash with the following log:

==62800==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c9461e3b04c at pc 0x7f7463a1f096 bp 0x7ffcc577dbb0 sp 0x7ffcc577dba8
READ of size 4 at 0x7c9461e3b04c thread T0
    #0 0x7f7463a1f095 in SDL_DestroyTexture_REAL /home/maarten/projects/SDL/src/render/SDL_render.c:5430
    #1 0x7f746408bcd6 in SDL_CleanupWindowTextureData /home/maarten/projects/SDL/src/video/SDL_video.c:305
    #2 0x7f7463813777 in SDL_FreePropertyWithCleanup /home/maarten/projects/SDL/src/SDL_properties.c:64
    #3 0x7f74638138d0 in SDL_FreeProperty /home/maarten/projects/SDL/src/SDL_properties.c:81
    #4 0x7f746380b7f2 in destroy_all /home/maarten/projects/SDL/src/SDL_hashtable.c:434
    #5 0x7f746380baf8 in SDL_DestroyHashTable /home/maarten/projects/SDL/src/SDL_hashtable.c:456
    #6 0x7f7463813942 in SDL_FreeProperties /home/maarten/projects/SDL/src/SDL_properties.c:87
    #7 0x7f74638194cd in SDL_DestroyProperties_REAL /home/maarten/projects/SDL/src/SDL_properties.c:820
    #8 0x7f74640bf2b9 in SDL_DestroyWindow_REAL /home/maarten/projects/SDL/src/video/SDL_video.c:4444
    #9 0x7f74638b860f in SDL_DestroyWindow /home/maarten/projects/SDL/src/dynapi/SDL_dynapi_procs.h:184
    #10 0x0000004bcac2 in video_getWindowSurface /home/maarten/projects/SDL/test/testautomation_video.c:2442
    #11 0x0000004ead7c in SDLTest_RunTest /home/maarten/projects/SDL/src/test/SDL_test_harness.c:269
    #12 0x0000004ee6b9 in SDLTest_ExecuteTestSuiteRunner /home/maarten/projects/SDL/src/test/SDL_test_harness.c:674
    #13 0x000000402fef in main /home/maarten/projects/SDL/test/testautomation.c:137
    #14 0x7f7462e11574 in __libc_start_call_main (/lib64/libc.so.6+0x3574) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #15 0x7f7462e11627 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3627) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #16 0x000000402374 in _start (/home/maarten/projects/SDL/cmake-build-debug/test/testautomation+0x402374) (BuildId: e3ff775a830f6ffed75842c6ba963bbdb73fdbf8)

0x7c9461e3b04c is located 12 bytes inside of 304-byte region [0x7c9461e3b040,0x7c9461e3b170)
freed by thread T0 here:
    #0 0x7f74678e5beb in free.part.0 (/lib64/libasan.so.8+0xe5beb) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #1 0x7f7463cf6c94 in real_free /home/maarten/projects/SDL/src/stdlib/SDL_malloc.c:6337
    #2 0x7f7463cf721c in SDL_free_REAL /home/maarten/projects/SDL/src/stdlib/SDL_malloc.c:6512
    #3 0x7f7463a1ef31 in SDL_DestroyTextureInternal /home/maarten/projects/SDL/src/render/SDL_render.c:5423
    #4 0x7f7463a1fdcd in SDL_DestroyRendererWithoutFreeing /home/maarten/projects/SDL/src/render/SDL_render.c:5491
    #5 0x7f74640bf0de in SDL_DestroyWindow_REAL /home/maarten/projects/SDL/src/video/SDL_video.c:4434
    #6 0x7f74638b860f in SDL_DestroyWindow /home/maarten/projects/SDL/src/dynapi/SDL_dynapi_procs.h:184
    #7 0x0000004bcac2 in video_getWindowSurface /home/maarten/projects/SDL/test/testautomation_video.c:2442
    #8 0x0000004ead7c in SDLTest_RunTest /home/maarten/projects/SDL/src/test/SDL_test_harness.c:269
    #9 0x0000004ee6b9 in SDLTest_ExecuteTestSuiteRunner /home/maarten/projects/SDL/src/test/SDL_test_harness.c:674
    #10 0x000000402fef in main /home/maarten/projects/SDL/test/testautomation.c:137
    #11 0x7f7462e11574 in __libc_start_call_main (/lib64/libc.so.6+0x3574) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #12 0x7f7462e11627 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3627) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #13 0x000000402374 in _start (/home/maarten/projects/SDL/cmake-build-debug/test/testautomation+0x402374) (BuildId: e3ff775a830f6ffed75842c6ba963bbdb73fdbf8)

previously allocated by thread T0 here:
    #0 0x7f74678e68a3 in calloc (/lib64/libasan.so.8+0xe68a3) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #1 0x7f7463cf6c55 in real_calloc /home/maarten/projects/SDL/src/stdlib/SDL_malloc.c:6335
    #2 0x7f7463cf715d in SDL_calloc_REAL /home/maarten/projects/SDL/src/stdlib/SDL_malloc.c:6478
    #3 0x7f74639d5d37 in SDL_CreateTextureWithProperties_REAL /home/maarten/projects/SDL/src/render/SDL_render.c:1440
    #4 0x7f74639d8cf6 in SDL_CreateTexture_REAL /home/maarten/projects/SDL/src/render/SDL_render.c:1580
    #5 0x7f746408d0b1 in SDL_CreateWindowTexture /home/maarten/projects/SDL/src/video/SDL_video.c:439
    #6 0x7f74640b41b2 in SDL_CreateWindowFramebuffer /home/maarten/projects/SDL/src/video/SDL_video.c:3622
    #7 0x7f74640b4a3e in SDL_GetWindowSurface_REAL /home/maarten/projects/SDL/src/video/SDL_video.c:3653
    #8 0x7f74638bbc13 in SDL_GetWindowSurface /home/maarten/projects/SDL/src/dynapi/SDL_dynapi_procs.h:596
    #9 0x0000004bca67 in video_getWindowSurface /home/maarten/projects/SDL/test/testautomation_video.c:2436
    #10 0x0000004ead7c in SDLTest_RunTest /home/maarten/projects/SDL/src/test/SDL_test_harness.c:269
    #11 0x0000004ee6b9 in SDLTest_ExecuteTestSuiteRunner /home/maarten/projects/SDL/src/test/SDL_test_harness.c:674
    #12 0x000000402fef in main /home/maarten/projects/SDL/test/testautomation.c:137
    #13 0x7f7462e11574 in __libc_start_call_main (/lib64/libc.so.6+0x3574) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #14 0x7f7462e11627 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3627) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #15 0x000000402374 in _start (/home/maarten/projects/SDL/cmake-build-debug/test/testautomation+0x402374) (BuildId: e3ff775a830f6ffed75842c6ba963bbdb73fdbf8)

Reproducer:

test/testautomation --filter video_getWindowSurface

@slouken
Collaborator Author

slouken commented Sep 19, 2025

Anyhow, I think the error is in SDL and triggered by video_getWindowSurface. Building SDL with address (and undefined) sanitizers (targeting native Linux) causes a crash with the following log:

Ah, thank you for the stack trace, that was very helpful. Also, I see now that we were relying on the object validity check to prevent a crash here before. I wonder how many other people are going to run into similar problems?
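To make the masking effect concrete, here is a simplified model of a double destroy. It is a sketch under assumptions, not the SDL code path from the stack trace: DemoTexture, demo_register and demo_destroy are hypothetical, the registry holds a single slot, and the object lives in ordinary storage so that the second teardown can be counted safely instead of touching freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical texture stand-in; teardown_count records
 * how many times the teardown code ran on it. */
typedef struct {
    int teardown_count;
} DemoTexture;

/* One-slot registry of live objects (a simplification). */
static const DemoTexture *demo_live_texture = NULL;

static void demo_register(const DemoTexture *texture)
{
    demo_live_texture = texture;
}

/* Returns true if teardown ran, false if the pointer was rejected. */
static bool demo_destroy(DemoTexture *texture, bool full_validation)
{
    if (!texture) {
        return false; /* fast check, always performed */
    }
    if (full_validation && texture != demo_live_texture) {
        return false; /* stale pointer caught by the registry */
    }
    demo_live_texture = NULL;
    /* In real code a second pass through here would read or
     * write memory that has already been freed. */
    texture->teardown_count++;
    return true;
}
```

With full validation the second demo_destroy call on the same pointer is rejected, which is how a latent double destroy can go unnoticed. With only the fast NULL check, the second call proceeds and the teardown runs twice.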

@slouken slouken merged commit 0eff3fe into libsdl-org:main Sep 19, 2025
41 checks passed
@slouken slouken deleted the param-checks2 branch September 19, 2025 03:58
@madebr
Contributor

madebr commented Sep 19, 2025

Anyhow, I think the error is in SDL and triggered by video_getWindowSurface. Building SDL with address (and undefined) sanitizers (targeting native Linux) causes a crash with the following log:

Ah, thank you for the stack trace, that was very helpful. Also, I see now that we were relying on the object validity check to prevent a crash here before. I wonder how many other people are going to run into similar problems?

testsprite also has an obvious bug. So there will be others.

@AntTheAlchemist
Contributor

I wonder how many other people are going to run into similar problems?

🙈

I found one in SDL_mixer.c: MIX_CreateMixerDevice() > MIX_CreateGroup() > LockMixer() > SDL_LockAudioStream(mixer->output_stream), where mixer->output_stream is null.

Shall I create a new issue for this over at SDL_mixer?

@AntTheAlchemist
Contributor

Another one, when plugging in a gamepad: SDL_PumpEventsInternal(false) > SDL_PumpEventMaintenance() > SDL_UpdateJoysticks() > WINDOWS_JoystickDetect() > SDL_PrivateJoystickAdded(3) > SDL_IsGamepad(3) > SDL_FindInHashTable(s_gamepadInstanceIDs, (void *)(uintptr_t)instance_id, &value), where s_gamepadInstanceIDs is null.

To be fair, I don't think there are many of these. This is a good thing, right? It's exposing potential internal SDL bugs? We can stamp them out as we find them.

@AntTheAlchemist
Contributor

On Android: SDL_InitJoysticks() > SDL_InitGamepadMappings() > SDL_AddGamepadMappingsFromFile() > SDL_AddGamepadMappingsFromIO(SDL_IOFromFile(file, "rb"), true). I'm guessing src is null because the file doesn't exist, so SDL_GetIOSize() fails: it is passed a null context and then tries to dereference context->iface.size.

@bjorn

bjorn commented Sep 19, 2025

I wonder how many other people are going to run into similar problems?

By default there are still fast null checks, which would likely prevent all the above issues, right? I don't have time to look into each of them, but it seems like they are being found in optimized builds with checks entirely disabled?

I do wonder a little whether the option to remove these checks entirely was a good idea. Now all code will need to be adjusted to handle the case where these checks have been compiled out, which may in many cases cause needless double-checking of the parameters in a default build (checking the parameter both outside and inside the function). It may be worth it for optimized builds, but has any difference in performance actually been demonstrated? (as opposed to disabling just SDL_ObjectValid checks, which definitely had a measurable impact)
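The double-checking concern can be sketched as follows. Everything here is hypothetical (DemoStream, demo_open, demo_get_size, demo_load_mappings, and the DEMO_DISABLE_INVALID_PARAMS switch); it models the pattern of a caller that must guard a possibly-null result itself because the callee's check may be compiled out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream type with a precomputed size. */
typedef struct {
    long size;
} DemoStream;

static DemoStream demo_existing = { 42 };

/* Returns NULL when the (pretend) file does not exist. */
static DemoStream *demo_open(const char *name)
{
    if (name && strcmp(name, "mappings.txt") == 0) {
        return &demo_existing;
    }
    return NULL;
}

/* Callee: its NULL check disappears in a checks-disabled build. */
static long demo_get_size(DemoStream *stream)
{
#ifndef DEMO_DISABLE_INVALID_PARAMS
    if (!stream) {
        return -1;
    }
#endif
    return stream->size;
}

/* Caller: guards explicitly so it is safe in both builds.
 * In the default build the pointer is therefore tested twice. */
static long demo_load_mappings(const char *name)
{
    DemoStream *stream = demo_open(name);
    if (!stream) {
        return -1;
    }
    return demo_get_size(stream);
}
```

In the default build the guard in demo_load_mappings duplicates the one in demo_get_size, which is the redundancy raised above. In a build with the checks compiled out, the caller's guard is the only thing preventing a null dereference.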

@slouken
Collaborator Author

slouken commented Sep 19, 2025

testsprite also has an obvious bug. So there will be others.

Yep. Well, we might as well fix these. Can you provide a PR for this?

@madebr
Contributor

madebr commented Sep 19, 2025

testsprite also has an obvious bug. So there will be others.

Yep. Well, we might as well fix these. Can you provide a PR for this?

I just pushed it to git, since it was a trivial fix.
