Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing tests on Windows x64 #15709

Closed
cmb69 opened this issue Sep 2, 2024 · 22 comments
Closed

Crashing tests on Windows x64 #15709

cmb69 opened this issue Sep 2, 2024 · 22 comments

Comments

@cmb69
Copy link
Member

cmb69 commented Sep 2, 2024

Description

As of 4dc7795, a couple of tests are crashing. (Note that the issues have no introduced by this commit, but are merely visible now.)

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #54268 (Double free when destroy_zend_class fails) [D:\a\php-src\php-src\Zend\tests\bug54268.phpt]
GH-9407: LSP error in eval'd code refers to wrong class for static type [D:\a\php-src\php-src\Zend\tests\gh9407.phpt]
Stack limit 014 - Fuzzer [D:\a\php-src\php-src\Zend\tests\stack_limit\stack_limit_014.phpt]
GH-14639 (Member access within null pointer in ext/spl/spl_observer.c) [D:\a\php-src\php-src\ext\spl\tests\gh14639.phpt]
Bug #45392 (ob_start()/ob_end_clean() and memory_limit) [D:\a\php-src\php-src\tests\lang\bug45392.phpt]
=====================================================================

4 of these test fail with status code FFFFFFFFC00000FF, which is ERROR_EA_LIST_INCONSISTENT, while one fails with FFFFFFFFC0000028, which is ERROR_OUT_OF_PAPER (these errors don't make much sense, but the German translation of ERROR_OUT_OF_PAPER even less: "Der Drucker ist aus Papier.")

Anyhow, this appears to be a tracing JIT issue.

PHP Version

master

Operating System

Windows

@cmb69
Copy link
Member Author

cmb69 commented Sep 2, 2024

Oops, apparently looked up the wrong error codes: FFFFFFFFC0000028 is STATUS_BAD_STACK and FFFFFFFFC00000FF is STATUS_BAD_FUNCTION_TABLE. Makes more sense.

Note that locally I can only reproduce the crashes of Zend\tests\gh9407.phpt and tests\lang\bug45392.phpt, and both fail with STATUS_BAD_STACK, but only with tracing JIT enabled. gh9407.phpt fails on the LONGJMP() in _zend_bailout().

@cmb69
Copy link
Member Author

cmb69 commented Sep 2, 2024

#14919 (comment) seems to be related.

PS: indeed, the tests are only failing when built with Visual Studio 2022 on x64 (but not on x86, and not when built with Visual Studio 2019). There might be an issue with vs17, or maybe just more thorough checking.

cmb69 added a commit to cmb69/php-src that referenced this issue Sep 2, 2024
This is a stop-gap measure for phpGH-15709 to keep CI green.
@iluuu1994
Copy link
Member

Der Drucker ist aus Papier.

Lol. That sounds like it would be a problem.

cmb69 added a commit that referenced this issue Sep 2, 2024
This is a stop-gap measure for GH-15709 to keep CI green.
@Girgias
Copy link
Member

Girgias commented Sep 9, 2024

If/when this issue is fixed 6d59620 should be reverted.

cmb69 referenced this issue Oct 23, 2024
The error changed in master, not 8.4. My bad.
@cmb69
Copy link
Member Author

cmb69 commented Oct 23, 2024

‎Zend/tests/gh16508.phpt has the same issue (STATUS_BAD_FUNCTION_TABLE). I cannot reproduce locally like for the other STATUS_BAD_FUNCTION_TABLE tests. I can only reproduce STATUS_BAD_STACK issues. I'll have a closer look.

cmb69 added a commit that referenced this issue Oct 23, 2024
This is a stop-gap measure for GH-15709 to keep CI green.
cmb69 added a commit that referenced this issue Oct 23, 2024
This is a stop-gap measure for GH-15709 to keep CI green.

Sorry, xfailed the wrong test case previously.
@cmb69
Copy link
Member Author

cmb69 commented Oct 23, 2024

I've experimented with this locally, focusing on the Zend/tests/gh9407.phpt, since I can reproduce the STATUS_BAD_STACK reliably with x64/vs17 ZTS builds on master. I cannot, however, reproduce with x86 builds, nor NTS builds. nor with vs16, nor with vs17 builds of PHP-8.3. I also cannot reproduce if running gh9407.php. I've also did some experiments with CI. Some findings:

PS: the issue apparently does not occur if ZEND_DEBUG=1, and is likely somewhat related to ASan builds on Windows.

@danog
Copy link
Contributor

danog commented Nov 16, 2024

Note that adding --repeat 2 and improving the JIT flags as per #12406 also creates a number of additional issues: https://github.com/danog/php-src/actions/runs/11869776299/job/33080469494#step:6:470

cmb69 added a commit to cmb69/php-src that referenced this issue Nov 30, 2024
These look related to php#15709, but apparently only fail with ASan
instrumentation.  Let's xfail these for now.
dstogov added a commit to dstogov/php-src that referenced this issue Dec 4, 2024
@dstogov
Copy link
Member

dstogov commented Dec 4, 2024

@cmb69 you already made an excellent investigation.

The problems occurs when PHP calls longjmp() and we have a call frame of JIT-ed function somewhere upper in the stack trace. JIT doesn't generate any SEH (Structured Exception Handling) information and RTL Unwinder fails.

This works fine on 32-bit Windows because it doesn't use SEH.

Actually, I'm very surprised that SEH is used for longjump(). This is MSFT invention and the root of the problem.

There are few ways to fix this.

  1. Completely disable SEH. GCC may do this with -fno-excptions or -fno-unwind-tables. I'm not sure about MSVC. May be /SAFESEH:NO.
  2. Use the setjmp(..., NULL) hack. I saw your attempt in Testing #16563, but that approach may be incorrect. It's better to fix the setjump() call at first place (may be using assembler).
  3. Fix JIT. May be some JIT-ed function doesn't follow to expected calling convention.
  4. Generate SEH info for all JIT function. This will take time... I didn't run Windows for ages. Right now I'm even not able to start a VM.

Could you please do additional investigation regarding (3). May be this would help to understand more details.
I need to see the stack backtrace (at the moment of calling the crashing longjmp) and the disassemble of all the involved JIT-ed functions.

dstogov added a commit that referenced this issue Dec 4, 2024
* Fix GH-9011: Assertion failure with tracing JIT

* Temporay SKIP the test on 64-bit Windows because of GH-15709
@cmb69
Copy link
Member Author

cmb69 commented Dec 4, 2024

Thanks @dstogov for looking into this!

Completely disable SEH

/SAFESEH:NO is not supported on x64 (compiler explicitly rejects this option). I've already disabled /EH before, but that didn't help.

Use the setjmp(..., NULL) hack

I'm afraid that will lead nowhere, since this is nowadays apparently a compiler instrinsic. We could roll our own setjmp(), but besides that it appears to be a fragile solution per se, it likely causes issues with C++ code on Windows.

Fix JIT

Ah indeed there might be something wrong per se. Running Zend/tests/gh9407.phpt with current master gives STATUS_BAD_STACK.

backtrace
 	ntdll.dll!00007ffc0b872576()	Unbekannt	Es wurden keine Symbole geladen.
 	ntdll.dll!00007ffc0b7a0bb3()	Unbekannt	Es wurden keine Symbole geladen.
 	ntdll.dll!00007ffc0b7c1f3d()	Unbekannt	Es wurden keine Symbole geladen.
 	vcruntime140.dll!00007ffbf12010a9()	Unbekannt	Es wurden keine Symbole geladen.
>	php8ts.dll!_zend_bailout(const char * filename, unsigned int lineno) Zeile 1265	C	Symbole wurden geladen.
 	php8ts.dll!zend_try_early_bind(_zend_class_entry * ce, _zend_class_entry * parent_ce, _zend_string * lcname, _zval_struct * delayed_early_binding) Zeile 3874	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_class_decl(_znode * result, _zend_ast * ast, bool toplevel) Zeile 9138	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_top_stmt(_zend_ast * ast) Zeile 11341	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_top_stmt(_zend_ast * ast) Zeile 11325	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile(int type) Zeile 620	C	Symbole wurden geladen.
 	php8ts.dll!compile_string(_zend_string * source_string, const char * filename, _zend_compile_position position) Zeile 817	C	Symbole wurden geladen.
 	php8ts.dll!zend_include_or_eval(_zval_struct * inc_filename_zv, int type) Zeile 5182	C	Symbole wurden geladen.
 	php8ts.dll!ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER(_zend_execute_data * execute_data) Zeile 5248	C	Symbole wurden geladen.
 	0000100008000d11()	Unbekannt	IDE-generierter Code
 	0000000000000007()	Unbekannt	IDE-generierter Code
 	00000203e6a15070()	Unbekannt	IDE-generierter Code
 	00000203e6a15070()	Unbekannt	IDE-generierter Code
 	php_opcache.dll!zend_jit_trace_hot_root(_zend_execute_data * execute_data, const _zend_op * opline) Zeile 8064	C	Symbole wurden geladen.

The disassemblies around the relevant addresses (note that I've build with libcapstone, but I don't think that helps currently for MSVC). If you need more context, I can provide that.

00000203E6A15070
00000203E6A15064  add         eax,dword ptr [rdx]  
00000203E6A15066  add         byte ptr [rax],al  
00000203E6A15068  add         byte ptr [rax],al  
00000203E6A1506A  add         byte ptr [rax],al  
00000203E6A1506C  add         byte ptr [rax],al  
00000203E6A1506E  add         byte ptr [rax],al  
00000203E6A15070  cmp         byte ptr [rax+1000012Ah],ah  
00000203E6A15076  add         byte ptr [rax],al  
00000203E6A15078  add         byte ptr [rax],al  
00000203E6A1507A  add         byte ptr [rax],al  
00000203E6A1507C  add         byte ptr [rax],al  
00000203E6A1507E  add         byte ptr [rax],al  
00000203E6A15080  add         byte ptr [rax],al  
00000203E6A15082  add         byte ptr [rax],al  
00000203E6A15084  add         byte ptr [rax],al  
00000203E6A15086  add         byte ptr [rax],al  
00000203E6A15088  sbb         byte ptr [rdi+1000012Ah],bl  
00000203E6A1508E  add         byte ptr [rax],al  
00000203E6A15090  mov         al,9Ch  
00000203E6A15092  sub         al,byte ptr [rcx]  
00000203E6A15094  add         byte ptr [rax],dl  
00000203E6A15096  add         byte ptr [rax],al  
00000203E6A15098  add         byte ptr [rax],al  
00000203E6A1509A  add         byte ptr [rax],al  
00000203E6A1509C  add         byte ptr [rax],al  
00000203E6A1509E  add         byte ptr [rax],al  
00000203E6A150A0  and         byte ptr [rax-5Fh],dl  
00000203E6A150A3  out         3,al  
00000203E6A150A5  add         al,byte ptr [rax]  
00000203E6A150A7  add         byte ptr [rax],al  
00000203E6A150A9  add         byte ptr [rax],al  
00000203E6A150AB  add         byte ptr [rax],al  
00000203E6A150AD  add         byte ptr [rax],al  
00000203E6A150AF  add         byte ptr [rax+3E6A057h],bl  
00000203E6A150B5  add         al,byte ptr [rax]  
00000203E6A150B7  add         byte ptr [rax],al  
00000203E6A150B9  add         byte ptr [rax],al  
00000203E6A150BB  add         byte ptr [rax],al  
00000203E6A150BD  add         byte ptr [rax],al  
00000203E6A150BF  add         byte ptr [rax],al  
00000203E6A150C1  add         byte ptr [rax],al  
00000203E6A150C3  add         byte ptr [rax],al  
00000203E6A150C5  add         byte ptr [rax],al  
00000203E6A150C7  add         byte ptr [rax],al  
00000203E6A150C9  add         byte ptr [rax],al  
00000203E6A150CB  add         byte ptr [rax],al  
00000203E6A150CD  add         byte ptr [rax],al  
00000203E6A150CF  add         byte ptr [rax],al  
00000203E6A150D1  add         byte ptr [rax],al  
00000203E6A150D3  add         byte ptr [rax],al  
00000203E6A150D5  add         byte ptr [rax],al  
00000203E6A150D7  add         byte ptr [rax],al  
00000203E6A150D9  add         byte ptr [rax],al  
00000203E6A150DB  add         byte ptr [rax],al  
00000203E6A150DD  add         byte ptr [rax],al  
00000203E6A150DF  add         byte ptr [rax],al  
00000203E6A150E1  add         byte ptr [rax],al  
00000203E6A150E3  add         byte ptr [rax],al  
00000203E6A150E5  add         byte ptr [rax],al  
00000203E6A150E7  add         byte ptr [rax],al  
00000203E6A150E9  add         byte ptr [rax],al  
00000203E6A150EB  add         byte ptr [rax],al  
00000203E6A150ED  add         byte ptr [rax],al  
00000203E6A150EF  add         byte ptr [rax],al  
00000203E6A150F1  add         byte ptr [rax],al  
00000203E6A150F3  add         byte ptr [rax],al  
00000203E6A150F5  add         byte ptr [rax],al  
00000203E6A150F7  add         ch,al  
00000203E6A150F9  push        rbp  
0000100008000D11
0000100008000CD0  push        rbx  
0000100008000CD1  push        rbp  
0000100008000CD2  push        rsi  
0000100008000CD3  push        rdi  
0000100008000CD4  push        r12  
0000100008000CD6  push        r13  
0000100008000CD8  push        r14  
0000100008000CDA  push        r15  
0000100008000CDC  sub         rsp,28h  
0000100008000CE0  mov         r14,rcx  
0000100008000CE3  mov         rax,qword ptr gs:[58h]  
0000100008000CEC  mov         rax,qword ptr [rax+38h]  
0000100008000CF0  mov         rax,qword ptr [rax+8]  
0000100008000CF4  mov         dword ptr [rax+29D8h],2  
0000100008000CFE  sub         rsp,20h  
0000100008000D02  mov         rcx,r14  
0000100008000D05  mov         rax,offset ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (07FFB71524B60h)  
0000100008000D0F  call        rax  
0000100008000D11  add         rsp,20h  
0000100008000D15  mov         rax,qword ptr gs:[58h]  
0000100008000D1E  mov         rax,qword ptr [rax+38h]  
0000100008000D22  mov         rax,qword ptr [rax+8]  
0000100008000D26  cmp         qword ptr [rax+2C28h],0  
0000100008000D2E  jne         0000100008000110  
0000100008000D34  mov         eax,2  
0000100008000D39  add         rsp,28h  
0000100008000D3D  pop         r15  
0000100008000D3F  pop         r14  
0000100008000D41  pop         r13  
0000100008000D43  pop         r12  
0000100008000D45  pop         rdi  
0000100008000D46  pop         rsi  
0000100008000D47  pop         rbp  
0000100008000D48  pop         rbx  
0000100008000D49  ret  

I do not understand that backtrace. How could code at 0000000000000007 be executed? And what is that code at 00000203E6A15070. Only the code at 0000100008000D11 is from SHM, and only that disassembly makes some sense to me.

Generate SEH info for all JIT function.

I'm pretty sure that we need to use RtlInstallFunctionTableCallback() for proper fixing of STATUS_BAD_FUNCTION_TABLE at least. But that might actually be unrelated to the STATUS_BAD_STACK issues.

@cmb69
Copy link
Member Author

cmb69 commented Dec 5, 2024

I do not understand that backtrace.

Apparently, that backtrace can't make sense since frame information is missing for the debugger.

We could roll our own setjmp(), but besides that it appears to be a fragile solution per se, it likely causes issues with C++ code on Windows.

I made the following

setjmp() experiment
C:\php-sdk\phpdev\vs17\x64
$ type setjmp.c
#include <Windows.h>
#include <stdio.h>
#include <setjmp.h>

static jmp_buf env;

void sub()
{
    __try {
        printf("sub\n");
        longjmp(env, 1);
    } __finally {
        printf("finally\n");
    }
}

int main(int argc, char *argv[])
{
    HMODULE mod = LoadLibraryA("C:\\Windows\\System32\\ucrtbase.dll");
    FARPROC proc = GetProcAddress(mod, "setjmp");
    printf("start\n");
    if (argc < 2) {
        if (!setjmp(env)) {
            sub();
        } else {
            printf("jumped via setjmp\n");
        }
    } else {
        if (!proc(env, 0)) {
            sub();
        } else {
            printf("jumped via proc\n");
        }
    }
    printf("end\n");
    return 0;
}

C:\php-sdk\phpdev\vs17\x64
$ cl /nologo /Od /Zi setjmp.c
setjmp.c

C:\php-sdk\phpdev\vs17\x64
$ setjmp.exe
start
sub
finally
jumped via setjmp
end

C:\php-sdk\phpdev\vs17\x64
$ setjmp.exe 1
start
sub
jumped via proc
end

So apparently, setjmp() accepts two parameters; only the headers (and maybe the compiler) prevent to easily use this. I'll see whether this can be used for php-src, and would at least solve some of the issues.

@dstogov
Copy link
Member

dstogov commented Dec 5, 2024

Modification of setjmp() call should be the simplest solution. I'm not sure if it'll solve all the problems.

I see JIT-ed code modifies %rsp in the middle of the function. This is not recommended for _WIN64. In the provided disassemble sub rsp,20h allocates shadow space for arguments. For this function, it may be pre-allocated in the function prologue. May be this will help (anyway, this will make the code better), but I'm not sure how it's possible to completely avoid %rsp modification in general (especially for dynamic alloca).

V8 and HotSpot use RtlAddFunctionTable, but they use exceptions and they work in a single process. In PHP function may be compiled by one worker and executed by the other. RtlInstallFunctionTableCallback may be a right direction, but this will require significant efforts.

@dstogov
Copy link
Member

dstogov commented Dec 5, 2024

A quick question PHP-8.3 and below are not affected or just weren't tested with the new MSVC?

@cmb69
Copy link
Member Author

cmb69 commented Dec 5, 2024

A quick question PHP-8.3 and below are not affected or just weren't tested with the new MSVC?

That's apparently only an issue with the new MSVC (vs17). I found no issues when building PHP-8.4/master with vs16, IIRC.

Switching back to vs16 would likely solve this issue for now, but we would probably need to rebuilt everything (dependencies and extensions).

@dstogov
Copy link
Member

dstogov commented Dec 5, 2024

Did you saw the same problem in PHP-8.3 built with vs17?

@cmb69
Copy link
Member Author

cmb69 commented Dec 5, 2024

At least Zend\tests\gh9407.php passes on PHP-8.3 build with vs17.

backtrace (on a breakpoint)
>	php8ts.dll!_zend_bailout(const char * filename, unsigned int lineno) Zeile 1214	C	Symbole wurden geladen.
 	php8ts.dll!zend_try_early_bind(_zend_class_entry * ce, _zend_class_entry * parent_ce, _zend_string * lcname, _zval_struct * delayed_early_binding) Zeile 3435	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_class_decl(_znode * result, _zend_ast * ast, bool toplevel) Zeile 8189	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_top_stmt(_zend_ast * ast) Zeile 10360	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile_top_stmt(_zend_ast * ast) Zeile 10344	C	Symbole wurden geladen.
 	php8ts.dll!zend_compile(int type) Zeile 620	C	Symbole wurden geladen.
 	php8ts.dll!compile_string(_zend_string * source_string, const char * filename, _zend_compile_position position) Zeile 817	C	Symbole wurden geladen.
 	php8ts.dll!zend_include_or_eval(_zval_struct * inc_filename_zv, int type) Zeile 4950	C	Symbole wurden geladen.
 	php8ts.dll!ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER(_zend_execute_data * execute_data) Zeile 4986	C	Symbole wurden geladen.
 	0000100008000cdb()	Unbekannt	IDE-generierter Code
Disassembly around 0000100008000cdb
0000100008000C9D  ret  
0000100008000C9E  add         byte ptr [rax],al  
0000100008000CA0  sub         rsp,58h  
0000100008000CA4  mov         qword ptr [rsp+48h],r14  
0000100008000CA9  mov         qword ptr [rsp+50h],r15  
0000100008000CAE  mov         r14,rcx  
0000100008000CB1  mov         rax,qword ptr gs:[58h]  
0000100008000CBA  mov         rax,qword ptr [rax+40h]  
0000100008000CBE  mov         rax,qword ptr [rax+8]  
0000100008000CC2  mov         dword ptr [rax+2948h],2  
0000100008000CCC  mov         rcx,r14  
0000100008000CCF  mov         rax,offset ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (07FFB60BCE160h)  
0000100008000CD9  call        rax  
0000100008000CDB  mov         rcx,qword ptr gs:[58h]  
0000100008000CE4  mov         rcx,qword ptr [rcx+40h]  
0000100008000CE8  mov         rcx,qword ptr [rcx+8]  
0000100008000CEC  cmp         qword ptr [rcx+2B50h],0  
0000100008000CF4  jne         00001000080001D0  
0000100008000CFA  mov         r14,qword ptr [rsp+48h]  
0000100008000CFF  mov         r15,qword ptr [rsp+50h]  
0000100008000D04  add         rsp,58h  
0000100008000D08  mov         rax,2  
0000100008000D0F  ret  

@nono303
Copy link
Contributor

nono303 commented Dec 6, 2024

Hi @here,

I opened #16449 and @cmb69 redirected me to this issue.
Here is what I can bring:

@dstogov

Did you saw the same problem in PHP-8.3 built with vs17?

I've been building php with vs17 since 8.0.8 (so all 8.3x) and I've never had any opcache JIT issue with this configuration

opcache.jit	1255
opcache.mmap_base	0x0000700000000000

@cmb69

/SAFESEH:NO is not supported on x64 (compiler explicitly rejects this option). I've already disabled /EH before, but that didn't help.

/SAFESEH:NO is a linker option and not a compiler.
I just tested and the build is done without exception.

"link.exe" @"C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release\resp\OPCACHE_GLOBAL_OBJS.txt" C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release\php8.lib  kernel32.lib ole32.lib user32.lib advapi32.lib shell32.lib ws2_32.lib Dnsapi.lib psapi.lib bcrypt.lib C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release\php_opcache.dll.res /out:C:\sdk\src\php-sdk\phpmaster\vs17\x64\build\Release\php_opcache.dll /dll /nologo /d2:-AllowCompatibleILVersions /incremental:no /LTCG /NODEFAULTLIB:libcmt.lib /SAFESEH:NO /debug /opt:ref,icf /libpath:"C:\sdk\release\vs17_x64-avx2\lib" /libpath:"C:\sdk\release\vs17_x64-avx2\_openssl\openssl\lib" /libpath:"C:\sdk\release\vs17_x64-avx2\_proj\lib" /libpath:"C:\sdk\release\vs17_x64-avx2\_gdal\lib"

I updated my installation with this build and now I don't have 500 error anymore with opcache.jit 1255 🎉

I had to put opcache.jit 1235 before

The full configure & make log: php_vs17-x64-avx2_2024-12-06_08-18-57.log


Note: all feedback I make comes from real usage (httpd <> mod_fcgid <> php-cgi) and not from test execution.

My context

⏩ Available for any question, precision or other testing :)

Edit: quick performance benchmark with https://github.com/php/php-src/blob/master/Zend/micro_bench.php

  • opcache.jit 1235 : 575ms
  • opcache.jit 1255 : 462ms (-20%)

@dstogov
Copy link
Member

dstogov commented Dec 6, 2024

At least Zend\tests\gh9407.php passes on PHP-8.3 build with vs17.

So, may be it's possible to fix this improving the code generator (avoid %rsp modification).
I'll try to take a deeper look myself. This will take time, because I need to restore Win VM (already done, but had to move from vmware to virtualbox), install vs17, adopt php build system, etc...

@nono303 thanks for the feedback. It confirms my guess about degradation caused by new JIT back-end.
Can you provide a patch that adds /SAFESEH:NO (just post it here).

@cmb69
Copy link
Member Author

cmb69 commented Dec 6, 2024

So, may be it's possible to fix this improving the code generator (avoid %rsp modification).

I'm not sure that avoiding %rsp modification is even necessary. From my limited knowledge, I looks like JIT IT doesn't reserve enough stack space (old JIT decreases %rsp by 58h, while JIT IR only decreases by 48h, although several registers are pushed additionally).

old JIT vs JIT IR
--- C:/php-sdk/phpdev/vs17/x64/JIT IR.txt	Fri Dec  6 11:33:24 2024
+++ C:/php-sdk/phpdev/vs17/x64/old JIT.txt	Fri Dec  6 11:33:11 2024
@@ -1,24 +1,37 @@
-JIT IR
+old JIT
 
-0000100008000C9E  add         byte ptr [rax],al  
-0000100008000CA0  sub         rsp,58h  
-0000100008000CA4  mov         qword ptr [rsp+48h],r14  
-0000100008000CA9  mov         qword ptr [rsp+50h],r15  
-0000100008000CAE  mov         r14,rcx  
-0000100008000CB1  mov         rax,qword ptr gs:[58h]  
-0000100008000CBA  mov         rax,qword ptr [rax+40h]  
-0000100008000CBE  mov         rax,qword ptr [rax+8]  
-0000100008000CC2  mov         dword ptr [rax+2948h],2  
-0000100008000CCC  mov         rcx,r14  
-0000100008000CCF  mov         rax,offset ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (07FFB60BCE160h)  
-0000100008000CD9  call        rax  
-0000100008000CDB  mov         rcx,qword ptr gs:[58h]  
-0000100008000CE4  mov         rcx,qword ptr [rcx+40h]  
-0000100008000CE8  mov         rcx,qword ptr [rcx+8]  
-0000100008000CEC  cmp         qword ptr [rcx+2B50h],0  
-0000100008000CF4  jne         00001000080001D0  
-0000100008000CFA  mov         r14,qword ptr [rsp+48h]  
-0000100008000CFF  mov         r15,qword ptr [rsp+50h]  
-0000100008000D04  add         rsp,58h  
-0000100008000D08  mov         rax,2  
-0000100008000D0F  ret  
+0000100008000CD0  push        rbx  
+0000100008000CD1  push        rbp  
+0000100008000CD2  push        rsi  
+0000100008000CD3  push        rdi  
+0000100008000CD4  push        r12  
+0000100008000CD6  push        r13  
+0000100008000CD8  push        r14  
+0000100008000CDA  push        r15  
+0000100008000CDC  sub         rsp,28h  
+0000100008000CE0  mov         r14,rcx  
+0000100008000CE3  mov         rax,qword ptr gs:[58h]  
+0000100008000CEC  mov         rax,qword ptr [rax+38h]  
+0000100008000CF0  mov         rax,qword ptr [rax+8]  
+0000100008000CF4  mov         dword ptr [rax+29D8h],2  
+0000100008000CFE  sub         rsp,20h  
+0000100008000D02  mov         rcx,r14  
+0000100008000D05  mov         rax,offset ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (07FFB71524B60h)  
+0000100008000D0F  call        rax  
+0000100008000D11  add         rsp,20h  
+0000100008000D15  mov         rax,qword ptr gs:[58h]  
+0000100008000D1E  mov         rax,qword ptr [rax+38h]  
+0000100008000D22  mov         rax,qword ptr [rax+8]  
+0000100008000D26  cmp         qword ptr [rax+2C28h],0  
+0000100008000D2E  jne         0000100008000110  
+0000100008000D34  mov         eax,2  
+0000100008000D39  add         rsp,28h  
+0000100008000D3D  pop         r15  
+0000100008000D3F  pop         r14  
+0000100008000D41  pop         r13  
+0000100008000D43  pop         r12  
+0000100008000D45  pop         rdi  
+0000100008000D46  pop         rsi  
+0000100008000D47  pop         rbp  
+0000100008000D48  pop         rbx  
+0000100008000D49  ret  

/SAFESEH:NO is a linker option and not a compiler.
I just tested and the build is done without exception.

Ah, thanks! Adding this to LDFLAGS still reproduces the STATUS_BAD_STACK for me with tracing JIT (1254).

@dstogov
Copy link
Member

dstogov commented Dec 6, 2024

I finally configured windows environment and reproduced the failure with Release_TS build (Debug_TS works fine despite of the same generated code). If I make a quick hack removing %rsp modification after the prologue, Release_TS starts work.

I found note about this restriction at https://codemachine.com/articles/x64_deep_dive.html ("Since push and pop instructions alter the stack pointer, X64 functions restrict push and pop instructions to the function prolog and epilog respectively. The fact that the stack pointer does not change at all between the prolog and the epilog is a characteristic feature of X64 functions, as shown in Figure 3").

To understand this better I would trace vcruntime... Where can I find CRT sources?

@cmb69
Copy link
Member Author

cmb69 commented Dec 6, 2024

Where can I find CRT sources?

I don't think you can. I don't even think you can get the PDBs easily. :(

Maybe Wine or MinGW sources can help a bit.

@cmb69
Copy link
Member Author

cmb69 commented Dec 6, 2024

reproduced the failure with Release_TS build (Debug_TS works fine despite of the same generated code).

The relevant difference is the optimization level (/Ox for release, /Od for debug builds). If I do a release build with /Od (see diff below), the STATUS_BAD_STACK is gone.

diff
 win32/build/confutils.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/win32/build/confutils.js b/win32/build/confutils.js
index e847417bc7..6e7b6db031 100644
--- a/win32/build/confutils.js
+++ b/win32/build/confutils.js
@@ -3446,7 +3446,7 @@ function toolset_setup_build_mode()
 			ADD_FLAG("CFLAGS", "/Od /D NDebug /D NDEBUG /D ZEND_WIN32_NEVER_INLINE /D ZEND_DEBUG=0");
 		} else {
 			// Equivalent to Release_TSInline build -> best optimization
-			ADD_FLAG("CFLAGS", "/Ox /D NDebug /D NDEBUG /GF /D ZEND_DEBUG=0");
+			ADD_FLAG("CFLAGS", "/Od /D NDebug /D NDEBUG /GF /D ZEND_DEBUG=0");
 		}
 
 		// if you have VS.Net /GS hardens the binary against buffer overruns

@nono303
Copy link
Contributor

nono303 commented Dec 7, 2024

@cmb69

The relevant difference is the optimization level (/Ox for release, /Od for debug builds).

Fyi, I build with /O2 /Ob3 see. https://github.com/nono303/win-build-scripts/blob/master/modules/phpsdk-config_make.bat#L175

@dstogov

Can you provide a patch that adds /SAFESEH:NO (just post it here).

diff --git "a/win32/build/confutils.js" "b/win32/build/confutils.js"
index 1a4ddbffaa..87bc57f6a2 100644
--- "a/win32/build/confutils.js"
+++ "b/win32/build/confutils.js"
@@ -3442,7 +3442,7 @@ function toolset_setup_build_mode()
 		// Generate external debug files when --enable-debug-pack is specified
 		if (PHP_DEBUG_PACK == "yes") {
 			ADD_FLAG("CFLAGS", "/Zi");
-			ADD_FLAG("LDFLAGS", "/incremental:no /debug /opt:ref,icf");
+			ADD_FLAG("LDFLAGS", "/incremental:no /safeseh:no /debug /opt:ref,icf");
 		}
 		ADD_FLAG("CFLAGS", "/LD /MD");
 		if (PHP_SANITIZER == "yes" && CLANG_TOOLSET) {

cmb69 added a commit to cmb69/php-src that referenced this issue Dec 8, 2024
These look related to php#15709, but apparently only fail with ASan
instrumentation.  Let's xfail these for now.
dstogov added a commit that referenced this issue Dec 12, 2024
* PHP-8.4:
  Fix GH-15709: Crashing tests on Windows x64  (#17095)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants