Performance testing #838

@henderkes

Placeholder issue for now. I will prepare detailed instructions for establishing a baseline on your system, and we can play around with optimisations later.

Copy-pasted from the Zig PR:

Test settings:

PHP: 8.4.10
Test System: RHEL 10, GCC 14.2.1, Zig 0.15-master (Clang 20.1.2), Clang 19.1.7, i7-13700, 16 GB RAM (WSL).
CFLAGS: -fpic -fpie -O3 -march=x86-64-v3
Extensions: ./configure --disable-all --with-openssl --enable-opcache=shared --with-zlib --with-zip --with-bz2 --enable-dom --enable-simplexml --enable-gd --enable-posix --enable-pcntl --with-libxml --with-readline

Static and dynamic compilation don't make a difference in runtime performance. I ran static and dynamic builds 5x each and got results within 1% of each other in either direction at most. This was more or less expected: static compilation should allow for greater optimisation in theory, but I suppose it's just not put to proper use in something this complex. All following tests are run against shared libraries because recompiling statically every time would take longer.

ZTS and NTS don't make a big difference: NTS is about 1-2% faster on average. The largest difference I saw between runs was 4.5% in favour of NTS, but I also saw runs where ZTS was faster.
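A minimal sketch of how repeated runs like these could be compared, so that the average gap and the run-to-run extremes are reported together. The scores below are hypothetical placeholders, not measurements from this issue, and the helper names `percent_faster` and `compare_runs` are made up for illustration.

```python
from statistics import mean


def percent_faster(candidate: float, baseline: float) -> float:
    """Return how much faster `candidate` is than `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


def compare_runs(nts: list[float], zts: list[float]) -> dict[str, float]:
    """Summarise paired runs: gap between the means plus the per-run extremes."""
    if len(nts) != len(zts):
        raise ValueError("both builds need the same number of runs")
    per_run = [percent_faster(n, z) for n, z in zip(nts, zts)]
    return {
        "mean_gap": percent_faster(mean(nts), mean(zts)),
        "max_gap": max(per_run),  # best run for NTS
        "min_gap": min(per_run),  # negative means ZTS won that run
    }


# Hypothetical scores (in thousands) for five paired runs; NOT real data.
nts_runs = [117.0, 116.5, 118.2, 115.9, 117.4]
zts_runs = [115.0, 116.9, 113.1, 115.2, 115.8]

summary = compare_runs(nts_runs, zts_runs)
for name, value in summary.items():
    print(f"{name}: {value:+.2f}%")
```

If `min_gap` comes out negative while `mean_gap` is only a percent or two, the difference is probably within run-to-run noise, which matches the observation that ZTS sometimes wins.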

zig-cc (LLVM 20, target native-native-gnu, so host glibc) and Clang 19 make no difference in runtime performance: Zig NTS scores 116k, and Clang NTS scores... 116k.

LTO made a negligible (~2%) difference in performance, but a huge difference in compile time (about 2x with thin LTO, up to 7x with fat LTO). With zig cc and LTO, ZTS reaches 116k, on par with NTS. I raised an issue with PHP to make PHP <= 8.3 compatible with LTO. The big issue here is that all our libraries also link their own programs with -flto unnecessarily; if we could link only PHP and its extensions with LTO, it would probably be only a ~20% difference in total time.

Now to the unfortunate part: zig-cc or Clang versus GCC (14.2) makes a massive performance difference. GCC is 22-27% faster (136k ZTS, 137k NTS). I haven't tested the old CentOS 7 toolchain (GCC 10) for performance yet. The gap is due to GCC global registers: when building with the configure option --disable-gcc-global-regs, GCC is slightly slower than Clang.

Remi's GCC build performed far better than anything I compiled locally: 160k compared to my local 137k, even though my build uses higher optimisation flags. I'm not sure why; maybe because of different extension sets? My local build also does slightly better with gcc -O2, at 139k. I have not found the reason yet. When I recompile his RPM from source, I get the same 160k. But when I copy the CFLAGS and LDFLAGS and apply them in either a static or dynamic manual build, I'm back to ~137k.
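To put the figures from this paragraph on one scale, here is a small sketch that expresses each quoted score as a percentage gain over the local -O3 build. Only the three numbers come from the text above; the dictionary keys and the helper `gain_over_baseline` are labels chosen for illustration, and the unit of the scores is left as quoted.

```python
# Scores in thousands, as quoted in the paragraph above.
quoted_scores = {
    "local gcc -O3": 137,
    "local gcc -O2": 139,
    "Remi RPM": 160,
}


def gain_over_baseline(scores: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each build's gain over `baseline`, in percent."""
    base = scores[baseline]
    return {name: (value - base) / base * 100.0 for name, value in scores.items()}


gains = gain_over_baseline(quoted_scores, "local gcc -O3")
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

On these rounded figures the -O2 build is only about one to two percent ahead, while the RPM build is ahead by roughly a sixth, so the optimisation level alone does not account for the gap.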

Side note: GCC fails with LTO when global registers are used. But global registers give a ~25% speed increase, while LTO offers only ~2%. Clang does not support global registers yet: https://clang.llvm.org/docs/UsersManual.html#gcc-extensions-not-implemented-yet

Edit: Shivam Mathur's PHP ZTS build is also missing this ~25% (25%+ in this case).
Edit 2: The official AppStream PHP NTS is as fast as Remi's, so it must be related to the rpmbuild system somehow.
Edit 3: The official Ubuntu images' NTS is at 135k...

Labels: documentation, help wanted, os/linux, os/macos, wip