Release v5.2.4 · L4cache/hmp3

first of all, the msys2/ucrt64 build is faster than the msys2/mingw64 build. (the msys2/clang64 env is used for clang builds)
"normal" version is built without setting -march
x86-64-v3 version is built with -march=x86-64-v3 (cannot run on CPUs without AVX2)
znver4 version is of course built with -march=znver4 (cannot run on CPUs without AVX-512)
both seem to be slightly faster than John's at rarewares.org
on zen4, roughly: 700x -> 715x -> 730x

~~and yes, clang build is significantly slower (630x), gcc build is recommended.~~ See the benchmark.

fast-math seems to improve the speed of AVX-512 enabled builds (znver4 and x86-64-v4), but makes no really noticeable difference for the AVX2 (x86-64-v3) build.
roughly, 750x

p.s. all these -march or fast-math changes make the encoder encode the file slightly differently
update: added clang builds and various optimization options.
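
One quick way to see whether two builds produce different output is to hash the encoded files. The sketch below is only an illustration and not part of hmp3: the helper names and the example file names are made up. A mismatch only shows that the bytes differ, not how large or audible the difference is.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True if both files are byte-for-byte identical (by SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)


# Hypothetical usage: encode 1.wav with two builds into differently
# named files, then compare them.
# print(same_output("1-normal.mp3", "1-znver4-fast-math.mp3"))
```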

Some kind of benchmark

source length: 01:25:03.000 or 5103 seconds
source format: Wave PCM S16 48000Hz Stereo
measurement: py -m timeit -n 1 -r 5 -v -s "import subprocess" "subprocess.run('\"hmp3-gcc-fast-math\" 1.wav',shell=True)"
CPU: Ryzen 7900X
RAM: DDR5-6000 32x2 ~80GB/s 63.3ns (AIDA64)
SSD: NVMe PCIe 3.0 in external enclosure, ~400MB/s (where test file and hmp3 binary stored, OS is on internal samsung 980 pro)

compiler versions: gcc: 14.1.0, clang: 18.1.6
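
The "x" figures in the table are the source length divided by the encode time (for example 5103 s / 7.58 s ≈ 673.22x). A rough Python equivalent of the timeit measurement above is sketched here. It assumes the best of 5 runs is the value reported, and the function names are made up for illustration.

```python
import subprocess
import time

SOURCE_SECONDS = 5103  # 01:25:03 source file, as stated above


def speed_ratio(source_seconds, elapsed_seconds):
    """Return how many times faster than real time the encode ran."""
    return source_seconds / elapsed_seconds


def best_of(cmd, repeats=5):
    """Run cmd `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Binary and input names are taken from the measurement line above.
    elapsed = best_of(["hmp3-gcc-fast-math", "1.wav"])
    print(f"{elapsed:.2f}s/{speed_ratio(SOURCE_SECONDS, elapsed):.2f}x")
```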

| build | clang | gcc | gcc+pgo | clang+pgo |
| --- | --- | --- | --- | --- |
| normal | 7.58s/673.22x | 7.43s/686.81x | | 7.23s/705.81x |
| fast-math | 7.54s/676.79x | 7.16s/712.71x | | 7.05s/723.83x |
| x86-64-v3 | 7.14s/714.71x | 7.13s/715.71x | | 6.8s/750.44x |
| x86-64-v3-fast-math | 6.98s/731.09x | 7.02s/726.92x | 6.95s/734.24x | 6.55s/779.08x |
| znver4 | 7.82s/652.56x | 6.58s/775.53x | | |
| znver4-fast-math | 11.6s/439.91x | 6.12s/833.82x | 5.92s/861.99x | 14.3s/356.85x (WTF?) |

dev-20240615 contains:
clang-fast-math-pgo,
clang-x86-64-v3-fast-math-pgo,
gcc-x86-64-v3-fast-math-pgo,
gcc-znver4-fast-math-pgo,
based on the benchmarks.
