
v5.2.4

@L4cache L4cache released this 31 May 21:51

First of all, the msys2/ucrt64 build is faster than the msys2/mingw64 build. (The msys2/clang64 environment is used for the clang builds.)
The "normal" version is built without setting -march.
The x86-64-v3 version is built with -march=x86-64-v3 (cannot run on CPUs without AVX2).
The znver4 version is, of course, built with -march=znver4 (cannot run on CPUs without AVX-512).
Both seem to be slightly faster than John's build at rarewares.org.
On Zen 4, roughly: 700x -> 715x -> 730x (normal -> x86-64-v3 -> znver4).
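
To make the CPU requirements above concrete, here is a minimal sketch of picking a variant from a set of CPU feature flags. The function name `pick_build` and the lowercase flag names ("avx2", "avx512f", as Linux reports them in /proc/cpuinfo) are assumptions for illustration and are not part of hmp3. It also treats AVX-512F as a stand-in for znver4 support, which is a simplification: -march=znver4 enables more extensions than that, so a non-Zen 4 CPU with AVX-512 may still fail to run the znver4 build.

```python
def pick_build(cpu_flags):
    """Return the most specific build variant the given CPU flags allow.

    cpu_flags: iterable of feature names (hypothetical input, e.g. {"avx2"}).
    """
    flags = {flag.lower() for flag in cpu_flags}
    # Simplification: AVX-512F is used as a proxy for "can run the znver4 build".
    if "avx512f" in flags and "avx2" in flags:
        return "znver4"
    # The x86-64-v3 build needs AVX2 (among other x86-64-v3 features).
    if "avx2" in flags:
        return "x86-64-v3"
    # No -march set, so this one should run on any x86-64 CPU.
    return "normal"
```

For example, `pick_build({"sse2", "avx2"})` returns `"x86-64-v3"`.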

And yes, the clang build is significantly slower (630x), so the gcc build is recommended. See the benchmark below.

fast-math seems to improve the speed of the AVX-512-enabled builds (znver4 and x86-64-v4), to roughly 750x, but makes no really noticeable difference for the AVX2 (x86-64-v3) build.

P.S. All of these -march and fast-math changes make the encoder encode the file slightly differently.
Update: added clang builds and various optimization options.

Some kind of benchmark

source length: 01:25:03.000 or 5103 seconds
source format: Wave PCM S16 48000Hz Stereo
measurement: py -m timeit -n 1 -r 5 -v -s "import subprocess" "subprocess.run('\"hmp3-gcc-fast-math\" 1.wav',shell=True)"
CPU: Ryzen 7900X
RAM: DDR5-6000 32x2 ~80GB/s 63.3ns (AIDA64)
SSD: NVMe PCIe 3.0 in an external enclosure, ~400MB/s (where the test file and hmp3 binary are stored; the OS is on an internal Samsung 980 Pro)

compiler versions: gcc: 14.1.0, clang: 18.1.6
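
The timeit invocation above runs the encoder once per repetition, five times, and reports the best time. A rough standalone equivalent is sketched below; `best_time` is a hypothetical helper name, and it passes the command as a list instead of using shell=True as the original one-liner does.

```python
import subprocess
import sys
import time


def best_time(cmd, repeats=5):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder command so the sketch runs anywhere; swap in the encoder call.
elapsed = best_time([sys.executable, "-c", "pass"], repeats=3)
print(f"best of 3: {elapsed:.3f}s")
```

To mirror the measurement above, the call would be `best_time(["hmp3-gcc-fast-math", "1.wav"])`, assuming that binary and file are in the current directory.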

clang gcc gcc+pgo clang+pgo
normal 7.58s/673.22x 7.43s/686.81x 7.23s/705.81x
fast-math 7.54s/676.79x 7.16s/712.71x 7.05s/723.83x
x86-64-v3 7.14s/714.71x 7.13s/715.71x 6.8s/750.44x
x86-64-v3-fast-math 6.98s/731.09x 7.02s/726.92x 6.95s/734.24x 6.55s/779.08x
znver4 7.82s/652.56x 6.58s/775.53x
znver4-fast-math 11.6s/439.91x 6.12s/833.82x 5.92s/861.99x 14.3s/356.85x(WTF?)
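
Each entry in the table pairs the encode time with the speed relative to realtime, which is the source length divided by the encode time. A small sketch of that arithmetic follows; the function names are made up for illustration.

```python
def duration_to_seconds(text):
    """Convert an 'HH:MM:SS.mmm' duration string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def speed_factor(source_seconds, elapsed_seconds):
    """Speed relative to realtime: seconds of audio encoded per second of wall time."""
    return source_seconds / elapsed_seconds


source = duration_to_seconds("01:25:03.000")  # source length from the benchmark setup
print(f"{speed_factor(source, 7.58):.2f}x")
print(f"{speed_factor(source, 5.92):.2f}x")
```

With the 5103-second source, 7.58s works out to 673.22x and 5.92s to 861.99x, matching the table.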

dev-20240615 contains the following builds, chosen based on the benchmarks:
clang-fast-math-pgo
clang-x86-64-v3-fast-math-pgo
gcc-x86-64-v3-fast-math-pgo
gcc-znver4-fast-math-pgo