Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AVX* instructions on default build on amd64 #157

Closed
VVD opened this issue Nov 4, 2024 · 8 comments
Closed

Use AVX* instructions on default build on amd64 #157

VVD opened this issue Nov 4, 2024 · 8 comments

Comments

@VVD
Copy link

VVD commented Nov 4, 2024

OS: FreeBSD 14.1 amd64.
CPU: Core 2 Quad Q6600.
Build OpenJPH 0.17.0 from ports without -march and with -march=core2 - same result.
Dependencies list: smplayer => qt5 => kf5-kimageformats => libheif => OpenJPH.

$ gdb smplayer
GNU gdb (GDB) 15.1 [GDB v15.1 for FreeBSD]
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from smplayer...
(No debugging symbols found in smplayer)
(gdb) run
Starting program: /usr/local/bin/smplayer
[New LWP 807958 of process 16273]
Qt: Session management error: None of the authentication protocols specified are supported
[New LWP 807959 of process 16273]
This is SMPlayer v. 24.5.0 (revision 10277) running on FreeBSD
[New LWP 807960 of process 16273]
[Detaching after fork from child process 16274]

Thread 1 received signal SIGILL, Illegal instruction.
Privileged opcode.
0x000000081576a81a in ojph::local::initialize_tables_avx2() () from /usr/local/lib/libopenjph.so.0.17
$ valgrind smplayer
...
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF8 0x77 0xC3 0xC5 0xF8 0x77 0xE8 0xDA 0x64
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==16247== valgrind: Unrecognised instruction at address 0x1b5f881a.
==16247==    at 0x1B5F881A: ojph::local::initialize_tables_avx2() (in /usr/local/lib/libopenjph.so.0.17.0)
==16247==    by 0x400AF5C: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x400F3F2: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x400BF52: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x12D32276: ??? (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBCB3B: heif_load_plugin (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBC7DE: heif_load_plugins (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBC5DE: heif_init (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12C1CADA: ??? (in /usr/local/lib/qt5/plugins/imageformats/kimg_heif.so)
==16247==    by 0x12C1CCD9: ??? (in /usr/local/lib/qt5/plugins/imageformats/kimg_heif.so)
==16247==    by 0x532D72F: ??? (in /usr/local/lib/qt5/libQt5Gui.so.5.15.15)
==16247==    by 0x532A4DF: QImageReader::supportedImageFormats() (in /usr/local/lib/qt5/libQt5Gui.so.5.15.15)
==16247== Your program just tried to execute an instruction that Valgrind
==16247== did not recognise.  There are two possible reasons for this.
==16247== 1. Your program has a bug and erroneously jumped to a non-code
==16247==    location.  If you are running Memcheck and you just saw a
==16247==    warning about a bad jump, it's probably your program's fault.
==16247== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16247==    i.e. it's Valgrind's fault.  If you think this is the case or
==16247==    you are not sure, please let us know and we'll try to fix it.
==16247== Either way, Valgrind will now raise a SIGILL signal which will
==16247== probably kill your program.
==16247==
==16247== Process terminating with default action of signal 4 (SIGILL): dumping core
==16247==  Illegal opcode at address 0x1B5F881A
==16247==    at 0x1B5F881A: ojph::local::initialize_tables_avx2() (in /usr/local/lib/libopenjph.so.0.17.0)
==16247==    by 0x400AF5C: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x400F3F2: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x400BF52: ??? (in /libexec/ld-elf.so.1)
==16247==    by 0x12D32276: ??? (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBCB3B: heif_load_plugin (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBC7DE: heif_load_plugins (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12CBC5DE: heif_init (in /usr/local/lib/libheif.so.1.19.1)
==16247==    by 0x12C1CADA: ??? (in /usr/local/lib/qt5/plugins/imageformats/kimg_heif.so)
==16247==    by 0x12C1CCD9: ??? (in /usr/local/lib/qt5/plugins/imageformats/kimg_heif.so)
==16247==    by 0x532D72F: ??? (in /usr/local/lib/qt5/libQt5Gui.so.5.15.15)
==16247==    by 0x532A4DF: QImageReader::supportedImageFormats() (in /usr/local/lib/qt5/libQt5Gui.so.5.15.15)
...

AFAIU: 0xC5 0xF8 0x77 = vzeroupper from AVX.
https://fuchsia.googlesource.com/third_party/llvm-project/+/refs/tags/llvmorg-13.0.0-rc1/llvm/test/CodeGen/X86/fma.ll?autodive=0%2F%2F%2F%2F%2F%2F%2F%2F#665
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#New_instructions

Runtime detection of supported SIMD level in init_cpu_ext_level() work well, but OpenJPH still use instructions from unsupported SIMDs on current CPU.

There are build options:

option(OJPH_DISABLE_SIMD "Disables the use of SIMD instructions -- agnostic to architectures" OFF)
option(OJPH_DISABLE_SSE "Disables the use of SSE SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSE2 "Disables the use of SSE2 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSSE3 "Disables the use of SSSE3 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSE4 "Disables the use of SSE4 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX "Disables the use of AVX SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX2 "Disables the use of AVX2 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX512 "Disables the use of AVX512 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_NEON "Disables the use of NEON SIMD instructions and associated files" OFF)

If I build with -DOJPH_DISABLE_SSE4=ON -DOJPH_DISABLE_AVX=ON -DOJPH_DISABLE_AVX2=ON -DOJPH_DISABLE_AVX512=ON then smplayer run and work fine.

@aous72
Copy link
Owner

aous72 commented Nov 4, 2024

Thank you for this post.
I think I need to use Intel's sde to check what happens on older platforms, as part of testing.
I think I know why this is happening, and I will try to fix it soon.

Kind regards,
Aous.

@aous72
Copy link
Owner

aous72 commented Nov 8, 2024

I had a deeper look. I am assuming you are using 64-bit architecture.

  1. The code in ojph::local::initialize_tables_avx2 does not run, unless the CPU support AVX2 or higher.
    bool initialize_tables_avx2() {
      if (get_cpu_ext_level() >= X86_CPU_EXT_LEVEL_AVX2) {
        bool result;
        result = vlc_init_tables();
        result = result && uvlc_init_tables();
        return result;
      }
      return false;
    }
  1. I tested with Intel SDE on linux, the code, compiled with GCC, runs without issues for Pentium 4 Prescott, Merom and Penryn Architectures. I think core 2 cpus belong to either Merom and Penryn.

I am about to release a new version.
In this version, I changed some code in that file, and in ojph_block_encode_avx512.cpp. In particular,
I removed = { 0 } from

    static ui32 vlc_tbl0[2048];
    static ui32 vlc_tbl1[2048];

My worry is that = { 0 } will be called automatically, because it is statically defined, and since the code has the -mavx2 option, it will definitely use avx2 instructions.
To counter the effect of that I changed the above code to

    /////////////////////////////////////////////////////////////////////////
    bool initialize_tables_avx2() {
      if (get_cpu_ext_level() >= X86_CPU_EXT_LEVEL_AVX2) {
        memset(vlc_tbl0, 0, 2048 * sizeof(ui32));
        memset(vlc_tbl1, 0, 2048 * sizeof(ui32));

        bool result;
        result = vlc_init_tables();
        result = result && uvlc_init_tables();
        return result;
      }
      return false;
    }

It might be also useful to change the code around get_cpu_ext_level() to

  ////////////////////////////////////////////////////////////////////////////
  static int cpu_level;
  static bool cpu_level_initialized;

  ////////////////////////////////////////////////////////////////////////////
  int get_cpu_ext_level()
  {
    if (!cpu_level_initialized)
      cpu_level_initialized = init_cpu_ext_level(cpu_level);
    return cpu_level;
  }

You can wait for the new release -- should be today or tomorrow (Sydney time). You can test these suggestions.

Hope this help.

Kind regards,
Aous.

@VVD
Copy link
Author

VVD commented Nov 9, 2024

Ofc 64bit: OS: FreeBSD 14.1 amd64.
Build with LLVM 18 from base system.

If you can commit these changes, testing will be easier.

@VVD
Copy link
Author

VVD commented Nov 9, 2024

Same error.

aous72 added a commit that referenced this issue Nov 9, 2024
@aous72
Copy link
Owner

aous72 commented Nov 10, 2024

Thank you for putting this in.

Just to keep you in the picture.
I had to do some digging and the problem only occurred with clang when the code is compiled in Release build type. It does not happen with gcc, nor in other build types.

The source of the problem is that clang inserts vzeroupper instruction in a normal C++ function (where no intrinsics are used). It does this at the end of the function, just before returning.

One solution could have been using the following compiler flags

-mllvm -x86-use-vzeroupper=0

But the solution I implemented is better.

See how it goes with you and if this solves the problem, please let me know.

Cheers,
Aous.

PS: See this https://stackoverflow.com/questions/68736527/do-i-need-to-use-mm256-zeroupper-in-2021

@VVD
Copy link
Author

VVD commented Nov 10, 2024

Fixed in version 0.18.0. Thanks!

Offtopic:

$ readelf -d /usr/local/lib/libopenjph.so | grep SONAME
 0x000000000000000e SONAME               Library soname: [libopenjph.so.0.18]

SONAME "must be" libopenjph.so.0.

Patch:

--- src/core/CMakeLists.txt.orig        2024-11-10 02:36:26 UTC
+++ src/core/CMakeLists.txt
@@ -133,9 +133,9 @@ else()
     PROPERTIES
       OUTPUT_NAME "openjph.${OPENJPH_VERSION_MAJOR}.${OPENJPH_VERSION_MINOR}")
 else()
-  set(OJPH_LIB_NAME_STRING "openjph.${OPENJPH_VERSION_MAJOR}.${OPENJPH_VERSION_MINOR}")
+  set(OJPH_LIB_NAME_STRING "openjph.${OPENJPH_VERSION_MAJOR}")
   set_target_properties(openjph
     PROPERTIES
-      SOVERSION "${OPENJPH_VERSION_MAJOR}.${OPENJPH_VERSION_MINOR}"
+      SOVERSION "${OPENJPH_VERSION_MAJOR}"
       VERSION "${OPENJPH_VERSION}")
 endif()

@aous72
Copy link
Owner

aous72 commented Nov 10, 2024

Thank you for the suggestion regarding the SONAME name.

I am aware that this is not what is expected -- as in issue #155.
I am still not very sure what is the best course of action.
This library is my first contribution to open source development.

Kind regards,
Aous.

@VVD
Copy link
Author

VVD commented Nov 10, 2024

Thanks.

@VVD VVD closed this as completed Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants