Performance patches and build fixes for Elbrus 2000 (e2k) architecture.

ilyakurdyukov/e2k-ports

e2k-ports

Performance patches and build fixes for Elbrus (e2k) architecture.

This is my personal repository so that patches won't get lost.

Elbrus porting cheat sheet:

Elbrus 2000 (aka e2k) is a 64-bit little-endian architecture.
The compiler (LCC) is mostly GCC-compatible (it defines __GNUC__) and uses the EDG frontend.

detection

  • shell: uname -m returns e2k
  • cmake: if(CMAKE_SYSTEM_PROCESSOR STREQUAL "e2k")
  • C preprocessor: #if defined(__e2k__)
  • compiler version: if __LCC__ == 125 and __LCC_MINOR__ == 9, then it's "LCC 1.25.09"
  • architecture version: defined in __iset__ (less than 3 is obsolete, 6 is the latest at the moment)
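The macros above can be combined into a few small helpers. This is a minimal sketch, not part of the repository: the function names are hypothetical, and the version formatting simply follows the "LCC 1.25.09" example from the list.

```c
#include <stdio.h>

/* Returns 1 when compiled for Elbrus, 0 otherwise. */
int built_for_e2k(void)
{
#if defined(__e2k__)
    return 1;
#else
    return 0;
#endif
}

/* Instruction set version from __iset__, or 0 when not building for e2k. */
int e2k_iset(void)
{
#if defined(__e2k__) && defined(__iset__)
    return __iset__;
#else
    return 0;
#endif
}

/* Formats __LCC__ / __LCC_MINOR__ style numbers, e.g. 125 and 9 -> "LCC 1.25.09". */
int format_lcc_version(char *buf, size_t size, int lcc, int lcc_minor)
{
    return snprintf(buf, size, "LCC %d.%02d.%02d", lcc / 100, lcc % 100, lcc_minor);
}

/* Describes the compiler in use; prints "not LCC" for other compilers. */
int compiler_version(char *buf, size_t size)
{
#if defined(__LCC__) && defined(__LCC_MINOR__)
    return format_lcc_version(buf, size, __LCC__, __LCC_MINOR__);
#else
    return snprintf(buf, size, "not LCC");
#endif
}
```

Since all checks are resolved by the preprocessor, the same source builds unchanged on other architectures.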

intrinsics

  • MMX, SSE2, SSSE3, SSE4.1* - native support
  • AVX, AVX2 - supported, but not recommended, uses too many CPU registers
  • SSE4.2 and _mm_dp_ps (from SSE4.1) - emulated, slow, do not use

The compiler enables everything from MMX to AVX2 by default. If the code picks a code path by testing feature macros (e.g. #if defined(__AVX2__)), pass -mno-avx (or -mno-sse4.2) so that the slow paths are not selected.

builtins

  • __sync*, __atomic* - supported by the compiler
  • count leading/trailing zeros - supported (__builtin_clz, __builtin_ctz)
  • memory fence - supported (need to include x86intrin.h first)
    • __builtin_ia32_mfence, __builtin_ia32_lfence, __builtin_ia32_sfence
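As an illustration of the first two items, here is a short sketch using the bit-counting and atomic builtins. The function names are made up for this example, and the code assumes a 32-bit unsigned int; the memory fence builtins are left out because they are x86-style and need x86intrin.h.

```c
#include <stdint.h>

/* floor(log2(x)) for x > 0; __builtin_clz(0) is undefined.
 * Assumes unsigned int is 32 bits wide. */
static inline unsigned floor_log2_u32(uint32_t x)
{
    return 31u - (unsigned)__builtin_clz(x);
}

/* Index of the lowest set bit for x > 0; __builtin_ctz(0) is undefined. */
static inline unsigned lowest_set_bit(uint32_t x)
{
    return (unsigned)__builtin_ctz(x);
}

static int id_counter;

/* Hands out increasing ids; safe to call from several threads. */
int next_id(void)
{
    return __atomic_fetch_add(&id_counter, 1, __ATOMIC_RELAXED);
}
```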

cpuid

Use compile-time CPU detection and select the best SIMD level up to SSE4.1.
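One way to do this is to pick the level from the compiler's feature macros and deliberately stop at SSE4.1, even when the AVX macros are defined. The SIMD_* names below are hypothetical and exist only for this sketch.

```c
/* Hypothetical level constants; not defined by any toolchain. */
#define SIMD_NONE  0
#define SIMD_SSE2  1
#define SIMD_SSSE3 2
#define SIMD_SSE41 3

/* __AVX__ / __AVX2__ are intentionally not checked: the selection is capped at SSE4.1. */
#if defined(__SSE4_1__)
#define SIMD_LEVEL SIMD_SSE41
#elif defined(__SSSE3__)
#define SIMD_LEVEL SIMD_SSSE3
#elif defined(__SSE2__)
#define SIMD_LEVEL SIMD_SSE2
#else
#define SIMD_LEVEL SIMD_NONE
#endif

/* Name of the level chosen at compile time. */
const char *simd_level_name(void)
{
    switch (SIMD_LEVEL) {
    case SIMD_SSE41: return "sse4.1";
    case SIMD_SSSE3: return "ssse3";
    case SIMD_SSE2:  return "sse2";
    default:         return "none";
    }
}
```

Code paths can then be guarded with #if SIMD_LEVEL >= SIMD_SSSE3 and similar checks instead of a runtime cpuid query.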

rdtsc

#include <stdint.h>
#include <x86intrin.h>
uint64_t time = __rdtsc();
// same: unsigned aux; uint64_t time = __rdtscp(&aux);
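A possible way to use the counter for rough timing is sketched below. The HAVE_RDTSC macro and the function names are assumptions made for this example, and the clock() fallback is there only so that the snippet also builds on architectures without x86intrin.h.

```c
#include <stdint.h>
#include <time.h>

#if defined(__e2k__) || defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1 /* hypothetical macro, local to this example */
#endif

/* Reads a tick counter: __rdtsc() where available, clock() otherwise. */
static inline uint64_t read_ticks(void)
{
#if defined(HAVE_RDTSC)
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Returns the number of ticks spent in a simple busy loop of n iterations. */
uint64_t time_busy_loop(unsigned n)
{
    volatile unsigned sink = 0; /* volatile keeps the loop from being optimized out */
    uint64_t start = read_ticks();
    for (unsigned i = 0; i < n; i++)
        sink += i;
    return read_ticks() - start;
}
```

The result is in counter ticks, not seconds, so it is only useful for comparing runs on the same machine.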

useful pragmas

_Pragma("name") - the operator form of #pragma, usable inside macros.

Place these immediately before a loop:

  • #pragma ivdep - ignore data dependencies inside the loop
  • #pragma unroll(n) - unroll the loop n times
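The two pragmas can be wrapped in macros through _Pragma so that other compilers do not see them. This is a sketch under that assumption; the LOOP_* macro names and both functions are hypothetical.

```c
#include <stddef.h>

/* Expand to the LCC pragmas, or to nothing for other compilers. */
#if defined(__LCC__)
#define LOOP_IVDEP    _Pragma("ivdep")
#define LOOP_UNROLL_4 _Pragma("unroll(4)")
#else
#define LOOP_IVDEP
#define LOOP_UNROLL_4
#endif

/* dst[i] = a[i] + b[i]; the caller promises that dst does not overlap a or b,
 * which is what makes ivdep valid here. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    LOOP_IVDEP
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Sums n bytes; the loop is a candidate for unrolling by 4. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned sum = 0;
    LOOP_UNROLL_4
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}
```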

restrict

Using the restrict keyword is good for performance, but note that LCC ignores it when you use vector load/store intrinsics such as _mm_load_si128(). For code with vector intrinsics, use #pragma ivdep instead.
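A sketch of how both hints might be combined in one function: restrict covers the scalar tail, and ivdep covers the loop that uses intrinsics. The LOOP_IVDEP macro and the function name are assumptions for this example, and the SSE2 path is compiled only when __SSE2__ is defined.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Expands to the LCC pragma, or to nothing for other compilers. */
#if defined(__LCC__)
#define LOOP_IVDEP _Pragma("ivdep")
#else
#define LOOP_IVDEP
#endif

/* dst[i] = a[i] + b[i]; the caller guarantees that dst does not overlap a or b. */
void add_i32(int32_t *restrict dst, const int32_t *restrict a,
             const int32_t *restrict b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* restrict is not honored for intrinsic loads/stores, so state it via ivdep */
    LOOP_IVDEP
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#endif
    /* scalar tail (or the whole array without SSE2) */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```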

makecontext

Instead of makecontext(ctx, ...), use makecontext_e2k(ctx, ...), which returns a negative integer on error. It allocates extra resources that must be freed with freecontext_e2k(ctx).
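The difference can be hidden behind a preprocessor check. The following sketch assumes a Linux-like system with ucontext.h and assumes that makecontext_e2k takes the same arguments as makecontext, as the text above suggests; the coroutine and helper names are made up for the example.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int co_ran = 0;

/* Coroutine body; returning from it resumes the context in uc_link. */
static void co_entry(void)
{
    co_ran = 1;
}

/* Runs co_entry once on its own stack. Returns 0 on success, -1 on error. */
int run_coroutine_once(void)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (!stack)
        return -1;
    if (getcontext(&co_ctx) < 0) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;
#if defined(__e2k__)
    /* e2k variant: can fail, and allocates resources that must be freed */
    if (makecontext_e2k(&co_ctx, co_entry, 0) < 0) {
        free(stack);
        return -1;
    }
#else
    makecontext(&co_ctx, co_entry, 0);
#endif
    int rc = swapcontext(&main_ctx, &co_ctx);
#if defined(__e2k__)
    freecontext_e2k(&co_ctx);
#endif
    free(stack);
    return rc < 0 ? -1 : 0;
}
```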

nop

Use __asm__ __volatile__ ("nop") or _mm_pause() for a short delay.
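A typical place for such a delay is a bounded spin-wait. The sketch below uses a hypothetical cpu_relax() macro that maps to _mm_pause() on e2k and x86-64 and to a plain nop elsewhere; the function name is also an assumption.

```c
#if defined(__e2k__) || defined(__x86_64__)
#include <x86intrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() __asm__ __volatile__ ("nop")
#endif

/* Polls *flag up to max_spins times with a short delay between reads.
 * Returns 1 if the flag was seen non-zero, 0 if the spin budget ran out. */
int spin_wait(volatile int *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (__atomic_load_n(flag, __ATOMIC_ACQUIRE))
            return 1;
        cpu_relax();
    }
    return 0;
}
```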

clearing the instruction cache

The GNU C function __clear_cache(char *begin, char *end) works correctly starting with LCC 1.25.18 and LCC 1.26.04. Earlier versions provide the function, but it does nothing.
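Code that generates instructions at run time may want to check for these versions at compile time. The macro names below are hypothetical, and the check assumes that release branches after 1.26 also contain the fix, which the text above does not state.

```c
/* True for LCC 1.25.18+ on the 1.25 branch, 1.26.04+ on the 1.26 branch,
 * and (assumption) any later branch. Arguments are __LCC__ and __LCC_MINOR__. */
#define LCC_CLEAR_CACHE_OK(lcc, minor) \
    ((lcc) > 126 || \
     ((lcc) == 126 && (minor) >= 4) || \
     ((lcc) == 125 && (minor) >= 18))

/* Compilers other than LCC are treated as having a working __clear_cache. */
#if !defined(__LCC__) || LCC_CLEAR_CACHE_OK(__LCC__, __LCC_MINOR__)
#define CLEAR_CACHE_WORKS 1
#else
#define CLEAR_CACHE_WORKS 0
#endif

/* Run-time view of the same compile-time decision. */
int clear_cache_works(void)
{
    return CLEAR_CACHE_WORKS;
}
```

A JIT could refuse to build, or fall back to an interpreter, when CLEAR_CACHE_WORKS is 0.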

inline

If inlining is crucial to performance, use __attribute__((__always_inline__)) inline rather than just inline, because the LCC compiler may decide not to inline large or complicated inline functions.
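The attribute is usually hidden behind a macro. A minimal sketch, where FORCE_INLINE and both functions are names chosen for this example:

```c
#include <stdint.h>

#define FORCE_INLINE __attribute__((__always_inline__)) inline

/* Rotate left by r bits; r is reduced modulo 32 to avoid undefined shifts. */
static FORCE_INLINE uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31;
    return r ? (x << r) | (x >> (32 - r)) : x;
}

/* A hot-path helper that relies on rotl32 being inlined. */
uint32_t mix(uint32_t h, uint32_t k)
{
    return rotl32(h ^ k, 13) * 5u + 0xe6546b64u;
}
```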

avoid if possible

The GNU C extension Labels as Values is available in LCC, but it performs worse than a simple switch/case.

The GNU C Vector Extension is also available in LCC, but it is poorly implemented and its performance is very bad.
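For reference, this is the kind of plain switch/case dispatch meant here, shown on a tiny stack machine. The opcodes and the function are invented for the example and do not come from any real interpreter.

```c
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Executes code until OP_HALT and returns the top of the stack.
 * Returns -1 on an unknown opcode, stack overflow or stack underflow. */
int run(const int *code)
{
    int stack[16];
    int sp = 0;
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            if (sp == 16)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;
        }
    }
}
```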
