Suboptimal boolean reduction vectorization #128665

Open
psiha opened this issue Feb 25, 2025 · 3 comments

psiha commented Feb 25, 2025

Both Clang and GCC struggle to vectorize this boolean reduction, on ARM as well as x86. GCC does a better job overall, but its output is still worse than the handwritten version. Making the accumulation variable (occurrences) 32-bit at least lets both compilers use horizontal adds (addv on AArch64), and again this helps GCC more than Clang.

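#include <cstddef>
#include <cstdint>

// Counts the bytes in buf[0, size) that are equal to 'a'.
std::size_t count_occurrences(const std::uint8_t * buf, std::size_t size)
{
    std::size_t occurrences = 0; // 64-bit accumulator on the targets discussed
    // `auto i = 0` deduces int, so the index is signed and narrower than size.
    for (auto i = 0; i < size; ++i)
    {
        occurrences += ( buf[ i ] == 'a' );
    }
    return occurrences;
}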

https://godbolt.org/z/E13T3MKv7

minbench.zip
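
For comparison, here is a minimal sketch of two variants of the loop above. The first only narrows the accumulator to 32 bits, as described in the report. The second is a hypothetical blocked formulation that keeps 8-bit per-lane counters and widens them before they can overflow. It is an assumption about the kind of code a hand-vectorized version tends to resemble, not the contents of minbench.zip. The names count_occurrences_u32 and count_occurrences_blocked and the lane count of 16 are made up for illustration, and no claim is made here about the code any particular compiler generates for them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical variant 1: same loop, 32-bit accumulator.
// Assumes the number of matches fits in 32 bits.
std::uint32_t count_occurrences_u32(const std::uint8_t* buf, std::size_t size)
{
    std::uint32_t occurrences = 0;
    for (std::size_t i = 0; i < size; ++i)
        occurrences += (buf[i] == 'a');
    return occurrences;
}

// Hypothetical variant 2: per-lane 8-bit counters, flushed into a wide
// total after at most 255 steps so that no counter can wrap around.
std::size_t count_occurrences_blocked(const std::uint8_t* buf, std::size_t size)
{
    constexpr std::size_t lanes = 16;      // assumed vector width in bytes
    constexpr std::size_t max_steps = 255; // largest count a uint8_t can hold

    std::size_t total = 0;
    std::size_t i = 0;
    while (size - i >= lanes)
    {
        std::uint8_t counts[lanes] = {};
        const std::size_t steps = std::min((size - i) / lanes, max_steps);
        for (std::size_t s = 0; s < steps; ++s, i += lanes)
            for (std::size_t l = 0; l < lanes; ++l)
                counts[l] += (buf[i + l] == 'a');
        // Widening horizontal add of the per-lane counters.
        for (std::size_t l = 0; l < lanes; ++l)
            total += counts[l];
    }
    // Scalar tail for the remaining size % lanes bytes.
    for (; i < size; ++i)
        total += (buf[i] == 'a');
    return total;
}
```

Both functions return the same count as count_occurrences for any input whose match count fits the return type, so they can be dropped into the same benchmark for a side-by-side look at the generated assembly.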


llvmbot commented Feb 25, 2025

@llvm/issue-subscribers-backend-arm

Author: Domagoj Šarić (psiha)


llvmbot commented Feb 25, 2025

@llvm/issue-subscribers-backend-x86

Author: Domagoj Šarić (psiha)


llvmbot commented Feb 25, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Domagoj Šarić (psiha)
