Unroll some tight loops #235

howard0su · 2024-07-06T13:32:41Z

No description provided.

fventuri · 2024-07-06T14:54:34Z

Good idea Howard!
I don't know how C++ compilers work these days, but perhaps a block like this:

            *output++ = float(*input++);
            *output++ = float(*input++);
            *output++ = float(*input++);
            *output++ = float(*input++);

could be replaced by something like this:

            const int16_t *in = input + 4 * m;
            float *out = output + 4 * m;
            out[0] = float(in[0]);
            out[1] = float(in[1]);
            out[2] = float(in[2]);
            out[3] = float(in[3]);

I don't know whether, from the compiler's point of view, the second option would make it easier to use SIMD instructions, since there are no output++ and input++ around, but perhaps the compiler figures it out anyway since the pattern is idiomatic.
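To make the comparison concrete, here is a minimal sketch of both variants as complete functions. The names `convert_ptr` and `convert_idx`, the `len` parameter, and the tail handling are assumptions for illustration, not code from this PR. Whether either form actually gets vectorized depends on the compiler and flags; the vectorization reports (for example GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize`) are one way to check.

```cpp
#include <cstddef>
#include <cstdint>

// Variant A (hypothetical name): pointer post-increment, unrolled by four.
void convert_ptr(const int16_t *input, float *output, std::size_t len)
{
    const std::size_t blocks = len / 4;
    for (std::size_t m = 0; m < blocks; m++) {
        *output++ = float(*input++);
        *output++ = float(*input++);
        *output++ = float(*input++);
        *output++ = float(*input++);
    }
    // Tail: the remaining 0..3 samples.
    for (std::size_t i = 0; i < len % 4; i++)
        *output++ = float(*input++);
}

// Variant B (hypothetical name): indexed access, no pointer updates in the body.
void convert_idx(const int16_t *input, float *output, std::size_t len)
{
    const std::size_t blocks = len / 4;
    for (std::size_t m = 0; m < blocks; m++) {
        const int16_t *in = input + 4 * m;
        float *out = output + 4 * m;
        out[0] = float(in[0]);
        out[1] = float(in[1]);
        out[2] = float(in[2]);
        out[3] = float(in[3]);
    }
    // Tail: the remaining 0..3 samples.
    for (std::size_t i = 4 * blocks; i < len; i++)
        output[i] = float(input[i]);
}
```

Both functions produce identical output, so any difference would only show up in the generated code and in timing.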

There's also the volk library (https://www.libvolk.org/) that has a couple of functions called 'volk_16i_s32f_convert_32f' (https://www.libvolk.org/doxygen/volk_16i_s32f_convert_32f.html) and 'volk_16ic_convert_32fc' (https://www.libvolk.org/doxygen/volk_16ic_convert_32fc.html), which are optimized for different types of hardware, but I am not sure if its license (GPL v3) is compatible with this project's license.

Franco

howard0su · 2024-07-06T15:48:09Z

I got the idea from CMSIS-DSP. Ideally, hand-written SIMD instructions can do a better job, but they are hard to adapt to different SIMD instruction sets. I will run more experiments to see how much the two approaches differ.

-- -Howard
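For reference, a hand-written version might look like the sketch below. It assumes an x86 target with SSE2 and falls back to a plain scalar loop elsewhere, which illustrates the portability cost mentioned above: each SIMD instruction set would need its own branch. The function name `convert_simd` is hypothetical and this is not code from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: converts len int16_t samples to float.
void convert_simd(const int16_t *input, float *output, std::size_t len)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    // Process eight samples per iteration.
    for (; i + 8 <= len; i += 8) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i));
        // Sign-extend 16 -> 32 bits: duplicate each word into a 32-bit lane,
        // then arithmetic-shift right by 16.
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(v, v), 16);
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(v, v), 16);
        _mm_storeu_ps(output + i, _mm_cvtepi32_ps(lo));
        _mm_storeu_ps(output + i + 4, _mm_cvtepi32_ps(hi));
    }
#endif
    // Scalar fallback, also used for the tail (or for everything when the
    // __SSE2__ macro is not defined).
    for (; i < len; i++)
        output[i] = float(input[i]);
}
```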

cozycactus · 2024-07-06T21:06:48Z

What do you think about a fast int16_t to float conversion like this one: https://github.com/m-ou-se/floatconv
When I profiled with perf, the conversion accounted for a large share of CPU time.

cozycactus · 2024-07-06T21:21:38Z

https://blog.m-ou.se/floats/
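As an illustration of what a software int16_t to float conversion can look like, here is a sketch of a well-known "magic number" bit trick. It is in the same spirit as the linked material but is not claimed to be the algorithm floatconv uses. The name `i16_to_float_magic` is hypothetical, and the code assumes `float` is IEEE-754 binary32.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper; assumes float is IEEE-754 binary32.
inline float i16_to_float_magic(int16_t x)
{
    // Flip the sign bit so the value becomes an unsigned offset: u = x + 32768.
    uint32_t u = static_cast<uint16_t>(x) ^ 0x8000u;
    // 0x4B000000 is the bit pattern of 2^23 (8388608.0f). At that magnitude
    // one mantissa step equals 1, so OR-ing u into the mantissa gives 2^23 + u.
    uint32_t bits = 0x4B000000u | u;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    // Remove the 2^23 magic and the 32768 offset in one exact subtraction.
    return f - 8421376.0f;  // 8421376 = 2^23 + 32768
}
```

The result is exact for every int16_t input. Whether it beats the hardware conversion instruction on a given CPU would need to be measured.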

howard0su · 2024-07-07T00:12:50Z

This is an interesting blog post. I will look into it and port i16_to_float over.

-- -Howard

cozycactus · 2024-10-17T00:05:12Z

I tried this optimization and found that only the convert_float change showed an improvement: under one percent better CPU usage on the r2iq thread...

Unroll some tight loops

33e389d

howard0su requested review from fventuri and hayguen July 6, 2024 13:32
