Unroll some tight loops #235
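Good idea Howard!
I don't know how C++ compilers work these days, but perhaps a block like this:
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
could be replaced by something like this:
const int16_t *in = input + 4 * m;
float *out = output + 4 * m;
out[0] = float(in[0]);
out[1] = float(in[1]);
out[2] = float(in[2]);
out[3] = float(in[3]);
I don't know whether, from the compiler's point of view, the second option could use SIMD instructions, since there are no output++ and input++ around, but perhaps the compiler figures it out since it is idiomatic.
There's also the volk library (https://www.libvolk.org/) that has a couple of functions called 'volk_16i_s32f_convert_32f' (https://www.libvolk.org/doxygen/volk_16i_s32f_convert_32f.html) and 'volk_16ic_convert_32fc' (https://www.libvolk.org/doxygen/volk_16ic_convert_32fc.html), which are optimized for different types of hardware, but I am not sure if its license (GPL v3) is compatible with this project's license.
Franco |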
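For reference, the indexed form Franco describes could be wrapped into a complete function roughly as follows. This is only a sketch: the name convert_i16_to_float_unrolled and its signature are hypothetical and are not taken from this PR, and whether a given compiler actually emits SIMD instructions for it depends on the compiler, target, and optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the function changed in this PR.
// Converts `count` int16_t samples to float, four per iteration, using indexed
// access instead of post-incremented pointers.
void convert_i16_to_float_unrolled(const std::int16_t* input, float* output, std::size_t count)
{
    assert((input != nullptr && output != nullptr) || count == 0);

    const std::size_t blocks = count / 4;  // number of full groups of four samples
    for (std::size_t m = 0; m < blocks; ++m) {
        const std::int16_t* in = input + 4 * m;
        float* out = output + 4 * m;
        out[0] = float(in[0]);
        out[1] = float(in[1]);
        out[2] = float(in[2]);
        out[3] = float(in[3]);
    }

    // Tail: the remaining 0 to 3 samples when count is not a multiple of 4.
    for (std::size_t i = 4 * blocks; i < count; ++i) {
        output[i] = float(input[i]);
    }
}
```

The tail loop is there because the unrolled body alone would silently skip up to three samples when the length is not a multiple of four.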
I got the idea from CMSIS-DSP. Ideally, hand-written SIMD instructions
could do a better job, but they are hard to adapt to different SIMD
instruction sets. I will run more experiments to see how much they differ.
…On Sat, Jul 6, 2024 at 10:54 PM Franco Venturi ***@***.***> wrote:
Good idea Howard!
I don't know how C++ compilers work these days, but perhaps a block like
this:
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
could be replaced by something like this:
const int16_t *in = input + 4 * m;
float *out = output + 4 * m;
out[0] = float(in[0]);
out[1] = float(in[1]);
out[2] = float(in[2]);
out[3] = float(in[3]);
I don't know whether, from the compiler's point of view, the second option
could use SIMD instructions, since there are no output++ and input++ around,
but perhaps the compiler figures it out since it is idiomatic.
There's also the volk library (https://www.libvolk.org/) that has a
couple of functions called 'volk_16i_s32f_convert_32f' (
https://www.libvolk.org/doxygen/volk_16i_s32f_convert_32f.html) and
'volk_16ic_convert_32fc' (
https://www.libvolk.org/doxygen/volk_16ic_convert_32fc.html), which are
optimized for different types of hardware, but I am not sure if its license
(GPL v3) is compatible with this project's license.
Franco
--
-Howard
|
What do you think about a fast int16_t to float conversion like this one? https://github.com/m-ou-se/floatconv |
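To give a sense of what a bit-level int16_t to float conversion can look like, here is a minimal sketch of a common "magic number" trick. It is not taken from floatconv or the linked blog and may differ from what they do. The function names are hypothetical, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration; assumes float is a 32-bit IEEE 754 binary32.
// Idea: 0x4B000000 is the bit pattern of 2^23. OR-ing a 16-bit value u into
// the low mantissa bits yields exactly 2^23 + u. With u = x + 32768,
// subtracting (2^23 + 32768) gives back x as a float, with no rounding.
inline float i16_to_float_magic(std::int16_t x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    const std::uint32_t biased = static_cast<std::uint16_t>(x) ^ 0x8000u;  // x + 32768
    const std::uint32_t bits = 0x4B000000u | biased;                       // 2^23 + biased
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits as a float
    return f - 8421376.0f;             // subtract 2^23 + 32768
}

// Hypothetical array wrapper around the scalar helper above.
void convert_i16_to_float_magic(const std::int16_t* input, float* output, std::size_t count)
{
    assert((input != nullptr && output != nullptr) || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        output[i] = i16_to_float_magic(input[i]);
    }
}
```

Whether this beats a plain float(x) cast depends heavily on the target: many CPUs have a dedicated integer-to-float instruction, so it would need to be measured rather than assumed.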
This is an interesting blog post. I will look into it and port i16_to_float over.
…On Sun, Jul 7, 2024 at 5:22 AM Ruslan Migirov ***@***.***> wrote:
https://blog.m-ou.se/floats/
--
-Howard
|
I tried this optimization and found that only the convert_float change made a difference: it showed just under one percent better CPU usage on the r2iq thread... |