Implement vectorisation to make this lib blazingly fast #585
Comments
Thanks, but currently the merge function is not the bottleneck of performance. |
And I completely agree with you that SSE/AVX can make this tool even faster. You can take a look at fastplong, which is more time consuming and is being intensively developed. |
@chinwobble AdapterRemoval got a nice speed boost when SIMD instructions were introduced. It is a similar program to fastp so you might get the idea where the speed boost is needed. |
Thanks I will have a go at implementing this! |
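I've had a quick attempt at implementing SIMD (I understand this might not be the bottleneck). The implementation looks like this:
#include <cstdint>
#include <string>
#include <vector>
#include "hwy/highway.h"
#include "hwy/contrib/algo/transform-inl.h"
namespace hn = hwy::HWY_NAMESPACE;

string Sequence::reverseComplementHwy(string *origin)
{
    const auto length = origin->length();
    // origin is a pointer: read the character data, not the address of the string object
    const auto sequence = reinterpret_cast<const uint8_t*>(origin->data());
    // Pre-fill with 'N' so bases other than A/T/C/G get a defined value
    // (an assumption; a variable-length array would also be non-standard C++)
    std::vector<uint8_t> output(length, 'N');
    const auto transform = [](const auto d, auto output, const auto sequence) HWY_ATTR
    {
        const auto a = hn::Set(d, 65UL); // 'A'
        const auto t = hn::Set(d, 84UL); // 'T'
        const auto c = hn::Set(d, 67UL); // 'C'
        const auto g = hn::Set(d, 71UL); // 'G'
        // Swap each base for its complement, one vector of bytes at a time
        output = hn::IfThenElse(hn::Eq(sequence, a), t, output);
        output = hn::IfThenElse(hn::Eq(sequence, t), a, output);
        output = hn::IfThenElse(hn::Eq(sequence, c), g, output);
        output = hn::IfThenElse(hn::Eq(sequence, g), c, output);
        return output;
    };
    const hn::ScalableTag<uint8_t> d;
    hn::Transform1(d, output.data(), length, sequence, transform);
    // Note: this complements each base but does not yet reverse their order
    return std::string(reinterpret_cast<const char*>(output.data()), length);
}
Comparisons: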
In order to implement this in the repo we would need to do a few things:
|
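When comparing a vectorised version against the existing one, a plain scalar baseline is useful for checking that both produce the same output. The following is a minimal sketch in standard C++ only; the names `makeComplementTable` and `reverseComplementScalar` are hypothetical and not part of fastp, and mapping every byte other than A/C/G/T to 'N' is an assumption rather than confirmed fastp behaviour.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: builds a 256-entry table mapping each byte to its complement.
// Assumption: any byte other than uppercase A/C/G/T maps to 'N'.
static std::array<char, 256> makeComplementTable() {
    std::array<char, 256> table;
    table.fill('N');
    table['A'] = 'T';
    table['T'] = 'A';
    table['C'] = 'G';
    table['G'] = 'C';
    return table;
}

// Hypothetical scalar baseline: complements each base and writes it to the
// mirrored position, so the result is the reverse complement of `origin`.
std::string reverseComplementScalar(const std::string& origin) {
    static const std::array<char, 256> table = makeComplementTable();
    const std::size_t n = origin.size();
    std::string result(n, 'N');
    for (std::size_t i = 0; i < n; ++i) {
        result[n - 1 - i] = table[static_cast<std::uint8_t>(origin[i])];
    }
    return result;
}
```

Running both implementations over the same reads and comparing the strings is a cheap way to catch differences such as a missing reversal step or undefined output for ambiguous bases.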
This seems very promising! Have you tried it on ARM? |
Yes, I have tested this in CI on macOS ARM64.
|
@sfchen would you be open to me changing this repo? I pushed a branch and the build looks like this. |
Hi, thanks for your suggestion. What kind of external dependencies do you want to use? I have always tried to keep building such tools simple, so I didn't use CMake. Most of the users have biology backgrounds, so simplicity is very important. |
Hey thanks for your response. I totally understand your emphasis on simplicity and potentially removing external dependencies for contributors. Here are some problems and solutions to address your concerns:
make I think this can be achieved by putting all commands needed to setup |
@sfchen I've started adding SIMD and fixed the Windows build on this PR. |
Vectorisation using SIMD should provide big performance improvements on large datasets.
There are a few candidates for vectorisation such as in the merge function.
https://github.com/OpenGene/fastp/blob/master/src/stats.cpp#L881-L939
I'm happy to have a go at implementing this if you can provide a sufficiently large FASTQ file for me to test with.
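To illustrate the kind of loop this is about: assuming the merge step largely comes down to adding per-cycle counter arrays from one statistics object onto another, the core operation is an element-wise sum over contiguous memory. The sketch below uses a hypothetical helper, `accumulateCounts`, which is not a fastp function. Optimising compilers can usually auto-vectorise a loop of this shape, and it could also be written with explicit SIMD through a library such as Highway.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds each counter in `src` onto the matching counter in `dst`.
// Only the overlapping prefix is processed if the two sizes differ.
void accumulateCounts(std::vector<std::uint64_t>& dst,
                      const std::vector<std::uint64_t>& src) {
    const std::size_t n = dst.size() < src.size() ? dst.size() : src.size();
    std::uint64_t* out = dst.data();
    const std::uint64_t* in = src.data();
    // Simple contiguous loop with no branches or cross-iteration dependencies
    for (std::size_t i = 0; i < n; ++i) {
        out[i] += in[i];
    }
}
```

Whether this gives a measurable speed-up depends on how much of the total run time the merge actually takes, so it is worth profiling on a large input first.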
References:
https://chryswoods.com/vector_c++/portable.html
https://github.com/google/highway/