Fix: hybrid bench sort issues #209
base: main-dev
for (size_t i = 0; i != strings.size(); ++i) {
    size_t index = order[i];

for (size_t j = 0; j < std::min<std::size_t>(strings[(sz_size_t)index].size(), 4ul); ++j) {
This is just a memcpy and byte order reversal, right? Should be faster without loops.
Yes. Do you have an example of how to do it without loops?
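A minimal sketch of the loop-free version the reviewer describes, assuming the goal is a 32-bit key whose integer order matches the lexicographic order of a string's first 4 bytes. `prefix_key` is a hypothetical helper, not code from this PR: one `memcpy` copies up to 4 bytes into a zeroed buffer, and the bytes are then assembled most-significant-first, which is the byte-order reversal on a little-endian machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the PR): builds a sort key from the first
// (up to) 4 bytes of `s` without a per-byte loop. Short strings are
// zero-padded, so comparing keys as unsigned integers agrees with
// lexicographic (unsigned-byte) order of the 4-byte prefixes.
inline std::uint32_t prefix_key(const std::string &s) {
    unsigned char buf[4] = {0, 0, 0, 0};
    std::memcpy(buf, s.data(), std::min<std::size_t>(s.size(), 4));
    // Assemble most-significant byte first; shifts behave the same on any
    // host byte order, so no endianness check is needed.
    return (std::uint32_t(buf[0]) << 24) | (std::uint32_t(buf[1]) << 16) |
           (std::uint32_t(buf[2]) << 8) | std::uint32_t(buf[3]);
}
```

Compilers can often turn the shift sequence into a single byte-swap instruction. `__builtin_bswap32` (GCC/Clang) or C++23 `std::byteswap` make the swap explicit, but then the code needs its own endianness check.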
std::sort(order, order + strings.size(), [&](sz_u64_t i, sz_u64_t j) {
    char *i_bytes = (char *)&i;
    char *j_bytes = (char *)&j;
    return *(uint32_t *)(i_bytes + offset_in_word) < *(uint32_t *)(j_bytes + offset_in_word);
});
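The comparator in this hunk reads a `uint32_t` through a cast pointer into a 64-bit word, which C++ strict-aliasing rules treat as undefined behaviour even though it usually works in practice. A sketch of the same comparison using `std::memcpy`, assuming (as the hunk suggests) that each `order` entry keeps a 4-byte prefix at byte offset `offset_in_word`; `slot_at` and `sort_by_slot` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reads the 4 bytes at `offset_in_word` (0 or 4) of a
// 64-bit word. memcpy is the well-defined way to reinterpret those bytes.
inline std::uint32_t slot_at(std::uint64_t word, std::size_t offset_in_word) {
    std::uint32_t v;
    std::memcpy(&v, reinterpret_cast<const unsigned char *>(&word) + offset_in_word, sizeof v);
    return v;
}

// Sorts packed entries by the 32-bit slot only, like the lambda in the hunk.
inline void sort_by_slot(std::vector<std::uint64_t> &order, std::size_t offset_in_word) {
    std::sort(order.begin(), order.end(), [=](std::uint64_t i, std::uint64_t j) {
        return slot_at(i, offset_in_word) < slot_at(j, offset_in_word);
    });
}
```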
for (size_t i = 0; i != strings.size(); ++i) std::memset((char *)&order[i] + offset_in_word, 0, 4ul);
const auto extract_bytes = [](sz_u64_t v) -> uint32_t {
I’m not sure I understand the purpose of the following part. Can you please clarify?
Once I had fixed the byte order issues, these functions were performing similarly to their alternatives.
It turns out the final sort call was the slowest part, but since we are already partially sorting the strings, we only really need to sort each unsorted sub-group (all strings with equal first 4 chars).
But to do this we need to keep the prefix bytes in `order` whilst we find the start and end of each group to sort. The `extract_bytes` lambda just made it easier to get those first 4 string bytes.
I know it's not the prettiest/simplest code.
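A simplified sketch of the grouping idea described above: sort by a 4-byte prefix key first, then re-sort only the runs of entries whose prefixes tie, using a full string comparison. Unlike the PR, this keeps the keys in a separate array instead of packing them into `order`, and `prefix_key` and `sort_with_prefix_groups` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: zero-padded, most-significant-first 4-byte prefix key.
inline std::uint32_t prefix_key(const std::string &s) {
    unsigned char buf[4] = {0, 0, 0, 0};
    std::memcpy(buf, s.data(), std::min<std::size_t>(s.size(), 4));
    return (std::uint32_t(buf[0]) << 24) | (std::uint32_t(buf[1]) << 16) |
           (std::uint32_t(buf[2]) << 8) | std::uint32_t(buf[3]);
}

// Returns indices of `strings` in lexicographic order.
inline std::vector<std::size_t> sort_with_prefix_groups(const std::vector<std::string> &strings) {
    const std::size_t n = strings.size();
    std::vector<std::uint32_t> keys(n);
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i != n; ++i) {
        keys[i] = prefix_key(strings[i]);
        order[i] = i;
    }

    // Pass 1: cheap integer sort on the prefix keys only.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Pass 2: find each [begin, end) run of equal keys and fully sort just that run.
    for (std::size_t begin = 0; begin < n;) {
        std::size_t end = begin + 1;
        while (end < n && keys[order[end]] == keys[order[begin]]) ++end;
        if (end - begin > 1)
            std::sort(order.begin() + begin, order.begin() + end,
                      [&](std::size_t a, std::size_t b) { return strings[a] < strings[b]; });
        begin = end;
    }
    return order;
}
```

This works because a strictly smaller zero-padded prefix key implies a strictly smaller string, so only ties ever need the expensive comparison.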
e6863bb to 9880f26
Fixes issues found in the hybrid bench sort function: the `order` vector was being erased (because `offset_in_word` was 0), which made it report better performance than it actually had.
Also adds logic to split the single final sort into multiple smaller sorts; this may not be the most efficient or tidy approach. It is needed because, without it, these functions performed similarly to or worse than the equivalent `std` vm version, and they are now roughly 2x faster than their alternative.
Fixes: #208
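The `offset_in_word` bug appears to come from addressing the two halves of a 64-bit word by byte offset, which depends on the host byte order. One way to sidestep that entirely (an alternative design, not what this PR does) is to place the prefix key in the high 32 bits and the index in the low 32 bits with shifts. Plain `<` on the packed word then orders by key and then index, and masking replaces the `memset`. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers: shifts and masks act on values, not memory layout,
// so they give the same result on little- and big-endian hosts.
inline std::uint64_t pack(std::uint32_t prefix_key, std::uint32_t index) {
    return (std::uint64_t(prefix_key) << 32) | index;
}
inline std::uint32_t key_of(std::uint64_t word) { return std::uint32_t(word >> 32); }
inline std::uint32_t index_of(std::uint64_t word) { return std::uint32_t(word); }

// Replaces the memset step: drop the key half, keeping only the index.
inline void strip_keys(std::vector<std::uint64_t> &order) {
    for (std::uint64_t &w : order) w = index_of(w);
}
```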