Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: hybrid bench sort issues #209

Open
wants to merge 3 commits into
base: main-dev
Choose a base branch
from

Conversation

ashbob999
Copy link
Contributor

Fixes issues found in the hybrid bench sort function:

  • intial string bytes inserted in the wrong order.
  • order vector being erased (because offset_in_word was 0)

Which led to it having better performance that it actually had.

Also add logic to split the single final sorts into multiple may not be the most efficient or tidy. Which is needed as these functions would perform similar or worse than the equivalent stdvm version, and now are around ~2x faster than there alternative.

Fixes: #208

for (size_t i = 0; i != strings.size(); ++i) {
size_t index = order[i];

for (size_t j = 0; j < std::min<std::size_t>(strings[(sz_size_t)index].size(), 4ul); ++j) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a memcpy and byte order reversal, right? Should be faster without loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, do you have an example on how to do it without loops?


std::sort(order, order + strings.size(), [&](sz_u64_t i, sz_u64_t j) {
char *i_bytes = (char *)&i;
char *j_bytes = (char *)&j;
return *(uint32_t *)(i_bytes + offset_in_word) < *(uint32_t *)(j_bytes + offset_in_word);
});

for (size_t i = 0; i != strings.size(); ++i) std::memset((char *)&order[i] + offset_in_word, 0, 4ul);
const auto extract_bytes = [](sz_u64_t v) -> uint32_t {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if I understand the purpose of the following part. Can you please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I had fixed the byte order issues, these functions were performing similar to their alternatives.

It turns out the final sort call was the slowest part, but since we are already partially sorting the strings, we only really need to sort each unsorted sub group (all strings with equal first 4 chars).

But to do this we need to keep the bytes in the orders whilst we find the start and end of each group to sort. The extract_bytes lambda just made it easier to get those first 4 string bytes.

I know it's not the prettiest/simplest code.

scripts/bench_sort.cpp Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants