Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform single copy + pointer select instead of double write for small types #29

Merged
merged 2 commits into from
Dec 12, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 12 additions & 17 deletions src/quicksort.rs
Original file line number Diff line number Diff line change
Expand Up @@ -165,11 +165,12 @@ impl<T> StablePartitionTypeImpl for T {
scratch_rev = scratch_rev.sub(1);

let is_less_than_pivot = is_less(&*scan, &*pivot);
let dst = if is_less_than_pivot {
scratch_base.add(num_lt) // i + num_lt
let dst_base = if is_less_than_pivot {
scratch_base // i + num_lt
} else {
scratch_rev.add(num_lt) // len - (i + 1) + num_lt = len - 1 - num_ge
scratch_rev // len - (i + 1) + num_lt = len - 1 - num_ge
};
let dst = dst_base.add(num_lt);

// Save pivot location in scratch for later.
if const { crate::has_direct_interior_mutability::<T>() }
Expand Down Expand Up @@ -253,11 +254,6 @@ where

/// Specialization for small types, through traits to not invoke compile time
/// penalties for loop unrolling when not used.
///
/// Benchmarks show that for small types simply storing to *both* potential
/// destinations is more efficient than a conditional store. It is also less at
/// risk of having the compiler generating a branch instead of conditional
/// store. And explicit loop unrolling is also often very beneficial.
impl<T> StablePartitionTypeImpl for T
where
T: Copy + crate::Freeze,
Expand All @@ -279,21 +275,20 @@ where
}

unsafe {
// SAFETY: exactly the same invariants and logic as the
// non-specialized impl. The conditional store being replaced by a
// double copy changes nothing, on all but the final iteration the
// bad write will simply be overwritten by a later iteration, and on
// the final iteration we write to the same index twice. And we do
// naive loop unrolling where the exact same loop body is just
// repeated.
// SAFETY: exactly the same invariants and logic as the non-specialized impl. And we do
// naive loop unrolling where the exact same loop body is just repeated.
let pivot = v_base.add(pivot_pos);

let mut loop_body = |state: &mut PartitionState<T>| {
state.scratch_rev = state.scratch_rev.sub(1);

let is_less_than_pivot = is_less(&*state.scan, &*pivot);
ptr::copy_nonoverlapping(state.scan, scratch_base.add(state.num_lt), 1);
ptr::copy_nonoverlapping(state.scan, state.scratch_rev.add(state.num_lt), 1);
let dst_base = if is_less_than_pivot {
scratch_base // i + num_lt
} else {
state.scratch_rev // len - (i + 1) + num_lt = len - 1 - num_ge
};
ptr::copy_nonoverlapping(state.scan, dst_base.add(state.num_lt), 1);

state.num_lt += is_less_than_pivot as usize;
state.scan = state.scan.add(1);
Expand Down
Loading