Hi, it looks to me like the benchmarks in this crate need a bit of love. I'd like to help but I need some input.

Currently (e209a50) `cargo bench slice` tells me:
That doesn't seem right. Extending with an iterator over a slice shouldn't be faster than extending with the slice directly. Here is what `perf` tells me about `extend_with_slice`:
That looks to me like the `extend` is gone and this just updates some internal benchmark counter.
I cannot devise a way to prevent this optimization using `bencher::black_box`. If I apply the most aggressive black-boxing I can think of:
diff --git a/benches/extend.rs b/benches/extend.rs
index ba33a93..de24f57 100644
--- a/benches/extend.rs
+++ b/benches/extend.rs
@@ -37,9 +37,12 @@ fn extend_with_slice(b: &mut Bencher) {
     let mut v = ArrayVec::<u8, 512>::new();
     let data = [1; 512];
     b.iter(|| {
+        black_box(&v);
         v.clear();
+        black_box(&v);
         let iter = data.iter().map(|&x| x);
         v.extend(iter);
+        black_box(&v);
         v[511]
     });
     b.bytes = v.capacity() as u64;
If this is optimized well, we should get performance equal to `extend_from_slice`, but we don't. It looks to me like we end up with a loop that copies each item independently, for a ~36x regression. Ow. But if that isn't bad enough, setting `codegen-units = 1` in `profile.bench` again lets LLVM optimize away the benchmark. And unfortunately the structure of these benchmarks forbids doing something like `v = black_box(v);`, but if I restructure them to accommodate that, the benchmarks are dominated by `bencher::black_box`.
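For reference, the two code paths being compared can be sketched like this. This is a minimal standalone illustration using std's `Vec` in place of `ArrayVec` (so it compiles without the crate); the slice path can lower to a single memcpy, while the iterator path may end up copying element by element, which would explain a gap like the one above:

```rust
fn main() {
    let data = [1u8; 512];

    // Path 1: element-by-element extend through an iterator adapter.
    // This is the shape of code that `extend_with_slice` exercises.
    let mut a: Vec<u8> = Vec::with_capacity(512);
    a.extend(data.iter().map(|&x| x));

    // Path 2: bulk copy from the slice. The optimizer can turn this
    // into one memcpy.
    let mut b: Vec<u8> = Vec::with_capacity(512);
    b.extend_from_slice(&data);

    assert_eq!(a, b);
    println!("both paths fill {} bytes", a.len());
}
```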
If I use the clobber-all-memory black box with this diff, no combination of `codegen-units = 1`, `lto = true`, and `panic = "abort"` will induce LLVM to optimize away the benchmark:
diff --git a/benches/extend.rs b/benches/extend.rs
index ba33a93..0decd87 100644
--- a/benches/extend.rs
+++ b/benches/extend.rs
@@ -1,4 +1,4 @@
-
+#![feature(bench_black_box)]
 extern crate arrayvec;
 #[macro_use] extern crate bencher;
@@ -7,7 +7,7 @@ use std::io::Write;
 use arrayvec::ArrayVec;
 use bencher::Bencher;
-use bencher::black_box;
+use core::hint::black_box;
@@ -40,6 +40,7 @@ fn extend_with_slice(b: &mut Bencher) {
         v.clear();
         let iter = data.iter().map(|&x| x);
         v.extend(iter);
+        black_box(&v);
         v[511]
     });
     b.bytes = v.capacity() as u64;
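As a standalone illustration of the pattern the diff applies, here is a sketch using std's `Vec` in place of `ArrayVec` and the black box that has since stabilized as `std::hint::black_box` (the `#![feature(bench_black_box)]` gate was only needed on older nightlies):

```rust
use std::hint::black_box;

fn main() {
    let data = [1u8; 512];
    let mut v: Vec<u8> = Vec::with_capacity(512);
    for _ in 0..10_000 {
        v.clear();
        v.extend(data.iter().map(|&x| x));
        // The clobbering black box makes LLVM assume the whole vector may
        // be observed here, so the extend cannot be deleted or hoisted out
        // of the loop.
        black_box(&v);
    }
    println!("last byte: {}", v[511]);
}
```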
So as far as I can tell, this crate needs nightly for its benchmarks to function. Does this all check out? And/or is this repo up for requiring nightly for benchmarking?