benchmarks: No way to get reproducible results #203
Comments
@olajep What do you think about timing several runs of the functions? Most of them run in nanoseconds, so running each one 1000 times (for example) would improve the precision of the measurements. Also, running the benchmark a few times is a good idea to get some statistics. Maybe report minimum, median and maximum run times. What I mean is something like:

    for (i = 0; i < 100; i++) {
        item_preface(&data[i], ...);
        for (j = 0; j < 1000; j++) {
            fun();
        }
        item_done(&data[i], ...);
    }

PAPI looks cool... Maybe we could take only the ARM/x86 parts (I see they don't support Windows anymore, not sure if that would be an issue). |
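For illustration, the batching idea above (100 runs of 1000 calls, then minimum, median and maximum) could be sketched roughly as follows. This is not the project's benchmark harness: `now_ns`, `bench`, `struct bench_stats` and the stand-in `fun` are hypothetical names, and the timer is assumed to be POSIX `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define RUNS  100   /* independent timed batches */
#define BATCH 1000  /* calls per batch, to amortize timer resolution */

struct bench_stats { double min_ns, median_ns, max_ns; };

/* Monotonic time in nanoseconds (assumes a POSIX system). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Time RUNS batches of BATCH calls; report per-call min/median/max in ns. */
struct bench_stats bench(void (*fn)(void))
{
    static double per_call[RUNS];
    struct bench_stats s;

    assert(fn != NULL);
    for (int i = 0; i < RUNS; i++) {
        uint64_t start = now_ns();
        for (int j = 0; j < BATCH; j++)
            fn();
        per_call[i] = (double)(now_ns() - start) / BATCH;
    }
    qsort(per_call, RUNS, sizeof per_call[0], cmp_double);
    s.min_ns = per_call[0];
    s.median_ns = (per_call[RUNS / 2 - 1] + per_call[RUNS / 2]) / 2.0;
    s.max_ns = per_call[RUNS - 1];
    return s;
}

/* Hypothetical function under test; the volatile sink keeps the call alive. */
static volatile unsigned sink;
void fun(void) { sink += 1; }
```

Calling `bench(fun)` returns the three statistics. The minimum is usually the most stable figure across runs, since interference from the rest of the system can only make a batch slower.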
@lchamon On 2015-07-22 02:51, Chamon wrote:
I think that should do it. Cheers, |
@olajep Hmmm... I thought you could maybe limit the time instead of the number of iterations. But it might complicate things more than it solves, don't know. My idea was (in retrospect, maybe not a really good one):

    volatile int i = 0; /* volatile so the compiler doesn't optimize the loop away */
    volatile int j = 0;
    item_preface(&data, item);
    while (data->end - data->start < MAX_TIME) {
        item->benchmark(&spec);
        item_done(&data, &spec, item->name);
        i++;
    }
    /* Time an empty loop with the same iteration count, then subtract it. */
    loop_time = platform_clock();
    while (i - j > 0)
        j++;
    loop_time = platform_clock() - loop_time;
    data->end -= loop_time;

Maybe clock count is the way to go. |
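As a rough, self-contained variant of the time-limited idea above: instead of subtracting a separately timed empty loop, this sketch reads the clock only once per chunk of calls so that the timer cost is spread thin. `bench_for`, `now_ns`, `CHUNK` and `fun` are hypothetical names, and POSIX `clock_gettime` is assumed as the clock; none of this comes from the project's code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CHUNK 64  /* calls between clock reads, to dilute timer overhead */

/* Monotonic time in nanoseconds (assumes a POSIX system). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Call fn in chunks until at least budget_ns has elapsed.
 * Returns the mean time per call in ns and stores the call count
 * in *iters_out (if non-NULL).
 */
double bench_for(void (*fn)(void), uint64_t budget_ns, uint64_t *iters_out)
{
    uint64_t iters = 0, elapsed, start;

    assert(fn != NULL && budget_ns > 0);
    start = now_ns();
    do {
        for (int k = 0; k < CHUNK; k++)
            fn();
        iters += CHUNK;
        elapsed = now_ns() - start;
    } while (elapsed < budget_ns);

    if (iters_out != NULL)
        *iters_out = iters;
    return (double)elapsed / (double)iters;
}

/* Hypothetical function under test; the volatile sink keeps the call alive. */
static volatile unsigned sink;
void fun(void) { sink += 1; }
```

For example, `bench_for(fun, 2000000, &n)` runs for about 2 ms. The result still includes one clock read per 64 calls, so for functions that take only a few nanoseconds a larger `CHUNK` may be needed.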
The higher the resolution of the timing, the fewer measurements you need to make. The Parallella has performance counters, right? And PAPI supports Linux on ARM. So it sounds like the right solution. 👍 |
On ARM, PAPI uses the perf subsystem of the Linux kernel. If you want, you can use perf directly. The SUPERCOP benchmark software does this (look in the file supercop-20141124/cpucycles/perfevent.c). But the perf API is not as nice as PAPI. |
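To give an idea of what using perf directly involves, here is a minimal sketch based on the Linux `perf_event_open(2)` system call. It is not taken from SUPERCOP or PAPI; `cycles_open`, `cycles_measure` and `fun` are hypothetical names. The counter may be unavailable (no hardware support, or a restrictive `perf_event_paranoid` setting), in which case the sketch returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a user-space CPU-cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if the counter is unavailable. */
int cycles_open(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread on any CPU; no group, no flags. */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Count cycles spent in one call to fn. Returns -1 on failure. */
long long cycles_measure(int fd, void (*fn)(void))
{
    uint64_t count = 0;

    assert(fn != NULL);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (long long)count;
}

/* Hypothetical function under test; the volatile sink keeps the call alive. */
static volatile unsigned sink;
void fun(void) { sink += 1; }
```

The enable/disable `ioctl` calls add their own overhead, so for very short functions it probably makes sense to wrap a loop of many calls inside the measured region and divide the count.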
The Linux/ARM timers from PAPI appear to use I guess for ARM there is no easy way around |
I haven't used PAPI yet, but it should be possible to check at run-time if it supports hardware counters by checking if |
@eliteraspberries I haven't used it either, take what I say with a grain of salt. I was checking out the source to see how they would be accessing the ARM performance counter from the user space, but it seems they're not (like you said, only compiling and running To me, |
I measure the speed of my own code, and the timing is occasionally off. I believe 10 iterations is the minimum and 100 is enough; 1000 is a bit too much. A mean with a plus/minus deviation is enough to give one column of speed data. Below is the example C++ code I used to time my code: |
I have tested an inline assembly solution for x86 processors that have All methods hit the same average measurement (some with more precision than others). For fast functions, the variance is considerably better using @olajep I pushed the solution using
|
Variance between runs is way too high.
I guess we could wrap the function in a for loop and use the lowest measurement, that would certainly improve things.
But I think we'd better use performance counters instead.
Epiphany isn't affected by this since we already use CTIMERs there.
PAPI seems to have the cross-platform support we need:
http://icl.cs.utk.edu/papi/