Single-header C++11 RDTSCP clock and timing utilities released into the public domain.
While developing games, I have wanted the following features which are not provided by std::chrono:
- triggering events after a certain amount of time
- timing function calls in a high precision manner
Requirements:
- The RDTSCP instruction and a compiler which supports C++11 or higher.
- An Intel Nehalem (2008) or newer processor, or another processor with an invariant TSC.
If you do not meet these requirements, you can easily remove the RDTSCP code from the library and enjoy the other features. The relevant sections of the Intel Software Developer Manuals are at the bottom of this page.
#include "stopwatch.h"
#include <chrono>
#include <iostream>
#include <thread>
int main() {
  const auto timer = stopwatch::make_timer(std::chrono::seconds(10));
  while (!timer.done()) {
    std::cout << std::chrono::duration_cast<std::chrono::seconds>(
                     timer.remaining())
                     .count()
              << " seconds remain." << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  std::cout << "10 seconds have elapsed" << std::endl;
}
#include "stopwatch.h"
#include <iostream>
int main() {
  const auto cycles = stopwatch::time([] {
    for (std::size_t i = 0; i < 10; ++i) {
      std::cout << i << std::endl;
    }
  });
  std::cout << "To print out 10 numbers, it took " << cycles.count()
            << " cycles." << std::endl;
}
Taking the median number of cycles for inserting 10000 items into the beginning of a container.
#include "stopwatch.h"
#include <deque>
#include <iostream>
#include <vector>
int main() {
  const auto deque_samples = stopwatch::sample<100>([] {
    std::deque<int> deque;
    for (std::size_t i = 0; i < 10000; ++i) {
      deque.insert(deque.begin(), i);
    }
  });
  const auto vector_samples = stopwatch::sample<100>([] {
    std::vector<int> vector;
    for (std::size_t i = 0; i < 10000; ++i) {
      vector.insert(vector.begin(), i);
    }
  });
  std::cout << "median for deque: " << deque_samples[49].count() << std::endl;
  std::cout << "median for vector: " << vector_samples[49].count() << std::endl;
}
Output on my 2016 MacBook Pro:
median for deque: 487760
median for vector: 7595754
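The example above reads index 49 as the median, which presumes the samples come back in sorted order. As a rough sketch of what a sampling helper of this kind might do (this is not the code in stopwatch.h), the hypothetical `sample_sorted` function below times a callable N times and returns the durations in ascending order. It uses std::chrono::steady_clock by default so that it builds without RDTSCP; the name, signature, and default clock are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical helper, not part of stopwatch.h: runs `fn` N times, times each
// run with Clock, and returns the N durations sorted in ascending order.
template <std::size_t N, typename Clock = std::chrono::steady_clock,
          typename Func>
std::array<typename Clock::duration, N> sample_sorted(Func&& fn) {
  std::array<typename Clock::duration, N> samples;
  for (std::size_t i = 0; i < N; ++i) {
    const auto start = Clock::now();
    fn();
    samples[i] = Clock::now() - start;
  }
  // Sorting lets callers pick order statistics (minimum, median, maximum)
  // by index.
  std::sort(samples.begin(), samples.end());
  return samples;
}
```

With 100 sorted samples, indices 49 and 50 are the two middle elements, so either one is a reasonable stand-in for the median.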
Using another clock is as simple as passing the clock in as a template argument. Here is the single function call timing example again, using std::chrono::system_clock in place of stopwatch::rdtscp_clock:
const auto cycles = stopwatch::time<std::chrono::system_clock>([] {
  for (std::size_t i = 0; i < 10; ++i) {
    std::cout << i << std::endl;
  }
});
stopwatch::time([] { ... }) became stopwatch::time<std::chrono::system_clock>([] { ... }). That's it!
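For reference, here is a minimal sketch of the members the C++ standard expects from a Clock type: `rep`, `period`, `duration`, `time_point`, `is_steady`, and a static `now()`. The `manual_clock` below is a hypothetical type invented for this illustration and is not part of stopwatch.h. Its time only advances when `advance()` is called, which can be handy for deterministic tests. Whether the library accepts it unchanged depends on the library relying only on this standard interface, which is an assumption here.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical clock for illustration: time only moves when advance() is
// called, so measurements taken with it are fully deterministic.
struct manual_clock {
  using rep = std::int64_t;
  using period = std::nano;
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<manual_clock, duration>;
  static constexpr bool is_steady = true;

  // Returns the current simulated time.
  static time_point now() noexcept { return time_point(duration(ticks())); }

  // Moves the simulated time forward by `d`.
  static void advance(duration d) noexcept { ticks() += d.count(); }

 private:
  // A function-local static avoids a separate out-of-class definition in C++11.
  static rep& ticks() noexcept {
    static rep value = 0;
    return value;
  }
};
```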
Contributions of any variety are greatly appreciated. All code is passed through clang-format using the Google style.
The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.
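The CPUID check described above can be sketched as follows. This is an illustration rather than code from stopwatch.h: the function names are hypothetical, and the sketch assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`. On any other compiler or architecture it simply reports false.

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAS_CPUID_H 1
#endif

// Hypothetical helper: tests bit 8 of the EDX value returned by CPUID leaf
// 80000007H, which is the invariant TSC flag.
constexpr bool invariant_tsc_bit(unsigned int edx) {
  return ((edx >> 8) & 1u) != 0;
}

// Hypothetical helper: returns true if CPUID.80000007H:EDX[8] is set.
// Returns false if the leaf is unavailable or the platform is not x86 with
// GCC/Clang.
inline bool has_invariant_tsc() {
#ifdef SKETCH_HAS_CPUID_H
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // __get_cpuid returns 0 when the requested leaf is not supported.
  if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) {
    return false;
  }
  return invariant_tsc_bit(edx);
#else
  return false;
#endif
}
```

A program could call `has_invariant_tsc()` at startup and fall back to a std::chrono clock when it returns false.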
Processors based on Intel microarchitecture code name Nehalem provide an auxiliary TSC register, IA32_TSC_AUX that is designed to be used in conjunction with IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged software with a signature value (for example, a logical processor ID).
The primary usage of IA32_TSC_AUX in conjunction with IA32_TSC is to allow software to read the 64-bit time stamp in IA32_TSC and signature value in IA32_TSC_AUX with the instruction RDTSCP in an atomic operation. RDTSCP returns the 64-bit time stamp in EDX:EAX and the 32-bit TSC_AUX signature value in ECX. The atomicity of RDTSCP ensures that no context switch can occur between the reads of the TSC and TSC_AUX values.
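As a small sketch of the behaviour described above, the code below reads the time stamp and the TSC_AUX signature in a single RDTSCP instruction through the `__rdtscp` intrinsic. It is not taken from stopwatch.h: `tsc_reading` and `read_tscp` are hypothetical names, and the sketch assumes GCC or Clang on x86 (MSVC exposes the intrinsic through `<intrin.h>` instead). On other platforms it returns a reading marked invalid. Note that executing RDTSCP on a processor without support for it raises an invalid-opcode fault.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define SKETCH_HAS_RDTSCP 1
#endif

// Hypothetical result type: the 64-bit time stamp (EDX:EAX) and the 32-bit
// IA32_TSC_AUX signature (ECX) obtained from one RDTSCP instruction.
struct tsc_reading {
  std::uint64_t ticks;
  std::uint32_t aux;
  bool valid;  // false when RDTSCP is not available at compile time
};

// Hypothetical helper: reads the TSC and TSC_AUX together.
inline tsc_reading read_tscp() {
#ifdef SKETCH_HAS_RDTSCP
  unsigned int aux = 0;
  const unsigned long long ticks = __rdtscp(&aux);
  return {static_cast<std::uint64_t>(ticks), static_cast<std::uint32_t>(aux),
          true};
#else
  return {0, 0, false};
#endif
}
```

Because privileged software chooses the signature value, what `aux` means depends on the operating system. If it encodes a logical processor ID, comparing `aux` across two readings is one way to detect that the thread moved between processors in the meantime.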