My original C++ demo code is shown below:
int counter = 0;
while (counter < 5) {
  auto start = std::chrono::high_resolution_clock::now();
  // instance and result are pre-defined local variables
  instance.Search(40.055948, 116.411325, &result);
  auto end = std::chrono::high_resolution_clock::now();
  int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "cost_us=" << cost_us << std::endl;
  // usleep(100);  // case 1: sleep 100 us
  // sleep(1);     // case 2: sleep 1 second
  // case 3: no sleep at all
  counter++;
}
Some people may suspect that my call to instance.Search() introduces something unknown, so please refer to the self-contained reproduction below:
#include <time.h>
#include <unistd.h>

#include <chrono>
#include <iostream>
#include <set>

void test(const std::set<int>& numbers) {
  for (int counter = 0; counter < 5; ++counter) {
    auto start = std::chrono::high_resolution_clock::now();
    auto it = numbers.lower_bound(5555555);
    auto end = std::chrono::high_resolution_clock::now();
    int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "cost_us=" << cost_us << std::endl;
    // usleep(100);
    // sleep(1);
  }
}

int main() {
  std::set<int> test_set;
  for (int i = 0; i < 100000000; i++) {
    test_set.insert(i);
  }
  test(test_set);
}
Precondition: I use perf stat to count cache misses, instructions, and other events.
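The exact command line is not essential to the question; a minimal sketch of the measurement, with an event list assumed from the counters reported below, would be:

# Sketch only: the event list is an assumption chosen to match the counters shown in the results below.
perf stat -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations \
    ./latency_perf_test_without_sleep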
case 1: when adding usleep(100) at the end of the while loop, the result after the process finishes is:
cost_us=5
cost_us=5
cost_us=5
cost_us=8
cost_us=6
Performance counter stats for './latency_perf_test_sleep_100_us':

         1,785,438      cache-references
           419,583      cache-misses              #   23.500 % of all cache refs
       203,832,235      cycles
       118,093,490      instructions              #    0.58  insn per cycle
        23,198,708      branches
            35,092      faults
               302      migrations

       1.031460583 seconds time elapsed
case 2: when adding sleep(1), the result is:
cost_us=7
cost_us=65
cost_us=21
cost_us=21
cost_us=32
Performance counter stats for './latency_perf_test_sleep_1_sec':

        15,093,302      cache-references
         1,303,941      cache-misses              #    8.639 % of all cache refs
    14,759,103,041      cycles
    24,548,401,788      instructions              #    1.66  insn per cycle
     5,062,488,529      branches
            35,372      faults
             3,444      migrations

       6.033182248 seconds time elapsed
case 3: with no sleep()/usleep() at all, the result is:
cost_us=5
cost_us=2
cost_us=1
cost_us=1
cost_us=1
Performance counter stats for './latency_perf_test_without_sleep':

         1,715,128      cache-references
           420,368      cache-misses              #   24.509 % of all cache refs
       209,238,013      cycles
       130,647,626      instructions              #    0.62  insn per cycle
        25,827,456      branches
            35,092      faults
               362      migrations

       1.032256618 seconds time elapsed
As shown above, the time spent on the very same call differs a lot between the three cases. At first I assumed that sleep() caused cache misses (evicting the data used by my call), but after I used taskset to bind the process to a specific CPU core, the difference did not disappear as I expected. I also wonder why adding sleep()/usleep() leads to such a sharp increase in the instruction count reported by perf stat. I have not read the source code of sleep() or usleep(), but I guess the kernel may do something special when a process calls them (both end up in nanosleep()). Could anyone explain the reason behind this weird phenomenon? Thanks in advance.
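The pinning mentioned above was done with a command along these lines; the core id and the event list here are only examples, not necessarily the exact ones used:

# Example only: bind the process (and the perf measurement) to a single core.
taskset -c 2 perf stat -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations \
    ./latency_perf_test_sleep_1_sec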
Question from: https://stackoverflow.com/questions/65901647/why-time-spent-on-same-function-method-call-differs-much-with-or-without-sleep-u