I am trying to figure out why a modified C program is running faster than its non modified counter part (I am adding very few lines of code to perform some additional work). In this context, I suspect "cache effects" to be the main explanation (instruction cache). Thus I reach the perf
(https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool but unfortunately I am not able to understand the meaning of its outputs regarding cache misses.
Several events about cache are provided:
cache-references [Hardware event]
cache-misses [Hardware event]
L1-dcache-loads [Hardware cache event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-prefetches [Hardware cache event]
L1-dcache-prefetch-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
L1-icache-prefetch-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-stores [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-prefetches [Hardware cache event]
LLC-prefetch-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-prefetches [Hardware cache event]
dTLB-prefetch-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-stores [Hardware cache event]
node-store-misses [Hardware cache event]
node-prefetches [Hardware cache event]
node-prefetch-misses [Hardware cache event]
Where can I find explanation about these fields ? cache-misses event is always smaller than other events. What does this event measure ?
How to interpret the 26,760 L1-icache-load-misses for ls vs the 5,708 cache-misses in the following example ?
perf stat -e L1-icache-load-misses ls
caches caches~ out
Performance counter stats for 'ls':
26,760 L1-icache-load-misses
0.002816690 seconds time elapsed
perf stat -e cache-misses ls
caches caches~ out
Performance counter stats for 'ls':
5,708 cache-misses
0.002822122 seconds time elapsed
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…