User @Margaret points towards a reasonable answer in the comments - read the kernel source to see the mapping for the PMU events.
We can check arch/x86/events/intel/core.c for the event definitions. I don't actually know if "core" here refers to the Core architecture, of just that this is the core fine with most definitions - but in any case it's the file you want to look at.
The key part is this section, which defines skl_hw_cache_event_ids
:
static __initconst const u64 skl_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
[ C(L1D ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
[ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
[ C(RESULT_MISS) ] = 0x0,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
...
Decoding the nested initializers, you get that the L1D-dcahe-load
corresponds to MEM_INST_RETIRED.ALL_LOAD
and L1-dcache-load-misses
to L1D.REPLACEMENT
.
We can double check this with perf:
$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null
Performance counter stats for 'head -c100M /dev/zero':
11,587,793 mem_inst_retired_all_loads
11,587,793 L1-dcache-loads
20,233 l1d_replacement
20,233 L1-dcache-load-misses # 0.17% of all L1-dcache hits
11,587,793 L1-dcache-loads
11,495,053 mem_load_retired_l1_hit
0.024322360 seconds time elapsed
The "Hardware Cache" events show exactly the same values as using the underlying PMU events we guessed at by checking the source.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…