Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


performance - According to Intel my cache should be 24-way associative, though it's 12-way. How is that?

According to the “Intel 64 and IA-32 Architectures Optimization Reference Manual,” April 2012, page 2-23:

The physical addresses of data kept in the LLC data arrays are distributed among the cache slices by a hash function, such that addresses are uniformly distributed. The data array in a cache block may have 4/8/12/16 ways corresponding to 0.5M/1M/1.5M/2M block size. However, due to the address distribution among the cache blocks from the software point of view, this does not appear as a normal N-way cache.

My computer is a 2-core Sandy Bridge with a 3 MB, 12-way set associative LLC. That does not seem to be consistent with Intel's documentation though. According to the data above, it seems that I should have 24 ways. I can imagine there is something going on with the number of cores/cache slices, but I can't quite figure it out. If I have 2 cores and hence 2 cache slices of 1.5 MB each, I would have 12 ways per cache slice according to Intel, and that does not seem consistent with my CPU specs. Can someone clarify this for me?

If I wanted to evict an entire cache line, would I need to access the cache in strides of 128 KB or 256 KB? In fact, this is what I am trying to achieve.

Any suggested readings are very welcome.


Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss gazes back…

1 Answer


Associativity is orthogonal to the number of slices and to the mapping done by the hash function. If a given address is mapped to some cache slice (and a given set within it), it can only compete for the ways with other lines that were mapped to the same place. Having 2 slices does not raise associativity; it only reduces contention (since lines end up evenly distributed over more sets).

Therefore you have 12 ways per slice, but the overall associativity per set is still 12 ways.
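To make the numbers concrete, here is a rough sketch of the arithmetic for the cache described in the question. It assumes 64-byte cache lines, and `llc_geometry` is just a helper name made up for this illustration. It shows where the 128 KB and 256 KB figures come from: 128 KB is the size of one way within a single 1.5 MB slice, while 256 KB is what you would get by treating the whole 3 MB as one unsliced 12-way cache.

```python
def llc_geometry(total_bytes, ways, slices, line_bytes=64):
    """Basic set/way arithmetic for a sliced cache (line_bytes=64 is an assumption)."""
    slice_bytes = total_bytes // slices
    sets_per_slice = slice_bytes // (ways * line_bytes)
    return {
        "slice_bytes": slice_bytes,
        "sets_per_slice": sets_per_slice,
        # Distance between addresses that share the same set-index bits
        # inside one slice (size of one way of one slice).
        "way_bytes_per_slice": sets_per_slice * line_bytes,
        # Same quantity if the cache were a single unsliced array.
        "way_bytes_unsliced": total_bytes // ways,
        # Associativity seen by any one line: unchanged by slicing.
        "ways_per_set": ways,
    }


geometry = llc_geometry(3 * 1024 * 1024, ways=12, slices=2)
for name, value in geometry.items():
    print(f"{name}: {value}")
```

Either way, each set still holds only 12 lines. Which slice a given address lands in depends on the hash, so the stride alone does not tell you which lines actually collide.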

If you were to test your associativity by accessing different lines mapped to the same set, you would just have a harder time picking such lines (you'd need to know the hash function), but you would still get thrashing after 12 lines. However, if you were to ignore the hashing and assume lines are simply mapped by their set bits, it could appear as if you have higher associativity, simply because the lines would be divided uniformly between the slices, so thrashing would take longer. This isn't real associativity, but it comes close for some practical purposes. It would only work if you're using a wide physical memory range though, since the upper address bits need to change for the hashing to have any impact.
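A toy simulation can illustrate the effect. Everything here is an assumption made for the sake of the example: `toy_slice_hash` is a made-up parity hash (the real slice hash is not what is modeled here), replacement is plain LRU, lines are 64 bytes, and each slice has 2048 sets of 12 ways. The code walks over `n_lines` addresses spaced 128 KB apart, so they all share the same set-index bits, and counts the misses in the last pass.

```python
from collections import OrderedDict

LINE = 64              # assumed line size in bytes
SETS_PER_SLICE = 2048  # 1.5 MB / (12 ways * 64 B)
WAYS = 12
SLICES = 2


def toy_slice_hash(addr):
    # HYPOTHETICAL hash: parity of the address bits above the
    # 6 offset bits + 11 set-index bits. Not Intel's real function.
    return bin(addr >> 17).count("1") % SLICES


def steady_state_misses(n_lines, stride, use_hash, passes=4):
    """Cyclically touch n_lines addresses; return misses in the final pass."""
    cache = {}  # (slice, set index) -> OrderedDict of line numbers in LRU order
    misses = 0
    for _ in range(passes):
        misses = 0
        for i in range(n_lines):
            addr = i * stride
            line = addr // LINE
            set_idx = line % SETS_PER_SLICE
            slc = toy_slice_hash(addr) if use_hash else 0
            lru = cache.setdefault((slc, set_idx), OrderedDict())
            if line in lru:
                lru.move_to_end(line)    # hit: mark as most recently used
            else:
                misses += 1
                lru[line] = True
                if len(lru) > WAYS:
                    lru.popitem(last=False)  # evict least recently used
    return misses


STRIDE = 128 * 1024
for n in (12, 13, 24, 25):
    print(n, steady_state_misses(n, STRIDE, use_hash=False),
          steady_state_misses(n, STRIDE, use_hash=True))
```

In this model, lines that all land in one slice start thrashing at 13 lines, whereas the hashed case only starts missing once one slice receives its 13th line, which with this particular toy hash happens at 25 lines. So the cache looks roughly 24-way from the outside even though every set is still 12-way.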

