Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


caching - Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical

This question comes in context of a section on virtual memory in an undergraduate computer architecture course. Neither the teaching assistants nor the professor were able to answer it sufficiently, and online resources are limited.

Question:

Suppose a processor with the following specifications:

  • 8KB pages
  • 32-bit virtual addresses
  • 28-bit physical addresses
  • a two-level page table, with a 1KB page table at the first level, and 8KB page tables at the second level
  • 4-byte page table entries
  • a 16-entry 8-way set associative TLB
  • in addition to the physical frame (page) number, page table entries contain a valid bit, a readable bit, a writeable bit, an executable bit, and a kernel-only bit.

Now suppose this processor has a 32KB L1 cache whose tags are computed based on physical addresses. What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?

Intuition:

My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) is evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page, thus providing a small speed-up. However, I am unsure if this is the correct intuition and definitely don't know how to follow through with it. Could someone please explain this?

Note: I have computed the number of page table entries to be 2^19, if that helps anyone.


1 Answer


"What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?"

They've only specified that the cache is physically tagged.

You can always build a virtually indexed cache; there is no minimum associativity. Even direct-mapped (1 way per set) works. See "Cache Addressing Methods Confusion" for details on VIPT vs. PIPT (and VIVT, and even the unusual PIVT).

For this question not to be trivial, I assume they also meant "without creating aliasing problems", so VIPT is just a speedup over PIPT (physically indexed, physically tagged). You get the benefit of allowing TLB lookup in parallel with fetching tags (and data) for the ways of the indexed set, without any downsides.

"My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) is evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page"

You need the physical address to check against the tags; remember your cache is physically tagged. (Virtually tagged caches do exist, but typically have to get flushed on context switches to a process with different page tables = different virtual address space. This used to be used for small L1 caches on old CPUs.)

Having both numbers be a power of 2 is normally assumed, so they're always evenly divisible.

Page sizes are always a power of 2 so you can split an address into page number and offset-within-page by just taking different ranges of bits in the address.

Small, fast caches also always have a power-of-2 number of sets, so the index "function" is just taking a range of bits from the address. For a virtually-indexed cache: from the virtual address. For a physically-indexed cache: from the physical address. (Outer caches like a big shared L3 cache may have a fancier indexing function, like a hash of more address bits, to avoid aliasing for addresses offset from each other by a large power of 2.)
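
As a minimal sketch of that bit-slicing, assuming 64-byte lines and 128 sets (neither is given in the question; 128 sets x 64 B x 4 ways would be one way to build a 32 KiB cache). The helper names are hypothetical:

```python
PAGE_BYTES = 8 * 1024   # 8 KiB pages, from the question
LINE_BYTES = 64         # assumption: line size is not stated in the question
NUM_SETS = 128          # assumption: 32 KiB / (64 B * 4 ways)

def split_page(addr, page_bytes=PAGE_BYTES):
    """Split an address into (page number, offset within page)."""
    offset_bits = page_bytes.bit_length() - 1   # log2 of a power of 2
    return addr >> offset_bits, addr & (page_bytes - 1)

def split_cache(addr, line_bytes=LINE_BYTES, num_sets=NUM_SETS):
    """Split an address into (tag, set index, byte offset within line)."""
    offset_bits = line_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

page_number, page_offset = split_page(0x12345)
tag, index, line_offset = split_cache(0x12345)
print(page_number, hex(page_offset))   # 9 0x345
print(tag, index, line_offset)         # 9 13 5
```

With these assumed parameters the tag equals the page number, because the index and line-offset bits together cover exactly the 13 page-offset bits.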

The cache size might not be a power of 2, but you'd do that by having a non-power-of-2 associativity (e.g. 10 or 12 ways is not rare) rather than a non-power-of-2 line size or number of sets. After indexing a set, the cache fetches the tags for all the ways of that set and compares them in parallel. (Fast L1 caches often also fetch the data selected by the line-offset bits in parallel; the comparators then just mux the matching way's data into the output, or raise a flag for no match.)
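
A rough software model of that tag check, as a sketch only: real hardware compares all ways simultaneously, while this loop is sequential. The `Way` class, `lookup` function, and the tag value are hypothetical illustrations, not part of the question:

```python
from dataclasses import dataclass

@dataclass
class Way:
    valid: bool = False
    tag: int = 0
    data: bytes = b""

def lookup(cache_set, phys_tag):
    """Return the data of the way whose tag matches, or None on a miss."""
    for way in cache_set:            # hardware does these compares in parallel
        if way.valid and way.tag == phys_tag:
            return way.data
    return None

# One set of an assumed 4-way cache, with a single valid line in way 2.
cache_set = [Way() for _ in range(4)]
cache_set[2] = Way(valid=True, tag=0x2A6, data=b"line")

hit = lookup(cache_set, 0x2A6)    # tag comes from the TLB's physical address
miss = lookup(cache_set, 0x2A7)
```

The set can be selected before the TLB responds; only the final comparison needs the physical tag.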


Requirements for VIPT without aliasing (like PIPT)

For that case, you need all index bits to come from below the page offset. They translate "for free" from virtual to physical so a VIPT cache (that indexes a set before TLB lookup) has no homonym/synonym problems. Other than performance, it's PIPT.
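
To illustrate, here is a small sketch assuming 64-byte lines and a made-up one-entry page table (both are assumptions, not from the question). When sets x line size is at most the page size, the set index computed from the virtual address matches the one computed from the physical address; with more sets, it can differ:

```python
PAGE_BYTES = 8 * 1024   # from the question
LINE_BYTES = 64         # assumption: not given in the question

def set_index(addr, num_sets):
    """Index = the address bits just above the line offset."""
    return (addr // LINE_BYTES) % num_sets

def translate(vaddr, page_table):
    """Replace the virtual page number with the physical frame number."""
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    return page_table[vpn] * PAGE_BYTES + offset

page_table = {9: 0x2A6}          # hypothetical mapping: virtual page 9 -> frame 0x2A6
vaddr = 0x12345
paddr = translate(vaddr, page_table)

# 128 sets * 64 B = 8 KiB: index bits all lie inside the page offset.
same_index = set_index(vaddr, 128) == set_index(paddr, 128)

# 512 sets * 64 B = 32 KiB: two index bits come from the page number.
differs = set_index(vaddr, 512) != set_index(paddr, 512)

print(same_index, differs)       # True True
```

In the second case the mismatch depends on the particular mapping; some frame numbers happen to share their low bits with the virtual page number, which is what page coloring enforces deliberately.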

My detailed answer on "Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?" includes a section on that speed hack.

"Virtually indexed physically tagged cache Synonym" shows a case where the cache does not have that property, and needs page coloring by the OS to avoid synonym problems.

"How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB" has some more notes about cache size / associativity that give that property.

Formula:

  • min associativity = cache size / page size

e.g. a system with 8 KiB pages needs a 32 KiB L1 cache to be at least 4-way associative (32 KiB / 8 KiB = 4) so that index bits only come from the low 13 address bits, i.e. the page offset.
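
The formula can be checked with a few lines of Python. This is a sketch with a hypothetical function name; the 4 KiB-page case is an extra illustration, not part of the question:

```python
def min_associativity(cache_bytes, page_bytes):
    """Smallest number of ways such that one way fits within a page.

    Uses ceiling division, and never returns less than 1 way.
    """
    return max(1, -(-cache_bytes // page_bytes))

KIB = 1024
print(min_associativity(32 * KIB, 8 * KIB))   # 4  (the question's parameters)
print(min_associativity(32 * KIB, 4 * KIB))   # 8  (same cache, smaller pages)
print(min_associativity(4 * KIB, 8 * KIB))    # 1  (cache smaller than a page)
```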

A direct-mapped cache (1 way per set) can only be as large as 1 page: byte-within-line and index bits total up to the byte-within-page offset. Every byte within a direct-mapped (1-way) cache must have a unique index:offset address, and those bits come from contiguous low bits of the full address.

To put it another way, 2^(idx_bits + within_line_bits) is the total cache size with only one way per set. 2^N is the page size, for a page offset of N (the number of byte-within-page address bits that translate for free).

The actual number of sets (in this case = lines) depends on the line size and page size. Using smaller / larger lines would just shift the divide between offset and index bits.
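
A quick sketch of that trade-off for a direct-mapped cache exactly one 8 KiB page in size. The line sizes are illustrative assumptions, since the question does not give one:

```python
PAGE_BYTES = 8 * 1024   # from the question

def direct_mapped_split(line_bytes, cache_bytes=PAGE_BYTES):
    """Return (line-offset bits, index bits) for a direct-mapped cache."""
    offset_bits = line_bytes.bit_length() - 1
    index_bits = (cache_bytes // line_bytes).bit_length() - 1
    return offset_bits, index_bits

for line_bytes in (32, 64, 128):
    offset_bits, index_bits = direct_mapped_split(line_bytes)
    # The two fields always add up to the 13 page-offset bits.
    print(line_bytes, offset_bits, index_bits, offset_bits + index_bits)
```

Larger lines mean fewer sets, but the total number of low address bits consumed stays at 13 either way.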

From there, the only way to make the cache bigger without indexing from higher address bits is to add more ways per set, not more sets.

