What is the minimum associativity that a cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?
The only thing specified is that the cache is physically tagged.
You can always build a virtually indexed cache with no minimum associativity; even direct-mapped (1 way per set) works. See Cache Addressing Methods Confusion for details on VIPT vs. PIPT (and VIVT, and even the unusual PIVT).
For this question not to be trivial, I assume they also meant "without creating aliasing problems", so VIPT is just a speedup over PIPT (physically indexed, physically tagged). You get the benefit of allowing TLB lookup in parallel with fetching tags (and data) for the ways of the indexed set, without any downsides.
My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) are evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page.
You need the physical address to check against the tags; remember your cache is physically tagged. (Virtually tagged caches do exist, but typically have to get flushed on context switches to a process with different page tables = different virtual address space. This used to be used for small L1 caches on old CPUs.)
Having both numbers be a power of 2 is normally assumed, so they're always evenly divisible.
Page sizes are always a power of 2 so you can split an address into page number and offset-within-page by just taking different ranges of bits in the address.
Small/fast cache sizes also always have a power of 2 number of sets so the index "function" is just taking a range of bits from the address. For a virtually-indexed cache: from the virtual address. For a physically-indexed cache: from the physical address. (Outer caches like a big shared L3 cache may have a fancier indexing function, like a hash of more address bits, to avoid aliasing for addresses offset from each other by a large power of 2.)
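To make that concrete, here is a minimal C sketch (my own, not from the original answer) of power-of-2 address splitting, with hypothetical parameters: 4 KiB pages (12 offset bits), 64-byte lines, 64 sets. Both the page split and the set index are just shifts and masks:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_OFFSET_BITS 12u   /* 4 KiB page, hypothetical */
#define LINE_OFFSET_BITS 6u    /* 64 B line */
#define INDEX_BITS       6u    /* 64 sets   */

int main(void) {
    uint64_t vaddr = 0x7f12345678abULL;

    uint64_t page_offset = vaddr & ((1ULL << PAGE_OFFSET_BITS) - 1);
    uint64_t vpn         = vaddr >> PAGE_OFFSET_BITS;   /* virtual page number */

    /* Set index: the bit range just above the byte-within-line offset.
     * Here LINE_OFFSET_BITS + INDEX_BITS == PAGE_OFFSET_BITS, so the index
     * comes entirely from untranslated page-offset bits. */
    uint64_t set_index = (vaddr >> LINE_OFFSET_BITS) & ((1ULL << INDEX_BITS) - 1);

    printf("vpn=%#llx page_offset=%#llx set_index=%llu\n",
           (unsigned long long)vpn, (unsigned long long)page_offset,
           (unsigned long long)set_index);
}
```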
The cache size might not be a power of 2, but you'd do that by having a non-power-of-2 associativity (e.g. 10 or 12 ways is not rare) rather than a non-power-of-2 line size or number of sets. After indexing a set, the cache fetches the tags for all the ways of that set and compares them in parallel. (Fast L1 caches often fetch the data selected by the line-offset bits in parallel, too; the comparators then just mux that data into the output, or raise a flag for no match.)
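As a software model of that lookup (hypothetical structures, my own sketch), the tag check against every way of the indexed set looks like the loop below; real hardware has one comparator per way and does them all at once:

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 8   /* hypothetical associativity */

struct cache_set {
    uint64_t tag[WAYS];
    bool     valid[WAYS];
};

/* Returns the matching way, or -1 for a miss. */
static int lookup_set(const struct cache_set *set, uint64_t phys_tag) {
    for (int w = 0; w < WAYS; w++)      /* one comparator per way in hardware */
        if (set->valid[w] && set->tag[w] == phys_tag)
            return w;                   /* hit: mux this way's data out */
    return -1;                          /* no match: miss */
}
```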
Requirements for VIPT without aliasing (like PIPT)
For that case, you need all index bits to come from below the page offset. They translate "for free" from virtual to physical so a VIPT cache (that indexes a set before TLB lookup) has no homonym/synonym problems. Other than performance, it's PIPT.
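That condition is easy to state as a predicate; a hedged sketch (the helper name is hypothetical):

```c
#include <stdbool.h>

/* No-aliasing VIPT: all index bits must fall below the page offset. */
static bool vipt_behaves_like_pipt(unsigned line_offset_bits,
                                   unsigned index_bits,
                                   unsigned page_offset_bits) {
    return line_offset_bits + index_bits <= page_offset_bits;
}
```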
My detailed answer on Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? includes a section on that speed hack.
Virtually indexed physically tagged cache Synonym shows a case where the cache does not have that property, and needs page coloring by the OS to avoid synonym problems.
How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB has some more notes about cache size / associativity that give that property.
Formula:
- min associativity = cache size / page size
e.g. a system with 8kiB pages needs a 32kiB L1 cache to be at least 4-way associative so that index bits only come from the low 13 address bits.
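Sanity-checking the formula with the numbers from that example (my own sketch; assumes power-of-2 sizes):

```c
#include <stdio.h>

int main(void) {
    unsigned cache_size = 32 * 1024;            /* 32 KiB L1   */
    unsigned page_size  = 8 * 1024;             /* 8 KiB pages */
    unsigned min_ways   = cache_size / page_size;
    if (min_ways == 0) min_ways = 1;            /* cache <= one page: 1 way is fine */
    printf("min associativity = %u ways\n", min_ways);   /* prints 4 */
}
```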
A direct-mapped cache (1 way per set) can only be as large as 1 page: byte-within-line and index bits total up to the byte-within-page offset. Every byte within a direct-mapped (1-way) cache must have a unique index:offset address, and those bits come from contiguous low bits of the full address.
To put it another way, 2^(idx_bits + within_line_bits) is the total cache size with only one way per set. 2^N is the page size, for a page offset of N (the number of byte-within-page address bits that translate for free).
The actual number of sets (in this case = lines) depends on the line size and page size. Using smaller / larger lines would just shift the divide between offset and index bits.
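For instance, with a hypothetical 8 KiB page and a direct-mapped cache capped at page size, varying the line size only trades index bits for offset bits; their total stays at the 13 page-offset bits:

```c
#include <stdio.h>

int main(void) {
    unsigned page_size = 8 * 1024;   /* hypothetical 8 KiB page */
    for (unsigned line_size = 32; line_size <= 128; line_size *= 2) {
        unsigned sets = page_size / line_size;   /* 1 way: lines == sets */
        printf("line=%3u B -> %3u sets\n", line_size, sets);
    }
}
```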
From there, the only way to make the cache bigger without indexing from higher address bits is to add more ways per set, not more sets.