Quoting from the IA32 manuals (Vol 3A, Chapter 8.2: Memory Ordering):
In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles [..]
- Reads are not reordered with other reads
- Writes are not reordered with older reads
- Writes to memory are not reordered with other writes, with the exception of
- writes executed with the
CLFLUSH
instruction
- streaming stores (writes) executed with the non-temporal move instructions ([list of instructions here])
- string operations (see Section 8.2.4.1)
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
- Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions
- Reads cannot pass
LFENCE
and MFENCE
instructions
- Writes cannot pass
SFENCE
and MFENCE
instructions
Note: The "In a single-processor system" above is slightly misleading. The same rules hold for each (logical) processor individually; the manual then goes on to describe the additional ordering rules between multiple processors. The only bit about it pertaining to the question is that
- Locked instructions have a total order.
In short, as long as you're writing to write-back memory (which is all memory you'll ever see as long as you're not a driver or graphics programmer), most x86 instructions are almost sequentially consistent - the only reordering an x86 CPU can perform is reorder later (independent) reads to execute before writes. The main thing about the write barriers is that they have a lock
prefix (implicit or explicit), which forbids all reordering and ensures that the operations is seen in the same order by all processors in a multi-processor system.
Also, in write-back memory, reads are never reordered, so there's no need for read barriers. Recent x86 processors have a weaker memory consistency model for streaming stores and write-combined memory (commonly used for mapped graphics memory). That's where the various fence
instructions come into play; they're not necessary for any other memory type, but some drivers in the Linux kernel do deal with write-combined memory so they just defined their read-barrier that way. The list of ordering model per memory type is in Section 11.3.1 in Vol. 3A of the IA-32 manuals. Short version: Write-Through, Write-Back and Write-Protected allow speculative reads (following the rules as detailed above), Uncachable and Strong Uncacheable memory has strong ordering guarantees (no processor reordering, reads/writes are immediately executed, used for MMIO) and Write Combined memory has weak ordering (i.e. relaxed ordering rules that need fences).
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…