Under the following assumptions:
- I assume that the code you've shown represents a sequence of x86 assembly instructions rather than actual C code that is yet to be compiled.
- I also assume that the code is being executed on a Cascade Lake processor and not on a later generation of Intel processors (I think CPL or ICX with Barlow Pass support eADR, meaning that explicit flushing is not required for persistence because the caches are in the persistence domain). This answer also applies to existing AMD+NVDIMM platforms.
The global observability order of stores may differ from the persist order on Intel x86 processors. This is referred to as relaxed persistency. The only case in which the order is guaranteed to be the same is for a sequence of stores of type WB to the same cache line (but a store reaching GO doesn't necessarily mean it has become durable). This is because CLFLUSH is atomic and WB stores cannot be reordered in global observability. See: On x86-64, is the “movnti” or "movntdq" instruction atomic when system crash?.
If the two stores cross a cache line boundary or if the effective memory type of the stores is WC:
The x86-TSO memory model doesn't allow reordering stores, so it's impossible for another agent to observe x[2] == 100 and x[1] != 100 during normal operation (i.e., in the volatile state without a crash). However, if the system crashed and rebooted, it's possible for the persistent state to be x[2] == 100 and x[1] != 100. This is possible even if the system crashed after retiring clflush, because the retirement of clflush doesn't necessarily mean that the flushed cache line has reached the persistence domain.
If you want to eliminate that possibility, you can move clflush as follows:
x[1]=100;
clflush(x);
x[2]=100;
clflush on Intel processors is ordered with respect to all writes, meaning that the line is guaranteed to reach the persistence domain before any later stores become globally observable. See: Persistent Memory Programming Primary (PDF) and the Intel SDM V2. The second store could be to the same line or any other line.
If you want x[1]=100 to become persistent before x[2]=100 becomes globally observable, add sfence after clflush on Intel CSX, or mfence on AMD processors (clflush is only ordered by mfence on AMD processors). clflush by itself is sufficient to control persist order.
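As a rough illustration, the flush-then-fence sequence could be written in C with compiler intrinsics. This is only a sketch under assumptions: the helper names persist_range and store_in_persist_order are hypothetical, the cache line size is assumed to be 64 bytes, and the code assumes a GCC- or Clang-compatible compiler targeting x86-64. Unlike the snippet above, it flushes the line that holds x[1] rather than the line at x, so it does not rely on both being in the same cache line.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_sfence, _mm_mfence */

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len), then fence so the flushes are ordered
 * before any later store. */
void persist_range(const void *addr, size_t len)
{
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;

    for (; p < end; p += CACHE_LINE)
        _mm_clflush((const void *)p);

    /* sfence as described for Intel; per the text above, AMD
     * would need _mm_mfence() here instead. */
    _mm_sfence();
}

/* x[1] is stored, flushed and fenced before x[2] is stored. */
void store_in_persist_order(volatile int *x)
{
    x[1] = 100;
    persist_range((const void *)&x[1], sizeof x[1]);
    x[2] = 100;
}
```

The pointer is volatile only to keep the compiler from reordering or merging the two stores; it has no effect on what the hardware does.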
Alternatively, use the sequence clflushopt+sfence (or clwb+sfence) as follows:
x[1]=100;
clflushopt(x);
sfence;
x[2]=100;
In this case, if a crash happened and x[2] == 100 in the persistent state, then it's guaranteed that x[1] == 100. clflushopt by itself doesn't impose any persist ordering.