This idea is fatally flawed, and impossible to make safe in ISO C++ with non-atomic `x`. Data-race Undefined Behaviour (UB) is unavoidable because one thread writes `x` unconditionally and the other reads it unconditionally.

At best you'd be rolling your own atomics, using compiler barriers to force one thread to sync actual memory state with abstract-machine state. But even then, rolling your own atomics without `volatile` is not safe: https://lwn.net/Articles/793253/ explains why the Linux kernel's hand-rolled atomics use `volatile` casts for pure-store and pure-load. This gives you something like relaxed-atomic on normal compilers, but of course zero guarantee from ISO C++.
When to use `volatile` with multi-threading? Basically never: you can get the same efficient asm from `atomic<int>` with `mo_relaxed`. (Or on x86, even acquire and release are free in asm.)
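As a sketch of that point, here is the kind of thing `volatile` is typically (mis)used for, done with a relaxed atomic instead. The names and the spin-loop structure are illustrative, not from the question; the claim is just that the relaxed load/store compile to the same plain `mov` instructions a `volatile int` would, without the UB:

```cpp
#include <atomic>
#include <thread>

// An exit flag: what people often reach for volatile to implement.
// mo_relaxed gives atomicity and coherence (the loop *will* see the store
// eventually) with no extra ordering cost; on x86 it's a plain mov.
std::atomic<int> keep_running{1};

long spin_until_stopped() {
    long iters = 0;
    std::thread stopper([] {
        keep_running.store(0, std::memory_order_relaxed);  // plain mov store
    });
    while (keep_running.load(std::memory_order_relaxed))   // plain mov load
        ++iters;
    stopper.join();
    return iters;  // some non-negative count; the loop is guaranteed to exit
}
```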
If you were going to attempt this, then in practice on most implementations, `std::atomic_thread_fence(std::memory_order_seq_cst)` will block compile-time reordering of non-atomic operations across it. (e.g. in GCC I think it's basically equivalent to x86 `asm("mfence" ::: "memory")` (footnote 1), which blocks compile-time reordering and is also a full barrier.) But I think some of that "strength" is an implementation detail, not required by ISO C++.

Footnote 1: BTW, usually you want a dummy `lock add` with stack memory, not an actual `mfence`, because `mfence` is slower.
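For reference, here is the fully-defined version of the fence pattern, applied to *atomic* `x` and `y` with relaxed operations and a seq_cst fence between the store and the load in each thread (the store-buffer litmus test). This is a sketch, not the questioner's code; with non-atomic `x` the same shape would still be a data race:

```cpp
#include <atomic>
#include <thread>

// Store-buffer litmus test with relaxed ops and seq_cst fences.
// ISO C++ forbids the r1 == 0 && r2 == 0 outcome here: the two fences are
// totally ordered, so at least one thread's load must see the other's store.
std::atomic<int> x{0}, y{0};
int r1, r2;  // each written by one thread, read only after join()

bool fence_litmus_once() {
    x.store(0); y.store(0);
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return !(r1 == 0 && r2 == 0);  // forbidden outcome never happens
}
```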
Semi-related: your `bool` variables don't need to be atomic. IDK if it's more or less distracting to make them atomic; I was leaning towards it being simpler if they're not. They're each written by at most one thread, and only read after that thread has been `join`ed. You could make them plain `bool`, and also write them unconditionally, like `y_was_zero = (y == 0);`, if you want. (But that's neutral as far as simplicity goes, although it saves looking at their initializers.)
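A minimal sketch of why plain `bool` is fine here (variable names are made up to mirror the pattern): each flag is written by one thread, and `join()` creates the happens-before edge that makes the later read safe:

```cpp
#include <thread>

// Plain (non-atomic) bools: each written by at most one thread,
// read only after that thread has been joined. join() synchronizes-with
// the end of the thread, so the main-thread reads don't race.
bool plain_bool_pattern() {
    int y = 0, z = 1;
    bool y_was_zero = false, z_was_zero = false;  // illustrative names
    std::thread t1([&] { y_was_zero = (y == 0); });  // unconditional write
    std::thread t2([&] { z_was_zero = (z == 0); });
    t1.join();
    t2.join();
    return y_was_zero && !z_was_zero;  // safe to read after join
}
```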
> What is the least amount of synchronization (e.g. weakest memory models for all operations) necessary to guarantee this?
`x` needs to be `atomic<>`, and both stores need to be `seq_cst`. (That's basically equivalent to draining the store buffer after doing the store.) Like in https://preshing.com/20120515/memory-reordering-caught-in-the-act/

In practice I think both loads can be `relaxed` on most machines (maybe not POWER, though, where private store-forwarding between SMT threads is possible). For ISO C++ to guarantee it, I think you need `seq_cst` on both loads as well, so that all 4 operations are part of a global total order of operations across multiple objects that's compatible with program order. There's no synchronizes-with via release/acquire to create a happens-before relationship.
Generally `seq_cst` is the only ordering in the ISO C++ memory model that must translate into blocking StoreLoad reordering, in a hardware-style memory model based on the existence of an actual coherent state that exists even if nobody's looking at it, with individual threads accessing that state subject to local reordering. (ISO C++ only talks about what other threads can observe, and hypothetical observers might in theory not constrain code-gen. But in practice they do, because compilers don't do whole-program inter-thread analysis.)
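Putting the answer together, this is the all-`seq_cst` version of the store-buffer test (same shape as Preshing's example; the variable names are mine). With all four operations `seq_cst`, the both-loads-see-zero outcome is forbidden by the single total order:

```cpp
#include <atomic>
#include <thread>

// All four operations seq_cst: part of one total order consistent with
// program order, so at least one load must observe the other thread's store.
std::atomic<int> x{0}, y{0};

bool seq_cst_litmus_once() {
    int r1 = -1, r2 = -1;
    x.store(0); y.store(0);
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return !(r1 == 0 && r2 == 0);  // forbidden under seq_cst everywhere
}
```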
If you for some reason can't make `x` be `atomic<>`:

Use C++20 `atomic_ref<>` to construct a reference to `x` that you can use to do `xref.store(1, mo_seq_cst)` or `xref.load(mo_seq_cst)`.
Or with GNU C/C++ atomic builtins, `__atomic_store_n(&x, 1, __ATOMIC_SEQ_CST)` (which is exactly what C++20 `atomic_ref` is designed to wrap).
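Same sketch with the builtins (GCC/Clang extension, not ISO C++; they operate directly on a plain object):

```cpp
// GNU __atomic builtins on a plain int: the non-portable equivalent of
// what std::atomic_ref wraps. Works on GCC and Clang.
int gx;

int builtin_store_load() {
    __atomic_store_n(&gx, 1, __ATOMIC_SEQ_CST);
    return __atomic_load_n(&gx, __ATOMIC_SEQ_CST);
}
```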
Or with semi-portable stuff, `*(volatile int*)&x = 1;` and a barrier, which might or might not work, depending on the compiler. A DeathStation 9000 can certainly make `volatile int` assignment non-atomic if it wants to. But fortunately the compilers people actually choose in real life aim not to be terrible, and often to be usable for low-level systems programming. Still, nothing guarantees that this works.
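For completeness, this is roughly what that hack looks like (GNU-style `asm` compiler barrier; again, this works on mainstream GCC/Clang targets in practice but is guaranteed by nothing):

```cpp
int hx;  // plain int, accessed via volatile casts

// The semi-portable hack: volatile cast for the access, empty asm statement
// with a "memory" clobber as a compiler barrier. Not ISO C++, not guaranteed.
int volatile_hack() {
    *(volatile int*)&hx = 1;            // hopefully a single plain store
    asm volatile("" ::: "memory");      // blocks compile-time reordering only
    return *(volatile int*)&hx;
}
```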