TL;DR: Anything is atomic if you do it with interrupts disabled on a UP system, as long as you don't count system devices observing memory with DMA.
Note the intr_disable (); / intr_set_level (old_level); around the operation.
Regarding the claim that "modern CPU guarantees that aligned memory operation are atomic": for multi-threaded observers, that only applies to separate loads or stores, not read-modify-write operations.
For something to be atomic, we have to consider what potential observers we care about. What matters is that nothing can observe the operation as having partially happened. The most straightforward way to achieve that is for the operation to be physically / electrically instantaneous, and affect all the bits simultaneously (e.g. a load or store on a parallel bus goes from not-started to completed at the boundary of a clock cycle, so it's atomic "for free" up to the width of the parallel bus). That's not possible for a read-modify-write, where the best we can do is stop observers from looking between the load and the store.
My answer on Atomicity on x86 explained the same thing a different way, about what it means to be atomic.
In a uniprocessor (UP) system, the only asynchronous observers are other system devices (e.g. DMA) and interrupt handlers. If we can exclude non-CPU observers from writing to our semaphore, then it's just atomicity with respect to interrupts that we care about.
This code takes the easy way out and disables interrupts. That's not necessary (or at least it wouldn't be if we were writing in asm).
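As a rough sketch of that "easy way out", here is the pattern in plain C. Only intr_disable and intr_set_level come from the code under discussion; enum intr_level, intr_get_level, sema_try_down and the simulated interrupt flag are stand-ins I made up so the snippet compiles on its own. A real kernel would execute cli / sti (or save and restore EFLAGS) instead of toggling a variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt state (assumption: stands in for the CPU's IF flag). */
enum intr_level { INTR_OFF, INTR_ON };
static enum intr_level current_level = INTR_ON;

enum intr_level intr_get_level(void) { return current_level; }

/* Disable "interrupts" and return the previous level. */
enum intr_level intr_disable(void) {
    enum intr_level old = current_level;
    current_level = INTR_OFF;
    return old;
}

/* Restore a previously saved level and return the level it replaced. */
enum intr_level intr_set_level(enum intr_level level) {
    enum intr_level old = current_level;
    current_level = level;
    return old;
}

/* Hypothetical helper: take one unit from a counter if it is non-zero.
   The check and the decrement form a read-modify-write, so the whole
   thing sits inside the interrupts-off region. */
bool sema_try_down(unsigned *value) {
    enum intr_level old_level = intr_disable();
    assert(intr_get_level() == INTR_OFF);

    bool success = false;
    if (*value > 0) {
        (*value)--;
        success = true;
    }

    intr_set_level(old_level);  /* restore, rather than blindly enabling */
    return success;
}
```

Restoring the saved level instead of unconditionally re-enabling means the helper still behaves when it's called with interrupts already off.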
An interrupt is handled between two instructions, never in the middle of an instruction. The architectural state of the machine either includes the memory-decrement or it doesn't, because dec [mem] either ran or it didn't. We don't actually need lock dec [mem] for this.
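If you do want a single non-locked dec with a memory destination from C, about the only way to force it is inline asm. A minimal sketch, assuming GNU C extended asm (GCC or clang) on x86; dec_nolock is a name I made up, and the #else branch exists only so the snippet still compiles on other targets, where it carries no single-instruction guarantee at all.

```c
#include <assert.h>

/* Hypothetical helper: decrement *p using one instruction with no lock prefix.
   Atomic wrt. interrupts on the same core only, NOT wrt. other cores or DMA. */
void dec_nolock(int *p) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T syntax: decl is the 32-bit dec with a memory operand.
       "+m" makes *p a read-write memory operand; "cc" because dec writes flags. */
    __asm__ volatile("decl %0" : "+m"(*p) : : "cc");
#else
    /* Fallback so the sketch builds elsewhere; the compiler may emit
       separate load / dec / store instructions here. */
    --*p;
#endif
}
```

This only makes sense for a uniprocessor target or for data that is private to one core plus its interrupt handlers.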
BTW, this is the use-case for cmpxchg without a lock prefix. I always used to wonder why they didn't just make lock implicit in cmpxchg, and the reason is that UP systems often don't need lock prefixes.
The exceptions to this rule are interruptible instructions that can record partial progress, like rep movsb or vpgather / vpscatter. See Interrupting instruction in the middle of execution. These won't be atomic wrt. interrupts even when the only observer is other code on the same core. Only a single iteration of rep whatever, or a single element of a gather or scatter, will have happened or not.
Most SIMD instructions can't record partial progress, so for example vmovdqu ymm0, [rdi] either fully happens or not at all from the PoV of the core it runs on. (But not of course guaranteed atomic wrt. other observers in the system, like DMA or MMIO, or other cores. That's when the normal load/store atomicity guarantees matter.)
There's no reliable way to make sure the compiler emits dec [value] instead of something like this:
mov eax, [value]
;; interrupt here = bad
dec eax
;; interrupt here = bad
mov [value], eax
ISO C11 / C++11 don't provide a way to request atomicity with respect to signal handlers / interrupts, but not other threads. They do provide atomic_signal_fence as a compiler barrier (vs. atomic_thread_fence as a barrier wrt. other threads/cores), but barriers can't create atomicity, only control ordering wrt. other operations.
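For comparison, this is roughly what the portable option looks like: a C11 atomic RMW. It's a sketch with made-up names (sem_value, sem_dec). It is atomic wrt. interrupt and signal handlers, but also wrt. other threads, which is more than a UP kernel needs; on x86, GCC and clang typically emit a lock-prefixed instruction for it even with memory_order_relaxed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter. */
atomic_int sem_value = 2;

/* Atomically decrement the counter and return the value it had before.
   memory_order_relaxed: we only ask for atomicity of the RMW itself,
   not for ordering wrt. surrounding loads and stores. */
int sem_dec(void) {
    return atomic_fetch_sub_explicit(&sem_value, 1, memory_order_relaxed);
}
```

There's no memory_order or other knob here that says "only atomic wrt. this core", which is the gap described above.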
C11/C++11 volatile sig_atomic_t does have this idea in mind, but it only provides atomicity for separate loads/stores, not RMW. It's a typedef for int on x86 Linux. See that question for some quotes from the standard.
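To illustrate what volatile sig_atomic_t is good for, here's a minimal sketch of the usual flag pattern: the handler does one plain store and the main code does one plain load. The names got_signal, on_signal and demo_flag are mine, and SIGINT is used only because it's one of the signals ISO C guarantees exists.

```c
#include <assert.h>
#include <signal.h>

/* Flag written by the handler and read by normal code. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;  /* a single store: fine for sig_atomic_t */
}

/* Hypothetical demo: install the handler, raise the signal in this
   thread, then report the flag. Returns -1 if the handler can't be
   installed. */
int demo_flag(void) {
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);             /* returns only after the handler has run */
    signal(SIGINT, SIG_DFL);   /* put the default disposition back */
    return got_signal;         /* a single load */
}
```

Something like got_signal++ in both the handler and the main code would not be covered: that's a read-modify-write, and the type only promises that individual loads and stores aren't torn.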
On specific implementations, gcc -Wa,-momit-lock-prefix=yes will omit all lock prefixes. (GAS 2.28 docs) This is safe for single-threaded code, or a uniprocessor machine, if your code doesn't include device-driver hardware access that needs to do an atomic RMW on an MMIO location, or that uses a dummy lock add as a faster mfence.
But this is unusable in a multi-threaded program that needs to run on SMP machines, if you have some atomic RMWs between threads as well as some between a thread and a signal handler.