Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
430 views
in Technique[技术] by (71.8m points)

c++ - C++20 std::atomic<float>- std::atomic<double>.specializations

C++20 includes specializations for atomic<float> and atomic<double>. Can anyone here explain for what practical purpose this should be good for? The only purpose I can imagine is when I have a thread that changes an atomic double or float asynchronously at random points and other threads read this values asynchronously (but a volatile double or float should in fact do the same on most platforms). But the need for this should be extremely rare. I think this rare case couldn't justify an inclusion into the C++20 standard.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

EDIT: Adding Ulrich Eckhardt's comment to clarify: 'Let me try to rephrase that: Even if volatile on one particular platform/environment/compiler did the same thing as atomic<>, down to the generated machine code, then atomic<> is still much more expressive in its guarantees and furthermore, it is guaranteed to be portable. Moreover, when you can write self-documenting code, then you should do that.'

Volatile sometimes has the below 2 effects:

  1. Prevents compilers from caching the value in a register.
  2. Prevents optimizing away accesses to that value when they seem unnecessary from the POV of your program.

See also Understanding volatile keyword in c++

TLDR;

Be explicit about what you want.

  • Do not rely on 'volatile' do do what you want, if 'what' is not the original purpose of volatile, e.g. enabling external sensors or DMA to change a memory address without the compiler interfering.
  • If you want an atomic, use std::atomic.
  • If you want to disable strict aliasing optimizations, do like the Linux kernel, and disable strict aliasing optimizations on e.g. gcc.
  • If you want to disable other kinds of compiler optimizations, use compiler intrinsics or code explicit assembly for e.g ARM or x86_64.
  • If you want 'restrict' keyword semantics like in C, use the corresponding restrict intrinsic in C++ on your compiler, if available.
  • In short, do not rely on compiler- and CPU-family dependent behavior if constructs provided by the standard are clearer and more portable. Use e.g. godbolt.org to compare the assembler output if you believe your 'hack' is more efficient than doing it the right way.

From std::memory_order

Relationship with volatile

Within a thread of execution, accesses (reads and writes) through volatile glvalues cannot be reordered past observable side-effects (including other volatile accesses) that are sequenced-before or sequenced-after within the same thread, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.

In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).

One notable exception is Visual Studio, where, with default settings, every volatile write has release semantics and every volatile read has acquire semantics (MSDN), and thus volatiles may be used for inter-thread synchronization. Standard volatile semantics are not applicable to multithreaded programming, although they are sufficient for e.g. communication with a std::signal handler that runs in the same thread when applied to sig_atomic_t variables.

As a final rant: In practice, the only feasible languages for building an OS kernel are usually C and C++. Given that, I would like provisions in the 2 standards for 'telling the compiler to butt out', i.e. to be able to explicitly tell the compiler to not change the 'intent' of the code. The purpose would be to use C or C++ as a portable assembler, to an even greater degree than today.

An somewhat silly code example is worth compiling on e.g. godbolt.org for ARM and x86_64, both gcc, to see that in the ARM case, the compiler generates two __sync_synchronize (HW CPU barrier) operations for the atomic, but not for the volatile variant of the code (uncomment the one you want). The point being that using atomic gives predictable, portable behavior.

#include <inttypes.h>
#include <atomic>

std::atomic<uint32_t> sensorval;
//volatile uint32_t sensorval;

uint32_t foo()
{
    uint32_t retval = sensorval;
    return retval;
}
int main()
{
    return (int)foo();
}

Godbolt output for ARM gcc 8.3.1:

foo():
  push {r4, lr}
  ldr r4, .L4
  bl __sync_synchronize
  ldr r4, [r4]
  bl __sync_synchronize
  mov r0, r4
  pop {r4, lr}
  bx lr
.L4:
  .word .LANCHOR0

For those who want an X86 example, a colleague of mine, Angus Lepper, graciously contributed this example: godbolt example of bad volatile use on x86_64


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...