c++ - How do I reinterpret data through a different type? (type punning confusion)

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

#include <iostream>

int main(int argc, char * argv[])
{
    int a = 0x3f800000;

    std::cout << a << std::endl;

    static_assert(sizeof(float) == sizeof(int), "Oops");

    float f2 = *reinterpret_cast<float *>(&a);

    std::cout << f2 << std::endl;

    void * p = &a;
    float * pf = static_cast<float *>(p);
    float f3 = *pf;

    std::cout << f3 << std::endl;

    float f4 = *static_cast<float *>(static_cast<void *>(&a));

    std::cout << f4 << std::endl;
}

I get the following info out of my trusty compiler:

me@Mint-VM ~/projects $ g++-5.3.0 -std=c++11 -o pun pun.cpp -fstrict-aliasing -Wall
pun.cpp: In function ‘int main(int, char**)’:
pun.cpp:11:45: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float f2 = *reinterpret_cast<float *>(&a);
                                             ^
pun.cpp:21:61: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float f4 = *static_cast<float *>(static_cast<void *>(&a));
                                                             ^
me@Mint-VM ~/projects $ ./pun
1065353216
1
1
1
me@Mint-VM ~/projects $ g++-5.3.0 --version
g++-5.3.0 (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I don't really understand when and why I get type-punning warnings in some places and not in others.

So, strict aliasing:

Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)

Line 11 claims I'm breaking strict-aliasing. I don't see a case where this can possibly hurt anything - the pointer "comes into existence", is immediately dereferenced, then thrown away. In all likelihood, this will compile down to zero instructions. This seems like absolutely no risk - I'm telling the compiler EXACTLY what I want.

Lines 15-16 proceed to NOT elicit a warning, even though the pointers to the same memory location are now here to stay. This appears to be a bug in gcc.

Line 21 elicits the warning, showing that this is NOT limited to just reinterpret_cast.

Unions are no better (emphasis mine):

...it's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

This link talks about using memcpy, but that seems to just hide what you're really trying to accomplish.

For some systems, it is a required operation to write a pointer to an int register, or receive an incoming byte stream and assemble those bytes into a float, or other non-integral type.

What is the correct, standard-conforming way of doing this?

1 Answer

深蓝 · Answer 1 · 2021-10-17T03:07:50+0000

Credit to Anton here please. His answer was first and his is correct.

I am posting this exposition because I know you won't believe him until you see the assembler:

Given:

#include <cstring>
#include <iostream>

// prevent the optimiser from eliding this function altogether
__attribute__((noinline))
float convert(int in)
{
    static_assert(sizeof(float) == sizeof(int), "Oops");
    float result;
    // <cstring> only guarantees the std:: name, so qualify the call
    std::memcpy(&result, &in, sizeof(result));
    return result;
}

int main(int argc, char * argv[])
{
    int a = 0x3f800000;
    float f = convert(a);


    std::cout << a << std::endl;
    std::cout << f << std::endl;
}

result:

1065353216
1

Compiled with -O2, here's the assembler output for the function convert, with some added comments for clarity:

#
# I'll give you £10 for every call to `memcpy` you can find...
#
__Z7converti:                           ## @_Z7converti
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
#
# here's the conversion - simply move the integer argument (edi)
# into the first float return register (xmm0)
#
    movd    %edi, %xmm0
    popq    %rbp
    retq
    .cfi_endproc
#
# did you see any memcpy's? 
# nope, didn't think so.
#

Just to drive the point home, here's the same function compiled with -O2 and -fomit-frame-pointer:

__Z7converti:                           ## @_Z7converti
    .cfi_startproc
## BB#0:
    movd    %edi, %xmm0
    retq
    .cfi_endproc

Remember, this function only exists because I added the attribute to prevent the compiler from inlining it. In reality, with optimisations enabled, the entire function will be optimised away. Those 3 lines of code in the function and the call at the call site will vanish.
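
The same memcpy technique can be wrapped in a small reusable template. The following is only a sketch: the name pun_cast is made up for illustration and does not come from the answer above, and the static_asserts encode assumptions (equal sizes, trivially copyable types) that the caller must satisfy. C++20 offers std::bit_cast in <bit> for the same purpose; this version sticks to C++11 to match the question.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper (the name `pun_cast` is an assumption for this sketch):
// reinterpret the bytes of `from` as a value of type `To` without violating
// the strict aliasing rule.
template <typename To, typename From>
To pun_cast(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "size mismatch");
    static_assert(std::is_trivially_copyable<From>::value,
                  "source type must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "destination type must be trivially copyable");

    To to;
    // Copy the object representation byte by byte; optimising compilers
    // typically reduce this to a single register move.
    std::memcpy(&to, &from, sizeof(to));
    return to;
}
```

With a 32-bit int and an IEEE 754 float, pun_cast<float>(0x3f800000) would be expected to yield 1.0f, just like convert above.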

Modern optimising compilers are awesome.

but what I really wanted was this: std::cout << *reinterpret_cast<float *>(&a) << std::endl; and I think it expresses my intent perfectly well.

Well, yes it does. But C++ is designed with both correctness and performance in mind. Very often, the compiler would like to assume that two pointers or two references don't point to the same piece of memory. If it can do that, it can make all kinds of clever optimisations (usually involving not bothering to make reads or writes which aren't necessary to produce the required effect). However, because a write through one pointer could affect a read through the other (if they really point at the same object), then in the interests of correctness the compiler may not assume that the two objects are distinct, and it must perform every read and write you indicated in your code, just in case one write affects a subsequent read... unless the pointers point to different types. If they point to different types, the compiler is allowed to assume that they will never point to the same memory - this is the strict aliasing rule. (Character types such as char and unsigned char are the exception: a pointer to one may alias any object, which is why copying an object's bytes, as memcpy does, is permitted.)
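
To make the optimisation concrete, here is a small sketch (the function names are invented for illustration). In the first function the pointers have different types, so the compiler may assume the float store cannot change *pi and may return 1 without reading memory again. In the second both pointers are int *, so they may legitimately alias and the value must be re-read.

```cpp
// Different pointee types: the compiler may assume `pi` and `pf` never
// refer to the same object.
int store_both(int* pi, float* pf)
{
    *pi = 1;
    *pf = 2.0f;   // assumed not to modify *pi
    return *pi;   // may be folded to `return 1;` by the optimiser
}

// Same pointee type: `p` and `q` may alias, so the final read must
// observe the store through `q`.
int store_both_same_type(int* p, int* q)
{
    *p = 1;
    *q = 2;
    return *p;    // 2 if p == q, otherwise 1
}
```

Calling store_both with a float * obtained by casting the address of the int would be undefined behaviour, which is exactly the situation -Wstrict-aliasing warns about.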

When you do this: *reinterpret_cast<float *>(&a),

you're trying to read the same memory via an int pointer and a float pointer. Because the pointers are of different types, the compiler will assume that they point to different memory addresses - even though in your mind they do not.

This is the strict aliasing rule. It's there to help programs perform quickly and correctly. A reinterpret_cast like this undermines both.
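
For the byte-stream case raised in the question, the same approach applies. The sketch below is an illustration only: the function name is hypothetical, and it assumes the bytes arrive least-significant byte first, that float is a 32-bit IEEE 754 type, and that the host stores floats and 32-bit integers with the same byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble four bytes (least-significant first)
// into a float. `bytes` must point to at least four readable bytes.
float float_from_le_bytes(const unsigned char* bytes)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "Oops");

    // Build the 32-bit pattern with shifts so the result does not depend
    // on the byte order of the host's integers.
    std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                       | static_cast<std::uint32_t>(bytes[1]) << 8
                       | static_cast<std::uint32_t>(bytes[2]) << 16
                       | static_cast<std::uint32_t>(bytes[3]) << 24;

    float result;
    // Copy the bit pattern into the float; no pointer of the wrong type
    // is ever dereferenced.
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Under those assumptions, the bytes 0x00 0x00 0x80 0x3f would decode to the same 1.0f as the 0x3f800000 example above.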
