We all sometimes have to work with binary data. In C++ we work with sequences of bytes, and from the beginning char was our building block. Defined to have a sizeof of 1, it is the byte, and all library I/O functions use char by default. All is good, but there was always a little oddity that bugged some people: the number of bits in a byte is implementation-defined.
So C99 introduced several typedefs, the fixed-width integer types, to let developers express their intent. They are optional, of course, since we never want to hurt portability. Among them, uint8_t (adopted by C++11 as std::uint8_t), a fixed-width 8-bit unsigned integer type, was the perfect choice for people who really wanted to work with 8-bit bytes.
And so, developers embraced the new tools and started building libraries that explicitly state that they accept 8-bit byte sequences, as std::uint8_t*, std::vector<std::uint8_t>, or otherwise.
But, perhaps after very deep thought, the standardization committee decided not to require an implementation of std::char_traits<std::uint8_t>, thereby preventing developers from easily and portably instantiating, say, std::basic_fstream<std::uint8_t> and reading std::uint8_t values as binary data. Or maybe some of us don't care about the number of bits in a byte and are happy with it.
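To make the workaround concrete, here is a minimal sketch of how binary data usually ends up in std::uint8_t storage through the char-based stream API. The helper name read_bytes is hypothetical and not part of any library; the cast goes from std::uint8_t* to char*, which is the direction the aliasing rules explicitly allow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): read up to `count`
// bytes from a char-based stream into a buffer of std::uint8_t.
std::vector<std::uint8_t> read_bytes(std::istream& in, std::size_t count) {
    std::vector<std::uint8_t> buffer(count);
    // The stream API wants char*. Viewing the uint8_t storage through char*
    // is permitted because char may alias any object type.
    in.read(reinterpret_cast<char*>(buffer.data()),
            static_cast<std::streamsize>(count));
    // gcount() reports how many bytes were actually extracted.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= count);  // sanity check: never more than requested
    buffer.resize(got);
    return buffer;
}
```

With a std::istringstream holding three bytes, read_bytes(in, 8) would return a vector of size 3. The same call works with a std::ifstream opened in binary mode.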
But unfortunately, the two worlds collide, and sometimes you have to take data as char* and pass it to a library that expects std::uint8_t*. But wait, you say, isn't char of variable bit width, while std::uint8_t is fixed at 8 bits? Will this result in a loss of data?
Well, there is some interesting Standardese on this. char is defined to hold exactly one byte, and a byte is the smallest addressable chunk of memory, so there can't be a type narrower than char. Next, char is required to be able to hold UTF-8 code units, which gives us the minimum: 8 bits. So now we have a typedef that is required to be exactly 8 bits wide and a type that is at least 8 bits wide. But are there alternatives? Yes, unsigned char. Remember that the signedness of char is implementation-defined. Any other type? Thankfully, no. All other integral types have required ranges that do not fit in 8 bits.
Finally, std::uint8_t is optional, which means that a library using this type will not compile if it is not defined. But what if it compiles? I can say with a great degree of confidence that we are then on a platform with 8-bit bytes and CHAR_BIT == 8.
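These assumptions can be spelled out at compile time. The following is only a sketch: it relies on UINT8_MAX being defined exactly when the implementation provides uint8_t, and the helper name bytes_are_octets is made up for illustration.

```cpp
#include <climits>  // CHAR_BIT
#include <cstdint>  // std::uint8_t, UINT8_MAX

// UINT8_MAX is defined only when the implementation provides uint8_t,
// so this turns a missing type into a readable diagnostic.
#ifndef UINT8_MAX
#error "std::uint8_t is not available on this platform"
#endif

static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(std::uint8_t) == 1, "std::uint8_t should occupy one byte");

// Hypothetical helper so the same check can be reused in other static_asserts.
constexpr bool bytes_are_octets() {
    return CHAR_BIT == 8 && sizeof(std::uint8_t) == 1;
}
```

If the translation unit compiles, both preconditions hold. Note that this says nothing yet about which type std::uint8_t is an alias of.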
Once we know that we have 8-bit bytes and that std::uint8_t is implemented as either char or unsigned char, can we assume that we can reinterpret_cast from char* to std::uint8_t* and vice versa? Is it portable?
This is where my Standardese reading skills fail me. I read about safely derived pointers ([basic.stc.dynamic.safety]
) and, as far as I understand, the following:
std::uint8_t* buffer = /* ... */ ;
char* buffer2 = reinterpret_cast<char*>(buffer);
std::uint8_t* buffer3 = reinterpret_cast<std::uint8_t*>(buffer2);
is safe as long as we don't access anything through buffer2. Correct me if I'm wrong.
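Here is the round trip as a self-contained sketch. The functions sum_octets and round_trip_sum are hypothetical stand-ins for "a library that expects std::uint8_t*". The data starts out as std::uint8_t objects, so reading it through the restored pointer accesses each object through its own type, and since char has the weakest alignment requirement, the cast back should yield the original pointer value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a library entry point that expects std::uint8_t*.
std::uint32_t sum_octets(const std::uint8_t* data, std::size_t size) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return sum;
}

// Cast uint8_t* -> char* -> uint8_t* and use only the restored pointer.
std::uint32_t round_trip_sum(std::uint8_t* buffer, std::size_t size) {
    char* as_char = reinterpret_cast<char*>(buffer);
    std::uint8_t* back = reinterpret_cast<std::uint8_t*>(as_char);
    assert(back == buffer);  // the round trip preserves the pointer value
    return sum_octets(back, size);
}
```

This covers only the case where the underlying objects really are std::uint8_t. It does not settle the opposite case, where the storage was created as char and is then read through std::uint8_t*.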
So, given the following preconditions:
CHAR_BIT == 8
std::uint8_t is defined.
Is it portable and safe to cast between char* and std::uint8_t* in both directions, assuming that we're working with binary data and that the implementation-defined signedness of char doesn't matter?
I would appreciate references to the Standard with explanations.
EDIT: Thanks, Jerry Coffin. I'm going to add the quote from the Standard ([basic.lval], §3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:
...
— a char or unsigned char type.
EDIT2: OK, going deeper. std::uint8_t is not guaranteed to be a typedef of unsigned char. It can be implemented as an extended unsigned integer type, and extended unsigned integer types are not included in §3.10/10. What now?
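One thing I can at least do is detect which case I am in, and fall back to copying when in doubt. The sketch below assumes that an extra copy is acceptable; the names uint8_is_unsigned_char and to_octets are made up for illustration. std::memcpy copies the object representation byte by byte, so it does not depend on the aliasing rule quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// True when std::uint8_t is merely another name for unsigned char,
// false when it is char or an extended integer type.
constexpr bool uint8_is_unsigned_char =
    std::is_same<std::uint8_t, unsigned char>::value;

static_assert(sizeof(std::uint8_t) == 1, "std::uint8_t should occupy one byte");

// Hypothetical helper: copy char data into std::uint8_t storage instead of
// reinterpreting the pointer.
std::vector<std::uint8_t> to_octets(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);  // precondition on the input
    std::vector<std::uint8_t> out(size);
    if (size != 0) {
        std::memcpy(out.data(), data, size);
    }
    return out;
}
```

A library could static_assert on uint8_is_unsigned_char to refuse to build on an implementation where the typedef is something else, but whether such an implementation exists in practice is exactly what I can't tell from the Standard.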