Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Why is modifying a string through a retrieved pointer to its data not allowed?

In C++11, the characters of a std::string have to be stored contiguously, as § 21.4.1/5 points out:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
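The identity in that paragraph can be checked directly. The following is only a sketch; the helper name contiguity_holds is made up for illustration and is not part of the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the contiguity identity quoted
// above holds for every valid index n of s, i.e. 0 <= n < s.size().
bool contiguity_holds(const std::string& s) {
    typedef std::string::difference_type diff_t;
    for (std::size_t n = 0; n < s.size(); ++n) {
        // Address obtained by advancing the iterator first.
        const char* via_iterator = &*(s.begin() + static_cast<diff_t>(n));
        // Address obtained by advancing a raw pointer to the first element.
        const char* via_pointer = &*s.begin() + n;
        if (via_iterator != via_pointer) {
            return false;
        }
    }
    return true;  // trivially true for an empty string
}
```

On a conforming C++11 implementation, a call such as contiguity_holds("hello") is expected to return true.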

However, here is how § 21.4.7.1 lists the two functions to retrieve a pointer to the underlying storage (emphasis mine):

const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

One possibility I can think of for point number 3 is that the pointer can become invalidated by the following uses of the object (§ 21.4.1/6):

  • as an argument to any standard library function taking a reference to non-const basic_string as an argument.
  • Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

Even so, iterators can become invalidated too, yet we can still modify the string through them until they do. Likewise, we can still read from the buffer through the pointer until it becomes invalidated.
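To make the contrast concrete, here is a small sketch that writes through iterators and then reads through the const pointer from data(). The function name upper_via_iterators is an assumption made for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases s in place by writing through its
// iterators, then reads the result back through the const pointer
// returned by data().
std::string upper_via_iterators(std::string s) {
    for (std::string::iterator it = s.begin(); it != s.end(); ++it) {
        // Writing through a non-const iterator is allowed.
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
    // Reading through data() is fine; no non-const member function is
    // called between obtaining p and using it, so p stays valid.
    const char* p = s.data();
    return std::string(p, s.size());
}
```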

Why can't we write directly to this buffer? Is it because it would put the class in an inconsistent state, as, for example, end() would not be updated with the new end? If so, why is it permitted to write directly to the buffer of something like std::vector?

Use cases for this include passing the buffer of a std::string to a C interface to retrieve a string, rather than passing in a vector<char> and then initializing the string with iterators from that:

std::string text;
text.resize(GetTextLength());
GetText(text.data());

1 Answer

Why can't we write directly to this buffer?

I'll state the obvious point: because it's const. And casting away const and then modifying that data is... rude.

Now, why is it const? That goes back to the days when copy-on-write was considered a good idea, so std::basic_string had to allow implementations to support it. It would be very useful to get an immutable pointer to the string (for passing to C-APIs, for example) without incurring the overhead of a copy. So c_str needed to return a const pointer.

As for why it's still const? Well... that goes to an oddball thing in the standard: the null terminator.

This is legitimate code:

std::string stupid;
const char *pointless = stupid.c_str();

pointless must be a NUL-terminated string. Specifically, it must be a pointer to a NUL character. So where does the NUL character come from? There are a couple of ways for a std::string implementation to allow this to work:

  1. Use small-string optimization, which is a common technique. In this scheme, every std::string object has an internal buffer it can use for a single NUL character.
  2. Return a pointer to static memory containing a NUL character. In this scheme, every empty std::string returns the same pointer.
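Whichever scheme an implementation picks, the observable guarantee is the same. The sketch below checks it without assuming which of the two schemes is in use; both function names are invented for this example.

```cpp
#include <string>

// Hypothetical helper: returns true if an empty std::string still exposes
// a readable NUL character through c_str().
bool empty_string_is_nul_terminated() {
    const std::string empty;
    const char* p = empty.c_str();
    return empty.size() == 0 && *p == '\0';
}

// Hypothetical helper: returns true if c_str() and data() both agree with
// the address of element 0. For a const string, operator[] accepts an
// index equal to size(), so this is also valid when s is empty.
bool c_str_matches_first_element(const std::string& s) {
    return s.c_str() == &s[0] && s.data() == &s[0];
}
```

Both functions only read from the string, so they stay within what c_str() and data() permit.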

Not everyone should be forced to implement SSO, so the standards committee needed a way to keep #2 on the table. Part of that is giving you a const string from c_str(). And since this memory is likely really const, not fake "please don't modify this memory" const, giving you a mutable pointer to it is a bad idea.

Of course, you can still get such a pointer by doing &str[0], but the standard is very clear that modifying the NUL terminator is a bad idea.

Now, that being said, it is perfectly valid to modify the array of characters through the &str[0] pointer, so long as you stay in the half-open range [0, str.size()). You just can't do it through the pointer returned by data or c_str, even though the standard in fact requires str.c_str() == &str[0] to be true.

That's standardese for you.
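Applied to the use case in the question, this suggests a C++11-compatible sketch that passes &text[0] instead of text.data(). GetTextLength and GetText below are stand-in stubs written for this example, since the real C interface is not shown in the question; the stub is assumed to write exactly GetTextLength() characters and no terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in data and stubs for the C interface from the question
// (assumptions made for illustration only).
static const char kSource[] = "hello from C";

std::size_t GetTextLength() { return sizeof(kSource) - 1; }

// Writes exactly GetTextLength() characters into out, with no terminator,
// so every write stays inside [0, size()).
void GetText(char* out) { std::memcpy(out, kSource, sizeof(kSource) - 1); }

std::string ReadText() {
    std::string text;
    text.resize(GetTextLength());
    if (!text.empty()) {
        // Non-const operator[] yields a mutable char&, so its address is
        // a char* into the string's own contiguous storage.
        GetText(&text[0]);
    }
    return text;
}
```

The empty check avoids handing the callee a pointer to the terminator position, which must not be modified. A C function that also writes a trailing NUL at out[size] would need separate care for the same reason.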

