TL;DR: s.end() + 1 is undefined behavior.
std::string is a strange beast, mainly for historical reasons:
- It attempts to bring C compatibility, where it is known that an additional character exists beyond the length reported by strlen.
- It was designed with an index-based interface.
- As an afterthought, when it was merged into the Standard Library with the rest of the STL code, an iterator-based interface was added.
This led std::string, in C++03, to number 103 member functions, and a few more have been added since.
Therefore, discrepancies between the different methods should be expected.
Discrepancies already appear in the index-based interface:
§21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1/ Requires: pos <= size()
const_reference at(size_type pos) const;
reference at(size_type pos);
5/ Throws: out_of_range if pos >= size()
Yes, you read this right: s[s.size()] returns a reference to a NUL character, while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] with at because it is safer, beware the string trap...
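The discrepancy can be observed directly. The following is a minimal sketch, assuming a C++11 or later library (where the quoted wording applies to both overloads of operator[]); the helper names read_with_subscript and at_throws are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the character at `pos` through operator[].
// The quoted requirement is pos <= s.size(), so pos == s.size() is allowed.
inline char read_with_subscript(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    return s[pos];
}

// Hypothetical helper: reports whether s.at(pos) throws std::out_of_range.
inline bool at_throws(const std::string& s, std::size_t pos) {
    try {
        (void)s.at(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a string s, read_with_subscript(s, s.size()) yields '\0', whereas at_throws(s, s.size()) is true.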
So, what about iterators?
§21.4.3 [string.iterators]
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A pointer is offered by
§21.4 [basic.string]
3/ The iterators supported by basic_string are random access iterators (24.2.7).
while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we came this far, let's go all the way).
This leads us to:
24.2.1 [iterator.requirements.general]
5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, *s.end() is undefined behavior: s.end() is a past-the-end value and therefore not dereferenceable.
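Because a past-the-end iterator cannot be dereferenced, code that inspects the character at an iterator has to compare against end() first. A minimal sketch, assuming the iterator belongs to the given string; peek_or is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns *it when `it` is dereferenceable,
// otherwise returns `fallback`.
// Assumption: `it` lies in the range [s.begin(), s.end()].
inline char peek_or(const std::string& s,
                    std::string::const_iterator it,
                    char fallback) {
    assert(it >= s.begin() && it <= s.end());
    return it != s.end() ? *it : fallback;
}
```

This contrasts with the index-based interface, where s[s.size()] is allowed and no such check is needed for reading.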
24.2.3 [input.iterators]
2/ Table 107 -- Input iterator requirements (in addition to Iterator)
Lists, as a pre-condition to ++r and r++, that r be dereferenceable.
None of the Forward, Bidirectional, or Random access iterator requirements lift this restriction (and each indicates that it inherits the restrictions of its predecessor).
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
- r += n is equivalent to [inc|dec]rementing r n times
- a + n and n + a are equivalent to copying a and then applying += n to the copy
and similarly for -= n and - n.
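Since s.end() is not dereferenceable, it does not satisfy the pre-condition of ++r, and s.end() + 1 is defined in terms of that increment. Thus s.end() + 1 is undefined behavior.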
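In practice, this means an iterator should never be advanced by more than the distance remaining to end(). One way to stay inside the valid range is sketched below; advance_clamped is a hypothetical helper written for this illustration, and it assumes both iterators come from the same string with it <= last.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: advances `it` by at most `n` steps, never past `last`.
// Assumptions: `it` and `last` belong to the same string, it <= last, n >= 0.
inline std::string::const_iterator
advance_clamped(std::string::const_iterator it,
                std::string::const_iterator last,
                std::string::difference_type n) {
    assert(n >= 0);
    // last - it is well defined because both iterators are in [begin, end].
    const std::string::difference_type remaining = last - it;
    return it + (n < remaining ? n : remaining);
}
```

With this helper, advance_clamped(s.end(), s.end(), 1) simply returns s.end() instead of forming s.end() + 1.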