The C standard says that whether the last line of a text file needs a terminating newline is implementation-defined, so data after the last newline may not be read properly.
ISO/IEC 9899:2011 §7.21.2 Streams
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.
I would not have expected a missing newline at the end of a file to cause trouble in bash (or any Unix shell), but that does seem to be the problem, reproducibly ($ is the prompt in this output):
$ echo xxx\\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done # UUOC Award pending
abc
def
ghi
xxx
$
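The transcript relies on echo honouring \c, which is not portable: bash's builtin echo, for example, only treats \c specially with -e or when the xpg_echo option is set. A minimal sketch of the same experiment using printf instead (the temporary directory is my own addition) looks like this:

```shell
#!/bin/sh
# Sketch: rebuild the test file portably and rerun both kinds of loop.
# printf emits only what its format says, so 'xxx' gets no trailing newline.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'abc\ndef\nghi\nxxx' > "$dir/y"

# read fails on the unterminated 'xxx', so the loop body never prints it.
while_output=$(while read line; do echo "$line"; done < "$dir/y")

# Word splitting does not care whether the last line has a newline.
for_output=$(for line in $(cat "$dir/y"); do echo "$line"; done)

printf 'while read loop:\n%s\n\n' "$while_output"
printf 'for loop:\n%s\n' "$for_output"
```

Both loops should behave as in the transcript: the while read loop stops after ghi, while the for loop also prints xxx.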
It is also not limited to bash: the Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.
As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and splits it into words at every sequence of white space (blanks, tabs and newlines alike), so each word, not each line, becomes one iteration of the loop. That only behaves as a line-by-line loop because, I infer, no line in your file contains blanks.
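To see the difference between word splitting and line reading, here is a small sketch with an assumed data file in which one line does contain a blank:

```shell
#!/bin/sh
# Sketch: contrast word splitting (for ... in $(cat file)) with line reading.
# The sample data is an assumption: its first line contains a blank.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'one two\nthree\n' > "$dir/data"

# Each whitespace-separated word becomes its own iteration.
words=0
for w in $(cat "$dir/data"); do words=$((words + 1)); done

# Each newline-terminated line becomes one iteration.
lines=0
while IFS= read -r l; do lines=$((lines + 1)); done < "$dir/data"

echo "for loop iterations: $words"    # 3: one, two, three
echo "read loop iterations: $lines"   # 2: 'one two', 'three'
```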
Tested on Mac OS X 10.7.5.
What does POSIX say?
The POSIX read command specification says:
The read utility shall read a single line from standard input.
By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.
If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.
The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]
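The escape rules quoted above can be checked with a few one-line inputs. This is a sketch; the sample strings are arbitrary, and printf is used for output because some echo implementations interpret backslashes themselves:

```shell
#!/bin/sh
# Sketch: how read treats <backslash> with and without -r.
# The input is the four characters a\tb (a backslash followed by 't').
input='a\tb'

# Without -r the unescaped backslash is removed; 't' keeps its literal value.
plain=$(printf '%s\n' "$input" | { read v; printf '%s' "$v"; })

# With -r the backslash is an ordinary character and is kept.
raw=$(printf '%s\n' "$input" | { read -r v; printf '%s' "$v"; })

# <backslash><newline> is a line continuation unless -r is given.
joined=$(printf 'abc\\\ndef\n' | { read v; printf '%s' "$v"; })

printf 'without -r: %s\n' "$plain"    # atb
printf 'with -r:    %s\n' "$raw"      # a\tb
printf 'continued:  %s\n' "$joined"   # abcdef
```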
Note the '(if any)' in that quote! It seems to me that if there is no newline, read should still return the result. On the other hand, it also says:
STDIN
The standard input shall be a text file.
and then you get back to the debate about whether a file that does not end with a newline is a text file or not.
However, the rationale on the same page documents:
Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.
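The case the rationale describes can be reproduced with an input whose final line ends in <backslash> <newline>. A sketch, with an arbitrary sample string:

```shell
#!/bin/sh
# Sketch: the input below is a proper text file (it ends with a newline),
# but its last line ends in <backslash><newline>. Without -r that pair is a
# continuation, so no terminating newline is left: the "(if any)" case.
no_r=$(printf 'abc\\\n' | { read v; printf '%s' "$v"; })
with_r=$(printf 'abc\\\n' | { read -r v; printf '%s' "$v"; })

printf 'without -r: %s\n' "$no_r"     # abc (the continuation consumed the newline)
printf 'with -r:    %s\n' "$with_r"   # abc\ (backslash kept; the newline ends the line)
```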
That rationale must mean that the text file is supposed to end with a newline.
The POSIX definition of a text file is:
3.395 Text File
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
This does not stipulate 'ends with a <newline>' directly, but it does defer to the C standard, and it does say "A file that contains characters organized into zero or more lines". When we look at the POSIX definition of a "Line", it says:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
So, per the POSIX definitions, a non-empty text file must end with a newline: it is made up of lines, and each line must end with a terminating newline.
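Checking the 'each line ends with a newline' part of that definition is easy with tail. The following is a sketch of a hypothetical helper, ends_with_newline, which deliberately ignores the NUL and {LINE_MAX} conditions:

```shell
#!/bin/sh
# Sketch: does a file satisfy "every line ends with <newline>"?
# ends_with_newline is a hypothetical helper name; NUL bytes and
# {LINE_MAX} are not checked.
ends_with_newline() {
    [ -f "$1" ] || return 2          # not a regular file: report an error
    [ -s "$1" ] || return 0          # empty file: zero lines, still a text file
    [ -z "$(tail -c 1 "$1")" ]       # $(...) strips a trailing newline, leaving ""
}

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'abc\nxxx\n' > "$dir/good"
printf 'abc\nxxx' > "$dir/bad"
: > "$dir/empty"

for f in good bad empty; do
    if ends_with_newline "$dir/$f"; then
        echo "$f: ends with a newline"
    else
        echo "$f: missing final newline"
    fi
done
```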
A solution to the 'no terminal newline' problem
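Note Gordon Davisson's answer: when read reaches end-of-file before finding a newline, it still assigns the partial line to the variable, but it returns a non-zero exit status, so the loop body is not run for that last segment. A simple test shows that his observation is accurate: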
$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$
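What happens on that final segment can be shown directly by looking at the exit status of read. A sketch, using the same unterminated xxx as the file y:

```shell
#!/bin/sh
# Sketch: read on a final segment with no newline still fills the variable,
# but reports end-of-file through a non-zero exit status.
rc=$(printf 'xxx' | { read line; echo "$?"; })
value=$(printf 'xxx' | { read line; printf '%s' "$line"; })

printf 'read exit status: %s\n' "$rc"      # non-zero: end-of-file came before a newline
printf 'variable holds:   %s\n' "$value"   # xxx
```

That non-zero status is what terminates the while loop, even though the value was read.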
Therefore, his technique of:
while read line || [ -n "$line" ]; do echo $line; done < y
or:
cat y | while read line || [ -n "$line" ]; do echo $line; done
will work for files without a newline at the end (at least on my machine).
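If you also want leading and trailing blanks and backslashes preserved, the same test combines naturally with IFS= and read -r, and printf avoids echo's escape handling. These refinements are my additions to the technique quoted above, and print_lines is a hypothetical name:

```shell
#!/bin/sh
# Sketch: the "|| [ -n "$line" ]" test combined with IFS= and read -r.
# print_lines is a hypothetical helper that copies stdin line by line.
print_lines() {
    # After a failed read, a non-empty $line means an unterminated last segment.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}

# Assumed sample input: leading blanks, a backslash, and no final newline.
out=$(printf '  indented\nC:\\temp\nxxx' | print_lines)
printf '%s\n' "$out"
```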
I'm still surprised to find that the shells drop the last segment of the input (it can't be called a line because it doesn't end with a newline), but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.
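One way to do that, sketched here with a hypothetical helper add_final_newline, is to append a newline only when the last byte of a non-empty file is not already one:

```shell
#!/bin/sh
# Sketch: make sure a file ends with a newline, appending one only if needed.
# add_final_newline is a hypothetical helper; it modifies the file in place.
add_final_newline() {
    # Non-empty, and the last byte survives $(...), so it is not a newline.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

# Demo on a temporary file with the same contents as y.
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf 'abc\ndef\nghi\nxxx' > "$f"
add_final_newline "$f"
add_final_newline "$f"                           # a second call changes nothing
while read line; do echo "$line"; done < "$f"    # now prints xxx as well
```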