Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - fgetcsv/fputcsv $escape parameter fundamentally broken

Overview

fgetcsv and fputcsv support an $escape argument; however, it's either broken or I'm not understanding how it's supposed to work. Ignore the fact that the $escape parameter isn't documented for fputcsv: it is supported in the PHP source, and a small bug keeps it out of the documentation.

Both functions also support $delimiter and $enclosure parameters, defaulting to a comma and a double quote respectively. I would have expected $escape to be what lets a field contain any one of those metacharacters (backslash, comma or double quote), but this certainly isn't the case. (I now understand from reading Wikipedia that such fields are to be enclosed in double quotes.)

What I've tried

Take for example the pitfall that has affected numerous posters in the comments section of the fgetcsv documentation: the case where we'd like to write a single backslash to a field.

$r = fopen('/tmp/test.csv', 'w');
fwrite($r, '"\"');
fclose($r);

$r = fopen('/tmp/test.csv', 'r');
var_dump(fgetcsv($r));
fclose($r);

This returns false. I've also tried "\\", however that also returns false. Padding the backslash(es) with some nebulous text gives fgetcsv the boost it needs... "hi\\there" and "hi\there" both parse and have the same result, but the result has only 1 backslash, so what's the point of $escape at all?

I've observed the same behavior when not enclosing the backslash in double quotes. 'CSV' files containing the strings \, and \\, give the same result when parsed by fgetcsv: 1 backslash.

Let's ask PHP how it might encode a backslash as a field in a CSV using fputcsv:

$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, array('\\'));
fclose($r);
echo file_get_contents('/tmp/test.csv');

The result is a single backslash enclosed in double quotes (and I've tried 3 versions of PHP > 5.5.4, when $escape support was supposedly added to fputcsv). The hilarity of this is that fgetcsv can't even read it properly per my notes above; it returns false. I'd expect fputcsv not to enclose the backslash in double quotes, or fgetcsv to be able to read "\" as fputcsv has written it. Or really, in my apparently misconstrued mind, I'd expect fputcsv to write a pair of backslashes enclosed in double quotes and fgetcsv to be able to parse that properly!

Reproducible Test

Try writing a single backslash to a file using fputcsv, then reading it via fgetcsv.

$aBackslash = array('\\');

// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash);
fclose($r);

// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r);
fclose($r);

// Compare the read value from fgetcsv to our original value
// (fgetcsv returns false on failure, which array_diff cannot accept)
if ($aFgetcsv === false || count(array_diff($aBackslash, $aFgetcsv)))
  echo "PHP CSV support is broken\n";

Questions

Taking a step back, I have some questions:

  • What's the point of the $escape parameter?
  • Given the loose definition of CSV files, can it be said PHP is supporting them correctly?
  • What's the 'proper' way to encode a backslash in a CSV file?

Background

I initially discovered this when a co-worker provided me with a CSV file produced from Python, which wrote out a single backslash enclosed by double quotes, and fgetcsv failed to read it. I had the gall to ask him if he could use a standard Python function. Little did I know the PHP CSV toolkit is a tangled mess! (FWIW: the Python dev tells me he's using the CSV writing module.)


1 Answer


From a quick look at Python's documentation on CSV Format Parameters, the escape character used within enclosed values (i.e. inside double quotes) is another double quote.

For PHP, the default escape character is a backslash (^); to match Python's behaviour you need to use this:

$data = fgetcsv($r, 0, ',', '"', '"');

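(^) Actually fgetcsv() treats $enclosure followed by $enclosure and $escape followed by $enclosure in the same way: in both cases the second character does not end the field. Passing the double quote as $escape is therefore just a way to stop the backslash from being treated as a special character.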
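
To make the footnote above concrete, here is a small sketch that parses two one-field lines with the default backslash escape. The helper name parse_line is hypothetical and exists only for this illustration; it relies on nothing beyond fopen, fwrite, rewind, fgetcsv and fclose. As far as I can tell, neither sequence ends the field, but only the doubled quote is collapsed, while the backslash is left in the parsed value.

```php
<?php
// Hypothetical helper for illustration: parse one CSV line held in an
// in-memory stream, using the given $escape character.
function parse_line(string $line, string $escape): array
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $line);
    rewind($stream);
    $row = fgetcsv($stream, 0, ',', '"', $escape);
    fclose($stream);
    return $row;
}

// Doubled enclosure inside a quoted field: collapsed to one quote.
$doubled = parse_line('"a""b"', '\\');

// Backslash before the enclosure: the quote does not end the field,
// but the backslash itself stays in the parsed value.
$escaped = parse_line('"a\"b"', '\\');

var_dump($doubled, $escaped);
```

This is why a backslash directly before the closing quote, as in the question's "\" field, leaves the parser looking for an enclosure that never arrives.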

(^^) Setting the $length parameter to 0, as in the call above, instead of a fixed hard limit makes parsing slightly less efficient.
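
As a rough check of the suggested call, the sketch below feeds fgetcsv a line in the style the question describes Python producing: a lone backslash in double quotes, followed by a field with doubled quotes. The helper name read_python_style and the sample line are made up for this example, and an in-memory stream stands in for /tmp/test.csv.

```php
<?php
// Hypothetical helper for illustration: read the first row of $csv,
// passing the enclosure character as $escape so that the backslash
// is treated as ordinary data.
function read_python_style(string $csv)
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $csv);
    rewind($stream);
    $row = fgetcsv($stream, 0, ',', '"', '"');
    fclose($stream);
    return $row; // array on success, false on failure
}

// Field 1: a quoted lone backslash. Field 2: quotes doubled inside quotes.
$row = read_python_style('"\","say ""hi"""' . "\n");
var_dump($row);
```

On PHP 7.4 and later the manual also allows an empty string for $escape, which disables the escape mechanism altogether; the call shown here sticks to the form given in the answer.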

