TL;DR: Delete addslashes($data). It's redundant here.
Double-escaping .. twice
$data=fread($p,filesize($fi));
$data=addslashes($data);
$dat= pg_escape_bytea($data);
You read the data in, escape it as if it were a string literal, then convert it to bytea octal or hex escapes. It could never work that way around even if pg_escape_bytea was sane, which it isn't.
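To make the problem concrete, here is a small standalone sketch (the four sample bytes are my own choice, not from the question) showing that addslashes changes the binary payload before pg_escape_bytea ever sees it:

```php
<?php
// Four raw bytes chosen for illustration: NUL, single quote, backslash, 0x01.
$raw = "\x00\x27\x5c\x01";

// addslashes() puts a backslash in front of quotes and backslashes,
// and turns the NUL byte into the two characters "\0".
$mangled = addslashes($raw);

printf("raw:     %s (%d bytes)\n", bin2hex($raw), strlen($raw));
printf("mangled: %s (%d bytes)\n", bin2hex($mangled), strlen($mangled));
// raw:     00275c01 (4 bytes)
// mangled: 5c305c275c5c01 (7 bytes)
```

Whatever gets stored after that step is the mangled byte sequence, not the file's contents.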
PHP's pg_escape_bytea appears to double-escape the output so it can be inserted into a string literal. This is incredibly ugly, but there doesn't appear to be an alternative that doesn't do this double-escaping, so you can't seem to use parameterised statements for bytea in PHP. You should still do so for everything else.
In this case, simply removing the addslashes line for the data read in from the file is sufficient.
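A minimal sketch of what the corrected flow might look like. The function name store_file, the table "files" and its bytea column "data" are hypothetical, and $conn is assumed to be an already-open pgsql connection. It uses file_get_contents rather than the fopen/fread pair from the question, purely for brevity:

```php
<?php
// Sketch only: "files" and "data" are hypothetical table/column names.
function store_file($conn, string $path): bool
{
    // Read the raw bytes. No addslashes() here.
    $data = file_get_contents($path);
    if ($data === false) {
        return false;
    }

    // Escape exactly once, for use inside a quoted SQL literal.
    $escaped = pg_escape_bytea($conn, $data);

    // The escaped value has to be interpolated into the statement text.
    $result = pg_query($conn, "INSERT INTO files(data) VALUES ('{$escaped}')");
    return $result !== false;
}
```

Passing the connection as the first argument lets libpq choose escaping that matches that server's settings, as far as I can tell.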
Test case showing that pg_escape_bytea double-escapes (and always uses the old, inefficient octal escapes, too):
<?php
# oh-the-horror.php
print pg_escape_bytea("Blah binary\x00\x01\x02\x03\x04 blah");
?>
Run:
php oh-the-horror.php
Result:
Blah binary\\000\\001\\002\\003\\004 blah
See the doubled backslashes? That's because it's assuming you're going to interpolate it into SQL as a string, which is extremely memory inefficient, ugly, and a very bad habit. You don't seem to get any alternative, though.
Among other things this means that:
pg_unescape_bytea(pg_escape_bytea("\x01\x02\x03"));
... produces the wrong result, since pg_unescape_bytea is not actually the reverse of pg_escape_bytea. It also makes it impossible to feed the output of pg_escape_bytea into pg_query_params as a parameter; you have to interpolate it in.
Decoding
If you're using a modern PostgreSQL, it probably sets bytea_output to hex by default. That means that if I write my data to a bytea field then fetch it back, it'll look something like this:
craig=> CREATE TABLE byteademo(x bytea);
CREATE TABLE
craig=> INSERT INTO byteademo(x) VALUES ('Blah binary\\000\\001\\002\\003\\004 blah');
INSERT 0 1
craig=> SELECT * FROM byteademo ;
                                     x
----------------------------------------------------------------------------
 \x426c61682062696e6172795c3030305c3030315c3030325c3030335c30303420626c6168
(1 row)
"Um, what", you might say? It's fine, it's just PostgreSQL's slightly more compact hex representation of bytea. pg_unescape_bytea will handle it fine and produce the same raw bytes as output ... if you have a modern PHP and libpq. On older versions you'll get garbage and will need to set bytea_output to escape for pg_unescape_bytea to handle it.
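If changing bytea_output isn't an option on an older stack, the hex format is simple enough to decode by hand. A rough sketch follows; decode_bytea_hex is a hypothetical helper name of my own, not part of PHP or libpq, and it only handles the hex format, not the old escape format:

```php
<?php
// Hypothetical helper: decode PostgreSQL's hex bytea text format,
// i.e. a literal backslash and "x" followed by pairs of hex digits.
// Returns null if the value is not in that format.
function decode_bytea_hex(string $value): ?string
{
    if (strncmp($value, '\\x', 2) !== 0) {
        return null;
    }

    $hex = (string) substr($value, 2);
    if ($hex === '') {
        return '';                       // empty bytea value
    }

    // Require whole bytes made of hex digits before handing off to hex2bin().
    if (!preg_match('/^(?:[0-9a-fA-F]{2})+$/', $hex)) {
        return null;
    }

    $bytes = hex2bin($hex);
    return $bytes === false ? null : $bytes;
}
```

For instance, decode_bytea_hex('\x426c6168') gives back the four bytes "Blah". Anything that doesn't start with \x comes back as null, so the caller can fall through to pg_unescape_bytea.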
What you should do instead
Use PDO. It has sane(ish) support for bytea.
$sth = $pdo->prepare('INSERT INTO mytable(somecol, byteacol) VALUES (:somecol, :byteacol)');
// bindParam() binds by reference, so a literal value needs bindValue()
$sth->bindValue(':somecol', 'bork bork bork');
$sth->bindParam(':byteacol', $thebytes, PDO::PARAM_LOB);
$sth->execute();
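For the round trip, something along these lines should work. It reuses the mytable, somecol and byteacol names from the snippet above; insert_bytes and fetch_bytes are hypothetical helper names, and the handling of the fetched value assumes pdo_pgsql hands bytea columns back as a stream resource, which is what I'd expect but is worth checking on your version:

```php
<?php
// Sketch only: helper names are hypothetical; $pdo is an open pgsql PDO handle.
function insert_bytes(PDO $pdo, string $label, string $bytes): void
{
    $sth = $pdo->prepare(
        'INSERT INTO mytable(somecol, byteacol) VALUES (:somecol, :byteacol)'
    );
    $sth->bindValue(':somecol', $label);
    // PARAM_LOB marks the value as binary, so no manual escaping is involved.
    $sth->bindValue(':byteacol', $bytes, PDO::PARAM_LOB);
    $sth->execute();
}

function fetch_bytes(PDO $pdo, string $label): ?string
{
    $sth = $pdo->prepare('SELECT byteacol FROM mytable WHERE somecol = :somecol');
    $sth->bindValue(':somecol', $label);
    $sth->execute();

    $value = $sth->fetchColumn();
    if ($value === false || $value === null) {
        return null;                     // no matching row, or SQL NULL
    }

    // Assumption: bytea usually arrives as a stream; read it fully if so.
    if (is_resource($value)) {
        $data = stream_get_contents($value);
        return $data === false ? null : $data;
    }
    return $value;
}
```

Neither helper calls pg_escape_bytea or pg_unescape_bytea; the driver deals with the wire format.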
You may also want to look into PostgreSQL's lob (large object) support, which provides a streaming, seekable interface that's still fully transactional.
Now, on to my soap box
If PHP had a real distinction between "byte string" and "text string" types, you wouldn't even need pg_escape_bytea as the database driver could do it for you. None of this ugliness would be required. Unfortunately, there are no separate string and bytes types in PHP.
Please, use PDO with parameterised statements as much as possible.
Where you can't, at least use pg_query_params and parameterised statements. PHP's addslashes is not an alternative: it's inefficient, ugly, and doesn't understand database-specific escaping rules. For icky historical reasons you still have to manually escape bytea if you're not using PDO, but everything else should go through parameterised statements.