Обсуждение: pg_[un]escape_bytea, pgsql 8.2.1, php 5.1.6, Linux
All... Apache 2.2.3, PostgreSQL 8.2.1, PHP 5.1.6, Linux I have inserted (via pg_query_params) into a bytea field some binary data (a JPEG image in this case) which I have escaped using pg_escape_bytea. It appears, however, that the extracted data is corrupt (NOT unescaped, more precisely), even after unescaping it with pg_unescape_bytea. If I perform another (a subsequent) pg_unescape_bytea, it appears to be partially unescaped, but there still remain errors because the rest of the image is severely distorted -- but minimally recognizeable as part of the original image. What am I missing? I'm using the lo_* functions as an alternative, but it's hard to dismiss the ease with which it appears to deal with binary data with a bytea field. Many thanks in advance! -- Gary Chambers // Nothing fancy and nothing Microsoft!
gwchamb@gmail.com wrote: > Apache 2.2.3, PostgreSQL 8.2.1, PHP 5.1.6, Linux > > I have inserted (via pg_query_params) into a bytea field some binary > data (a JPEG image in this case) which I have escaped using > pg_escape_bytea. It appears, however, that the extracted data is > corrupt (NOT unescaped, more precisely), even after unescaping it with > pg_unescape_bytea. If I perform another (a subsequent) > pg_unescape_bytea, it appears to be partially unescaped, but there > still remain errors because the rest of the image is severely > distorted -- but minimally recognizeable as part of the original > image. What am I missing? I'm using the lo_* functions as an > alternative, but it's hard to dismiss the ease with which it appears > to deal with binary data with a bytea field. Interesting problem. pg_query_params() should have been made binary-safe, but it isn't. It only accepts and passes 'text' mode arguments to PostgreSQL. So you cannot put raw bytea data into a query parameter. But you cannot use pg_escape_bytea() on the data either. pg_escape_bytea() escapes the data in preparation for two levels of parsing/unescaping: once by the SQL parser, and once by the bytea-type input function. This is what you need for a non-parameterized query, like "INSERT INTO mytable (bd) VALUES ('$data')" where bd is a bytea column, and $data went through pg_escape_bytea(). The escaping done by pg_escape_bytea() is wrong for parameterized queries. With a binary-mode query parameter (which pg_query_params() can't do anyway), you want no escaping at all. With a text-mode parameter (as pg_query_params() does), you need to escape for only the bytea-input parsing, not the SQL parsing. So for example if your data has a byte with value 1, you need to pass that as the 4 characters: \001. pg_escape_bytea() returns that as the 5 characters: \\001 (unless the new 'standard conforming strings' is on), so it won't work. Nor can I think of another PHP escaping function that does work here. To me, this means that you should probably do non-parameterized queries instead, with pg_query() and pg_escape_bytea(), with your bytea data.
Thanks for the reply! > pg_query_params() should have been made binary-safe, but it isn't. It only > accepts and passes 'text' mode arguments to PostgreSQL. So you cannot put > raw bytea data into a query parameter. Hmmm... Disappointing. Will pg_query_params ever become binary safe? I'm evaluating Postgres as an alternative to Oracle, so that's where the majority of my experience lies. > you need for a non-parameterized query, like "INSERT INTO mytable (bd) > VALUES ('$data')" where bd is a bytea column, and $data went through > pg_escape_bytea(). Understood. I do not like for several reasons that method of inserting data. It exposes me to SQL injection attacks, it's very inefficient (in Oracle, anyway -- perhaps you can correct me where Postgres is concerned), it seems uncharacteristic of a database with the qualities of Postgres, I can't have all my queries in a single source file, and I can't take advantage of the ease with which I can handle binary data with a bytea field. > To me, this means that you should probably do non-parameterized queries > instead, with pg_query() and pg_escape_bytea(), with your bytea data. Would there be any advantage to simply using a text field and base64 encoding and decoding the binary data? I really don't want to use non-parameterized queries. -- Gary Chambers // Nothing fancy and nothing Microsoft!
On Saturday 03 February 2007, Gary Chambers wrote: > > you need for a non-parameterized query, like "INSERT INTO mytable (bd) > > VALUES ('$data')" where bd is a bytea column, and $data went through > > pg_escape_bytea(). > > Understood. I do not like for several reasons that method of > inserting data. It exposes me to SQL injection attacks, it's very > inefficient (in Oracle, anyway -- perhaps you can correct me where > Postgres is concerned), it seems uncharacteristic of a database with > the qualities of Postgres, I can't have all my queries in a single > source file, and I can't take advantage of the ease with which I can > handle binary data with a bytea field. This would be a problem related to php, not postgres. I'm handling binary data in parameterized and COPY queries just fine with c++. > > To me, this means that you should probably do non-parameterized queries > > instead, with pg_query() and pg_escape_bytea(), with your bytea data. > > Would there be any advantage to simply using a text field and base64 > encoding and decoding the binary data? I really don't want to use > non-parameterized queries. base64 would solve your binary problem, but it is costly (disk space and cpu). I think you can instead use prepared statements via SQL directly (as php probably does in the end) : // initialisation pg_query('PREPARE mystatement (bytea) AS INSERT INTO mytable (bd) VALUES ($1);'); // insert loop pg_query("EXECUTE mystatement (' . pg_escape_bytea($data) . "');"); Annoying to have to do all this yourself, but it should work (and it *is* a parameterized query). BTW, if you're doing bulk inserts, consider pg_copy_from() instead. -- Vincent de Phily
vdephily@bluemetrix.com wrote: >... > I think you can instead use prepared statements via SQL directly (as php > probably does in the end) : > > // initialisation > pg_query('PREPARE mystatement (bytea) AS INSERT INTO mytable (bd) VALUES > ($1);'); > // insert loop > pg_query("EXECUTE mystatement (' . pg_escape_bytea($data) . "');"); > > Annoying to have to do all this yourself, but it should work (and it *is* a > parameterized query). It's sort of a parameterized query, but not really in the sense the original poster wants. That is, it does not protect against SQL injection attacks the way a true parameterized query does (with the variable data passed outside of the SQL statement itself, and not subject to SQL parsing). If some way around pg_escape_bytea were to be found (as perhaps happened before with multi-byte characters and PQescapeString), the above could be vulnerable. In fact I don't see where it is any safer than just doing pg_query("INSERT ... '" . pg_escape_bytea($data) . "')");
In article <302670f20702011827x28d9a689m388b2c8a703f31af@mail.gmail.com>, Gary Chambers <gwchamb@gmail.com> wrote: >I have inserted (via pg_query_params) into a bytea field some >binary data (a JPEG image in this case) which I have escaped >using pg_escape_bytea. I just use a text field and base64 encode the image. It works great. -- http://yosemitephotos.net/
Vincent... > This would be a problem related to php, not postgres. I'm handling binary data > in parameterized and COPY queries just fine with c++. I'm not ruling-out PHP, but I've seen claims of pg_[un]escape_bytea being only wrappers to the Pg functions. -- Gary Chambers // Nothing fancy and nothing Microsoft!
gwchamb@gmail.com wrote: > Vincent... > >> This would be a problem related to php, not postgres. I'm handling binary data >> in parameterized and COPY queries just fine with c++. > > I'm not ruling-out PHP, but I've seen claims of pg_[un]escape_bytea > being only wrappers to the Pg functions. That is correct, the PHP function relies on the PostgreSQL library to do the work. But the problem of using a parameterized query with bytea data is specific to PHP. PostgreSQL allows you to either escape the bytea data for use as a text-mode parameter, or pass it 'raw' as a binary-mode parameter. I don't know a good way to do the first in PHP, and the PHP PostgreSQL interface doesn't support the second at all.
On Thursday 08 February 2007, ljb wrote: > gwchamb@gmail.com wrote: > > Vincent... > > > >> This would be a problem related to php, not postgres. I'm handling > >> binary data in parameterized and COPY queries just fine with c++. > > > > I'm not ruling-out PHP, but I've seen claims of pg_[un]escape_bytea > > being only wrappers to the Pg functions. > > That is correct, the PHP function relies on the PostgreSQL library to do > the work. But the problem of using a parameterized query with bytea data > is specific to PHP. PostgreSQL allows you to either escape the bytea > data for use as a text-mode parameter, or pass it 'raw' as a binary-mode > parameter. I don't know a good way to do the first in PHP, and the PHP > PostgreSQL interface doesn't support the second at all. This sparked my interest, so I checked the libpq docs (http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html) and wrote some php tests. The postgres doc say that the escape functions should only be used "for Inclusion in SQL Commands", and that "it is not necessary nor correct to do escaping when a data value is passed as a separate parameter in PQexecParams". But when I try to pass the raw data using pg_query_params(), postgres complains about UTF8 (on a supposedly bytea field). Makes you wonder which postgres function does pg_query_params() really wrap around... I reallize I went all this way to just re-discover what has been said in this thread, but if I needed the extra research, maybe other will find it insightfull. -- Vincent de Phily
ljb wrote: > gwchamb@gmail.com wrote: >> Apache 2.2.3, PostgreSQL 8.2.1, PHP 5.1.6, Linux >> >> I have inserted (via pg_query_params) into a bytea field some binary >> data (a JPEG image in this case) which I have escaped using >> pg_escape_bytea. It appears, however, that the extracted data is >> corrupt (NOT unescaped, more precisely), even after unescaping it with >> pg_unescape_bytea. If I perform another (a subsequent) >> pg_unescape_bytea, it appears to be partially unescaped, but there >> still remain errors because the rest of the image is severely >> distorted -- but minimally recognizeable as part of the original >> image. What am I missing? I'm using the lo_* functions as an >> alternative, but it's hard to dismiss the ease with which it appears >> to deal with binary data with a bytea field. > > Interesting problem. ... > The escaping done by pg_escape_bytea() is wrong for parameterized queries. > With a binary-mode query parameter (which pg_query_params() can't do > anyway), you want no escaping at all. With a text-mode parameter (as > pg_query_params() does), you need to escape for only the bytea-input > parsing, not the SQL parsing. So for example if your data has a byte with > value 1, you need to pass that as the 4 characters: \001. > pg_escape_bytea() returns that as the 5 characters: \\001 (unless the new > 'standard conforming strings' is on), so it won't work. Nor can I think of > another PHP escaping function that does work here. Isn't it possible to partly undo the effect of pg_escape_bytea()? Something like this: <?php /* some binary data */ $data = implode("", range("\000", "\377")); echo "pg_escape_bytea():\n"; echo pg_escape_bytea($data), "\n\n"; echo "pg_escape_bytea(), corrected:\n"; echo str_replace("\\\\", "\\", pg_escape_bytea($data)), "\n\n"; ?> Here's the result: $ php -f test.php | fold -w 70 pg_escape_bytea(): \\000\\001\\002\\003\\004\\005\\006\\007\\010\\011\\012\\013\\014\\015 \\016\\017\\020\\021\\022\\023\\024\\025\\026\\027\\030\\031\\032\\033 \\034\\035\\036\\037 !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOP QRSTUVWXYZ[\\\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\177\\200\\201\\202\ \203\\204\\205\\206\\207\\210\\211\\212\\213\\214\\215\\216\\217\\220\ \221\\222\\223\\224\\225\\226\\227\\230\\231\\232\\233\\234\\235\\236\ \237\\240\\241\\242\\243\\244\\245\\246\\247\\250\\251\\252\\253\\254\ \255\\256\\257\\260\\261\\262\\263\\264\\265\\266\\267\\270\\271\\272\ \273\\274\\275\\276\\277\\300\\301\\302\\303\\304\\305\\306\\307\\310\ \311\\312\\313\\314\\315\\316\\317\\320\\321\\322\\323\\324\\325\\326\ \327\\330\\331\\332\\333\\334\\335\\336\\337\\340\\341\\342\\343\\344\ \345\\346\\347\\350\\351\\352\\353\\354\\355\\356\\357\\360\\361\\362\ \363\\364\\365\\366\\367\\370\\371\\372\\373\\374\\375\\376\\377 pg_escape_bytea(), corrected: \000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017\020\0 21\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !"#$%&''()* +,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmno pqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\21 4\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\ 236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\25 7\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\ 301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\32 2\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\ 344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\36 5\366\367\370\371\372\373\374\375\376\377 Isn't the latter ready for pg_query_params()? The only doubt I have is about the double '. Is it needed for bytea parsing or sql parsing? You may have to replace it with a single ', if it's only for SQL: str_replace(array("\\\\", "''"), array("\\", "'"), pg_escape_bytea($data)), "\n\n"; See http://www.postgresql.org/docs/8.2/interactive/datatype-binary.html. The man page is not clear, the single quote is listed among the escaped chars, but from the examples is seems that only SQL escaping is performed. Compare with \: SELECT E'\\\\'::bytea; -- after SQL parsing, you get \\ SELECT E'\''::bytea; -- after SQL parsing, you get ' it seems that the bytea parser is feeded with a double slash, but a single quote. If so, the octal value 39 should be removed from the table in the manual page, since it requires no escaping at bytea level. pg_escape_bytea() seems to confirm this, it returns '' for ' but \\\\ for \. \ is escaped twice, ' only once. Unfortunately I can't try it now with pg_query_params(), but I bet on the second form... well no, maybe I can, yes, here we go: <?php $data = implode("", range("\000", "\377")); $data_escaped = str_replace( array("\\\\", "''"), array("\\", "'"), pg_escape_bytea($data)); /* fill in the conn string, if required */ $db = pg_connect(""); pg_query($db, "CREATE TEMP TABLE atest (data bytea);"); pg_query_params($db, "INSERT INTO atest VALUES ($1)", array($data_escaped)); $res = pg_query($db, "SELECT data FROM atest"); $data2 = pg_unescape_bytea(pg_fetch_result($res, 0)); echo ($data2 == $data) ? "OK!" : "PANIC!", "\n"; ?> $ php -f test.php OK! Seems to work... .TM.