Обсуждение: pg_dump's checkSeek() seems inadequate
While testing a fix for the pg_restore issues mentioned a few days ago, I noticed that checkSeek() returns true on my old HPUX box even when the input is in fact not seekable, for instance a pipe. This leads to pg_restore failing completely in cases where it ought to work, like this: $ cat vector.dump | pg_restore -d vector pg_restore: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used pg_restore: [custom archiver] error during file seek: Illegal seek $ The test that checkSeek() is using is to see whether this works: fseeko(fp, 0, SEEK_CUR) and apparently on this platform that's a no-op even on an otherwise unseekable file. I don't know how common this behavior is; I don't recall having seen complaints about pg_restore failing for other people. If I change the test to be fseeko(fp, 0, SEEK_SET) then it does the right thing. Since checkSeek() is applied immediately after opening the input file this should be OK, but it does limit the scope of usefulness of that function. Any thoughts about whether or not to apply such a patch? If it should be changed, should we back-patch it? regards, tom lane
I wrote: > The test that checkSeek() is using is to see whether this works: > fseeko(fp, 0, SEEK_CUR) > and apparently on this platform that's a no-op even on an otherwise > unseekable file. BTW, I looked in the archives for related discussions and found only the thread in which the current fseek-checking code was designed. There were a couple of mentions that SEEK_CUR should be changed to SEEK_SET, eg http://archives.postgresql.org/pgsql-hackers/2002-10/msg01088.php but it was apparently focused around coping with specific seek APIs that didn't have SEEK_CUR at all. We ended up not using those so the change was never made. The thought that a no-op seek might not give the right answer wasn't discussed AFAICT. regards, tom lane
On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > If I change the test to be > > fseeko(fp, 0, SEEK_SET) > > then it does the right thing. Since checkSeek() is applied immediately > after opening the input file this should be OK, but it does limit the > scope of usefulness of that function. > > Any thoughts about whether or not to apply such a patch? If it should > be changed, should we back-patch it? Well, I guess it depends on what you think the chances are that the revised test will fail on some other obscure platform. Have there been any reports from the field? If not, I might apply to HEAD and await developments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If I change the test to be >> fseeko(fp, 0, SEEK_SET) >> then it does the right thing. > Well, I guess it depends on what you think the chances are that the > revised test will fail on some other obscure platform. To believe that, you'd have to believe that fseeko(fp, 0, SEEK_SET) will fail but fseeko(fp, something-not-zero, SEEK_SET) will succeed. A somewhat more plausible scenario is that somebody might hope that they could do something like this: echo 'some custom header' >pg.dumppg_dump -Fc >>pg.dump I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET) would result in overwriting the custom header, where it would not have been overwritten before. However the usefulness of the above is at best far-fetched; and I'm not very sure that it works today anyway, since pg_dump/pg_restore seem to assume that manual byte counting should match the results of ftell(). regards, tom lane
On Sun, Jun 27, 2010 at 6:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> If I change the test to be >>> fseeko(fp, 0, SEEK_SET) >>> then it does the right thing. > >> Well, I guess it depends on what you think the chances are that the >> revised test will fail on some other obscure platform. > > To believe that, you'd have to believe that fseeko(fp, 0, SEEK_SET) > will fail but fseeko(fp, something-not-zero, SEEK_SET) will succeed. > > A somewhat more plausible scenario is that somebody might hope that > they could do something like this: > > echo 'some custom header' >pg.dump > pg_dump -Fc >>pg.dump > > I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET) > would result in overwriting the custom header, where it would not have > been overwritten before. However the usefulness of the above is at > best far-fetched; and I'm not very sure that it works today anyway, > since pg_dump/pg_restore seem to assume that manual byte counting should > match the results of ftell(). That doesn't actually sound all that far-fetched. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 27, 2010 at 6:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A somewhat more plausible scenario is that somebody might hope that >> they could do something like this: >> >> echo 'some custom header' >pg.dump >> pg_dump -Fc >>pg.dump > That doesn't actually sound all that far-fetched. I've got my doubts, but I guess we could write an even-more-bulletproof test by doing ftello() and then seeing if fseeko to that position complains. This might be a good thing anyway since most of the other uses of ftello aren't really checking for errors ... regards, tom lane
Tom Lane wrote: > A somewhat more plausible scenario is that somebody might hope that > they could do something like this: > > echo 'some custom header' >pg.dump > pg_dump -Fc >>pg.dump > > I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET) > would result in overwriting the custom header, where it would not have > been overwritten before. However the usefulness of the above is at > best far-fetched; and I'm not very sure that it works today anyway, > since pg_dump/pg_restore seem to assume that manual byte counting should > match the results of ftell(). > > > What would anyone hope to achieve by such a manoeuvre, even if it worked, which I am close the dead sure it would not? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> A somewhat more plausible scenario is that somebody might hope that >> they could do something like this: >> >> echo 'some custom header' >pg.dump >> pg_dump -Fc >>pg.dump > What would anyone hope to achieve by such a manoeuvre, even if it > worked, which I am close the dead sure it would not? It looks to me like it probably would actually work, so far as pg_dump is concerned, but _discoverArchiveFormat() would break it because that tries to do an unconditional fseeko(fp, 0, SEEK_SET) (and the position counting is screwed up even if the fseeko fails). That could probably be fixed if anyone thought this scenario was interesting enough to justify work directed specifically at it. I did the ftello/fseeko dance in checkSeek() just now because it seems sensible to me to have checkSeek() actually verify functionality of both of those calls, not because I think it's real likely that the position won't be 0. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tom Lane wrote: >> >>> A somewhat more plausible scenario is that somebody might hope that >>> they could do something like this: >>> >>> echo 'some custom header' >pg.dump >>> pg_dump -Fc >>pg.dump >>> > > >> What would anyone hope to achieve by such a manoeuvre, even if it >> worked, which I am close the dead sure it would not? >> > > It looks to me like it probably would actually work, so far as pg_dump > is concerned, but _discoverArchiveFormat() would break it because that > tries to do an unconditional fseeko(fp, 0, SEEK_SET) (and the position > counting is screwed up even if the fseeko fails). That could probably > be fixed if anyone thought this scenario was interesting enough to > justify work directed specifically at it. > IIRC pg_restore expects the archive to begin with the header and TOC produced by pg_dump. Maybe I'm being dense, but I'm not able to see how prefixing that with something else could possibly do something useful or workable. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >>>> echo 'some custom header' >pg.dump >>>> pg_dump -Fc >>pg.dump > IIRC pg_restore expects the archive to begin with the header and TOC > produced by pg_dump. Maybe I'm being dense, but I'm not able to see how > prefixing that with something else could possibly do something useful or > workable. You'd have to do something like (1) open the file as stdin(2) read the custom header(3) exec pg_restore, telling it to read stdin A possible application for this would be for the header to contain information needed to prepare pg_restore's arguments, like where to contact the server. I still think it's a bit too klugy to justify the effort though. In a more abstract sense, what this would be is basically a custom label for a dump file. I could see that being useful, but if we wanted to support it then it ought to be an actual Feature and not a kluge like this. Something like pg_dump --label 'any string' ... other args ... pg_restore --print-label dump.file regards, tom lane
Tom Lane wrote: > In a more abstract sense, what this would be is basically a custom > label for a dump file. I could see that being useful, but if we > wanted to support it then it ought to be an actual Feature and not > a kluge like this. Something like > > pg_dump --label 'any string' ... other args ... > > pg_restore --print-label dump.file > > > Right, or possibly a file that would be added. I don't think we should waste any effort at all on the kluge, or worrying about whether or not it would work. cheers andrew