On 10-08-01 03:03 PM, Tom Lane wrote:
> The archiver will retry, *if the archive command returns non-zero exit
> status*. It sounds to me like you're using an archive command script
> that dutifully logs a failure but is careless about returning the proper
> exit status.
That was my first thought, too, but the PostgreSQL log says this:

    2010-07-31 06:29:11 EDT LOG: archive command failed with exit code 1

So the server definitely knew about the failure. It was also suspicious that
00000001000002BD00000072.00000020.backup hung around in the pg_xlog
directory; if the server thought the archive command was successful it
would presumably have cleaned it up.
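
For anyone finding this thread later, here is a minimal sketch of an archive
script that keeps the exit-status contract Tom describes: it returns non-zero
whenever the copy does not complete, so the archiver knows to retry. This is
not the script we were running. The name archive_wal.py, the archive
directory argument, and the helper names are all assumptions made up for
illustration.

```python
import os
import shutil
import sys


def archive_wal(src_path, archive_dir, file_name):
    """Copy one WAL file into archive_dir. Return 0 on success, 1 on failure."""
    dest = os.path.join(archive_dir, file_name)
    tmp = dest + ".tmp"
    try:
        if os.path.exists(dest):
            # Refuse to overwrite a segment that was already archived.
            print(f"archive failed: {dest} already exists", file=sys.stderr)
            return 1
        shutil.copyfile(src_path, tmp)
        # Rename only after the copy finished, so a partial file never
        # appears under the final name (atomic on the same filesystem).
        os.rename(tmp, dest)
        return 0
    except OSError as exc:
        # Log the failure AND report it through the return value.
        print(f"archive failed for {file_name}: {exc}", file=sys.stderr)
        try:
            os.remove(tmp)
        except OSError:
            pass
        return 1


def main(argv):
    """Hypothetical entry point: argv = [prog, src_path, archive_dir, file_name]."""
    if len(argv) != 4:
        print("usage: archive_wal.py SRC_PATH ARCHIVE_DIR FILE_NAME",
              file=sys.stderr)
        return 2
    return archive_wal(argv[1], argv[2], argv[3])
```

To run it as a script, the file would end with sys.exit(main(sys.argv)), and
postgresql.conf would point at it with something like
archive_command = 'python3 /path/to/archive_wal.py %p /path/to/archive %f'
(both paths are placeholders; %p and %f are the standard archive_command
substitutions for the file's path and name).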
> I'm afraid you're probably screwed as far as replaying any data beyond
> the lost WAL segment goes. Even if you forced the system to try to
> replay it, you'd have corrupted database state because of the omission
> of the changes that were in the lost segment. If you still have the
> original $PGDATA tree (ie you didn't blow it away while trying the PITR
> idea) then you might be able to get a closer approximation to current
> time by doing resetxlog and starting up --- though the consistency of
> the DB would still be questionable, so a dump and reload would be
> advisable.
>
> regards, tom lane
Luckily, we were able to rebuild our data from out-of-band sources, but
it's good to know about pg_resetxlog.
Thanks for your help.
jk