Hi!
We have a problem very similar to
https://www.postgresql.org/message-id/etPan.54ebb1ea.6b68079a.341@cheyne.local
(which seemingly got no reply).
After running out of disk space, our Postgres instance fails to start:
the startup (recovery) process runs indefinitely, showing
postgres: startup recovering 00000001000002C70000008B
(the WAL segment name doesn't change)
Running strace on the startup process shows the following:
open("pg_xact/01A2", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 91
lseek(91, 98304, SEEK_SET) = 98304
write(91, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(91) = 0
close(91) = 0
read(6, 0x7ffea45345c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
open("pg_xact/01A2", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 91
lseek(91, 98304, SEEK_SET) = 98304
write(91, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(91) = 0
close(91) = 0
read(6, 0x7ffea45345c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
This, again, repeats indefinitely. It looks like PG seeks to the same
position 98304 in the pg_xact/01A2 file, writes 8192 zeros there, and
then does it again and again at the same file position.
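
For reference, a rough sketch of which transaction IDs that page would
cover. It assumes the default build constants (8192-byte blocks, 2
commit-status bits per transaction, 32 pages per pg_xact segment file);
none of these were checked against this instance, and the function name
xid_range is only illustrative.

```python
# Assumed defaults, not verified for this server:
# 8192-byte blocks, 2 status bits per transaction (4 per byte),
# 32 pages per pg_xact segment file.
BLCKSZ = 8192
XACTS_PER_BYTE = 4
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE
PAGES_PER_SEGMENT = 32


def xid_range(segment_name, byte_offset):
    """Return (first_xid, last_xid) covered by the page at byte_offset."""
    segment = int(segment_name, 16)           # file name is a hex segment number
    page_in_segment = byte_offset // BLCKSZ   # 98304 // 8192 == 12
    page = segment * PAGES_PER_SEGMENT + page_in_segment
    first_xid = page * XACTS_PER_PAGE
    return first_xid, first_xid + XACTS_PER_PAGE - 1


first, last = xid_range("01A2", 98304)
print(first, last)
```

Under those assumptions, the page being rewritten is page 12 of segment
01A2, i.e. the one holding the status of transactions 438697984 through
438730751.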
It seems some file got corrupted when PG failed to write its changes to disk.
Just to experiment, I tried backing up and deleting that pg_xact/01A2
file, which did not change anything: Postgres recreated it and
continued to write zeros at position 98304. Afterwards I restored the
file from the backup.
Interestingly, when I shut down the main PG process and inspect the
pg_xact/01A2 file, it has some non-zero data at position 98304, so it
looks like PG writes something there just before shutting down.
While PG is running, that position always contains zeros.
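
A minimal sketch of that kind of check, in case someone wants to
reproduce it. The helper name page_is_zero is hypothetical, and the
default offset and length are simply the values from the strace output
above. The file is opened read-only.

```python
def page_is_zero(path, offset=98304, length=8192):
    """Return True if the bytes at [offset, offset + length) are all zero.

    Also returns True if the file is shorter than offset (nothing to read).
    """
    with open(path, "rb") as f:   # read-only, never modifies the file
        f.seek(offset)
        data = f.read(length)
    return not any(data)          # any() is False when every byte is 0


# Hypothetical usage, run from the data directory:
# print(page_is_zero("pg_xact/01A2"))
```

Running it once while the server is up and once after it has been
stopped should show the difference described above.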
We don't have a backup of the database, and while it doesn't contain
any critical data, it still would be nice to recover it or at least
some of the tables which were not modified recently. Any ideas?
Best,
Vasily