Обсуждение: Minor bug/inconveniance with restore from backup, using PITR base backup and archived wal files
Minor bug/inconveniance with restore from backup, using PITR base backup and archived wal files
От
Tommy Gildseth
Дата:
I've recently been testing our backup/restore procedures, and discovered a minor inconvenience. I emptied out the data directory(on a test-box), and restored it from a backup. I made sure that pg_xlog and pg_xlog/archive_status was empty. I then set up the recovery.conf file in the root of the data directory, and pointed restore_command in the direction of the archived wal files. After starting postgresql, the cluster started restoring from the archive logs, and everything looks fine, and doing what it should. The problem came at the end of the recovery, after the log line: [2008-09-23 15:33:14.764 CEST] [pgtest01] [:] [] [18393] [] LOG: archive recovery complete followed immediately after by this line: [2008-09-24 13:04:52.168 CEST] pgtest01] [:] [] [13324] [] PANIC: could not open file "pg_xlog/00000002000000030000009F" (log file 3, segment 159): Permission denied and, the cluster shutting down. After a bit of looking around, and with some help from the fine people in #postgresql on freenode, I think I figured out what was going on. The last wal archive file was 00000001000000030000009F, and after finishing recovery, postgresql created the file 00000002000000030000009F (ie. 00000002 instead of 00000001) in pg_xlog. The wal-files were archived read-only, and this file permission seemed to be carried over to the new file created by postgresql in pg_xlog, causing the cluster to fall over and die. Changing permissions on the last wal-file in the archive directory to read/write made this problem go away. Moving the last wal file from the archive folder to pg_xlog/ or, if I had partial archive log files in pg_xlog, also made this problem go away, irrespective of whether the file was chmod read only, or read/write. Simply chmoding the wal-file in pg_xlog/, and trying to restart the cluster did not work, and I found no other way to recover from this, other than to start over again from the beginning. -- Tommy Gildseth
Tommy Gildseth wrote: > I've recently been testing our backup/restore procedures, and discovered > a minor inconvenience. Running 8.2.9 btw -- Tommy Gildseth
Tommy Gildseth <tommy.gildseth@usit.uio.no> writes: > ... problem came at the end of the recovery, after the log line: > [2008-09-23 15:33:14.764 CEST] [pgtest01] [:] [] [18393] [] LOG: archive > recovery complete > followed immediately after by this line: > [2008-09-24 13:04:52.168 CEST] pgtest01] [:] [] [13324] [] PANIC: > could not open file "pg_xlog/00000002000000030000009F" (log file 3, > segment 159): Permission denied > and, the cluster shutting down. > After a bit of looking around, and with some help from the fine people > in #postgresql on freenode, I think I figured out what was going on. > The last wal archive file was 00000001000000030000009F, and after > finishing recovery, postgresql created the file 00000002000000030000009F > (ie. 00000002 instead of 00000001) in pg_xlog. It's customary for PG to "create" new XLOG segments by recycling old ones. > The wal-files were > archived read-only, and this file permission seemed to be carried over > to the new file created by postgresql in pg_xlog, causing the cluster to > fall over and die. I would say that the bug is in your restore script: it should have made sure that the files it copies into the xlog directory are given the right ownership/permissions. regards, tom lane
Tom Lane wrote: > Tommy Gildseth <tommy.gildseth@usit.uio.no> writes: >> After a bit of looking around, and with some help from the fine people >> in #postgresql on freenode, I think I figured out what was going on. >> The last wal archive file was 00000001000000030000009F, and after >> finishing recovery, postgresql created the file 00000002000000030000009F >> (ie. 00000002 instead of 00000001) in pg_xlog. > > It's customary for PG to "create" new XLOG segments by recycling old > ones. > >> The wal-files were >> archived read-only, and this file permission seemed to be carried over >> to the new file created by postgresql in pg_xlog, causing the cluster to >> fall over and die. > > I would say that the bug is in your restore script: it should have made > sure that the files it copies into the xlog directory are given the > right ownership/permissions. Well, the restore command(script) is simply copied from the suggestion in the manual (restore_command = 'cp /path/to/my/archived/wal/files/%f "%p"'). In my opinion, it's not very obvious that the last wal file needs read/write permissions set, and it's certainly not documented anywhere on http://www.postgresql.org/docs/current/static/continuous-archiving.html that I can see. There's also the matter of the inconsistency that postgresql knows to recycle *and* chmod the file if it's originally located in pg_xlog/ folder, but not if it's originally located in the wal files archive folder. I guess it's more of a gotcha than a bug per se. -- Tommy Gildseth