Обсуждение: Dropped table, no backup, restore from file system backup or WAL files?
Less than a week ago we installed a database and new software in a production environment. Today someone was attempting to install the same software in a test environment, and realized too late that a script he ranwas run against the production database. It dropped three tables. We need to recover those tables. This is what we have to work with. 1) We have a file system backup from 3AM. 2) We have not yet instituted a daily postgres backup with pg_dump. 3) We have all the WAL files since going into production (unarchived, only 6 files needed to cover the period). There are two ways I can think of to try to recover the data. 1) I presume that we can restore the entire directory from the 3am backup (to a different physical location of course) andthen export the data in the three tables to csv files and reimport it. With that approach, is there anything that shouldbe done to test the integrity of the data? 2) I should think that I could also restore the data from the WAL files, but when I create a recovery.conf file and use pg_resetxlog.exe,I can get it to do its thing without complaint (renames recovery.conf to recovery.done) or I get the followingerror in pg_log: LOG: database system was shut down at 2010-07-06 13:27:42 EDT LOG: starting archive recovery LOG: restore_command = 'donothing.bat' LOG: invalid magic number 0000 in log file 0, segment 31, offset 0 LOG: invalid primary checkpoint record. LOG: invalid magic number 0000 in log file 0, segment 31, offset 0 LOG: invalid secondary checkpoint record PANIC: could not locate a valid checkpoint record In neither case does it restore any data. Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the properWAL files into the pg_xlog directory, so there is no archiving per se. I have looked at the options pg_resetxlog.exe has and can't figure out if there's some parameter I can set to make it restorethe data. Is it not possible, without doing a checkpoint? Can I construct a check point manually? John
On 2010-07-07 03:24, John T. Dow wrote: > In neither case does it restore any data. > > Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the properWAL files into the pg_xlog directory, so there is no archiving per se. > Ok, try to fix that.. as per: 24.3.3 here: http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html Assuming that the filesystem backup was made using pg_start_backup()/pg_stop_backup() it will just work. Jesper
I changed the recovery.conf file so it has one statement: restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p' This is what I now get in pg_log: LOG: database system was shut down at 2010-07-07 14:48:34 EDT LOG: starting archive recovery LOG: restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p' LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory LOG: invalid primary checkpoint record LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory LOG: invalid secondary checkpoint record PANIC: could not locate a valid checkpoint record And as before, nothing is restored. These are the "archived" files: 0000000100000000000000B6 0000000100000000000000B7 0000000100000000000000B8 0000000100000000000000B9 0000000100000000000000BA 0000000100000000000000BB 0000000100000000000000BC It's looking for timeline 7 but the timeline of the archived files is 1. (Big deal, I can rename them.) It's also looking for file BE, but the files are B6 through BC. I also tried with this: restore_command = 'copy D:\devj\WCSD\received\%f %p' John On Wed, 07 Jul 2010 06:33:01 +0200, Jesper Krogh wrote: >On 2010-07-07 03:24, John T. Dow wrote: >In neither case does it restore any data. > >> Note: for the recovery procedure, I have a BAT file that literally does nothing. The reason is that I've copied the properWAL files into the pg_xlog directory, so there is no archiving per se >Ok, try to fix that.. as per: >24.3.3 here: >http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html > >Assuming that the filesystem backup was made using >pg_start_backup()/pg_stop_backup() it will just work. > >-- >Jesper
On Wed, 07 Jul 2010 22:18:13 +0200, Jesper Krogh wrote: >On 2010-07-07 20:54, John T. Dow wrote: >> I changed the recovery.conf file so it has one statement: >> >> restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p' >> >> >> >> This is what I now get in pg_log: >> >> LOG: database system was shut down at 2010-07-07 14:48:34 EDT >> LOG: starting archive recovery >> LOG: restore_command = 'copy D:\devj\WCSD\received\pg_xlog\%f %p' >> LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory >> LOG: invalid primary checkpoint record >> LOG: could not open file "pg_xlog/0000000700000000000000BE" (log file 0, segment 190): No such file or directory >> LOG: invalid secondary checkpoint record >> PANIC: could not locate a valid checkpoint record >> >> >> >> And as before, nothing is restored. >> >> These are the "archived" files: >> >> 0000000100000000000000B6 >> 0000000100000000000000B7 >> 0000000100000000000000B8 >> 0000000100000000000000B9 >> 0000000100000000000000BA >> 0000000100000000000000BB >> 0000000100000000000000BC >> >> It's looking for timeline 7 but the timeline of the archived files is 1. (Big deal, I can rename them.) >> >> It's also looking for file BE, but the files are B6 through BC. >> >> >> I also tried with this: restore_command = 'copy D:\devj\WCSD\received\%f %p' >> >> >then you either have a wrong base backup (did you run >pg_start_backup/pg_stop_backup)? or >the wrong set of WAL-files for your database. > >-- >Jesper We did not do pg_start_backup/pg_stop_backup. I was naively hoping that the WAL files would work on a "similar" database (ie same tables, but created independently). I understand that WAL files can only be applied to the database which they are based on, or an exact copy of the database.Therefore we need to copy the existing database files (file copy) and then try this technique on the copy. Thatwill take some doing, as people are not in the same city, don't work the same hours, etc. What if we never did pg_start_backup? Will it work anyway? Also, it is unclear from the documentation whether you must have archived wal files to replay the wal files. There seemsto be no way to replay them other than with recovery.conf, and recovery.conf seems to require a restore_command thatactually does something. Is it necessary to clear any wal files from the pg_xlog directory, and is it necessary thatall wal files to be replayed must be copied into pg_xlog using the restore_command? This things aren't really statedvery clearly. John
"John T. Dow" <john@johntdow.com> writes: > We did not do pg_start_backup/pg_stop_backup. Ugh. > What if we never did pg_start_backup? Will it work anyway? You could maybe make it work, if you had full_page_writes turned on and have a continuous series of WAL files extending back to before the manual filesystem backup was started. What pg_start_backup mainly does for you is to automate things and make sure there is a well-defined spot at which a successful replay can be started. It's *not* going to "just work" without pg_start_backup, though. You'd need to manually fake up a suitable backup label file, and maybe some other hacking. Otherwise what's likely to happen is that the recovery goes through but leaves you with a corrupted database anyway. If the data is worth this much trouble to you, I'd suggest hiring a Postgres consultant who's experienced in data recovery. > Also, it is unclear from the documentation whether you must have > archived wal files to replay the wal files. There seems to be no way > to replay them other than with recovery.conf, and recovery.conf seems > to require a restore_command that actually does something. You can just have it copy from pg_xlog, if all the files you need are in pg_xlog. That's a pretty uncommon situation though, so there's not any special easy case for it. regards, tom lane
Tom You've helped clarify things. Thanks. Our situation is kind of a special case and I was trying to learn from it what can be done with the WAL files. We only started the production system a few days ago, so we have all the WAL files. I don't care of we'd end up with a corrupteddatabase because I wouldn't do a recovery to the original database but to a copy. The intention is to copy out thedata for three small tables. As I understand it, one should do a pg_start_backup, then do a file system backup, then do pg_stop_backup. Several questions: First, if the file system back is run automatically at 3AM, what's the best way to do the start/stop backup? Second, what about doing a pg_dump instead of a file system backup? Third, the manual in 22.3.2 says "It is also possible to make a backup dump while the postmaster is stopped. In this case,you obviously cannot use pg_start_backup or pg_stop_backup, and you will therefore be left to your own devices to keeptrack of which backup dump is which and how far back the associated WAL files go. It is generally better to follow theon-line backup procedure above." That implies that I can do what I'm trying to do, but it doesn't say how. You indicatethat it's messy. I hoped that meant that one can manually replay wal files, one by one, but I guess not. John On Thu, 08 Jul 2010 16:58:18 -0400, Tom Lane wrote: >"John T. Dow" <john@johntdow.com> writes: >> We did not do pg_start_backup/pg_stop_backup. > >Ugh. > >> What if we never did pg_start_backup? Will it work anyway? > >You could maybe make it work, if you had full_page_writes turned on >and have a continuous series of WAL files extending back to before >the manual filesystem backup was started. What pg_start_backup mainly >does for you is to automate things and make sure there is a well-defined >spot at which a successful replay can be started. It's *not* going to >"just work" without pg_start_backup, though. You'd need to manually >fake up a suitable backup label file, and maybe some other hacking. >Otherwise what's likely to happen is that the recovery goes through >but leaves you with a corrupted database anyway. > >If the data is worth this much trouble to you, I'd suggest hiring a >Postgres consultant who's experienced in data recovery. > >> Also, it is unclear from the documentation whether you must have >> archived wal files to replay the wal files. There seems to be no way >> to replay them other than with recovery.conf, and recovery.conf seems >> to require a restore_command that actually does something. > >You can just have it copy from pg_xlog, if all the files you need are >in pg_xlog. That's a pretty uncommon situation though, so there's not >any special easy case for it. > > regards, tom lane > >-- >Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-novice