Discussion: pg_xlog and standby
Hello everybody,

I'm trying to reconfigure a warm-standby server. The problem is that, for some reason, one day the standby server stopped recovering the archives. This led to a full disk on that server, so I turned off (commented out) the archive_command on the main server.

I want to restart the procedure described in http://www.postgresql.org/docs/8.1/interactive/backup-online.html#BACKUP-PITR-RECOVERY, but I don't know how to "safely clean" the main server's $DATA/pg_xlog/ dir. By "safely clean" I mean: how do I know which archives I can delete (or move somewhere else) without disrupting the normal operation of the server?

I'm using postgres 8.2.5 from source on Debian etch.

Thanks in advance!

--
Roberto Scattini
On Jan 23, 2008, at 9:28 AM, Roberto Scattini wrote:
> but i dont know how to "safely clean" the main server $DATA/pg_xlog/
> dir.
> with "safely clean" i mean how do i know which archives can i delete
> (or move somewhere) without disrupting the normal operation of the
> server.

You don't. The main server should not be keeping archived WAL files directly in pg_xlog/. As it queues WAL files to be archived it puts them in pg_xlog/archive_status/ with file names suffixed with .ready; once they are archived, that suffix changes to .done, after which, at some point (I'm not sure how long/many), they are removed.

Now, if you took your standby server offline but didn't disable your archive_command, then you've basically been accumulating WALs with the .ready suffix in the archive_status directory that, if you're going to start from scratch with your standby, you can safely delete. Just make sure you have a couple of WAL files successfully archived (the suffix has changed to .done in the archive_status dir and you've verified that they've reached whatever directory your standby expects them to be in) before calling pg_start_backup() and starting your new base backup.

IMO, the most important point here is: DO NOT delete WALs that sit directly under pg_xlog/. Mistakes with the rest can be worked around, but you could run into serious problems with your primary by deleting WALs directly under pg_xlog/.

Also, do you know why your standby stopped recovering? I'd say you should make sure you know why and how; otherwise you run the risk of the same thing happening again.

Erik Jones
DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
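Erik's advice to confirm that a couple of segments have reached .done before calling pg_start_backup() can be checked with a small read-only script. The sketch below is a hypothetical helper (archive_status_summary is not part of PostgreSQL); it assumes the 8.2 layout described above, where each entry in pg_xlog/archive_status/ is a small marker file named after a WAL segment plus a .ready or .done suffix. It only lists markers and deliberately never deletes anything.

```python
from pathlib import Path


def archive_status_summary(pg_xlog_dir):
    """Return (pending, archived) WAL file names from pg_xlog/archive_status/.

    Hypothetical read-only helper: it inspects marker names only and
    never removes or renames any file.
    """
    status_dir = Path(pg_xlog_dir) / "archive_status"
    pending, archived = [], []
    for marker in sorted(status_dir.iterdir()):
        # "000000010000000000000002.ready" -> suffix ".ready",
        # stem "000000010000000000000002" (the WAL file it refers to).
        if marker.suffix == ".ready":
            pending.append(marker.stem)
        elif marker.suffix == ".done":
            archived.append(marker.stem)
    return pending, archived
```

Call it with your own $PGDATA/pg_xlog path; a long pending list with an empty or stale archived list means archive_command is still not succeeding, so it is not yet time to start the new base backup.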
On Jan 23, 2008 2:28 PM, Erik Jones <erik@myemma.com> wrote:
> You don't. The main server should not be keeping archived WAL files
> directly in pg_xlog/. As it queues WAL files to be archived it puts
> them in pg_xlog/archive_status/ with file names suffixed with .ready,
> once they are archived that suffix changes to .done after which, at
> some point (I'm not sure how long/many) they are removed.

Mmm, OK. The problem I'm having is that I have A LOT of archive files in the pg_xlog dir, and that's because the archive_command keeps failing (the standby server had filled its disk with archives received but not processed), so now I don't know how I can remove those files and start again...

> IMO, the most important point to be had here is DO NOT delete WALs
> that sit directly under pg_xlog/. Mistakes with the rest can be
> worked with, you could run into serious problems with your primary
> when deleting WALs directly under pg_xlog/.

Yeah, I agree. But now I have approx. 40GB of archive files in the pg_xlog dir on the production server. :S

> Also, do you know why your standby stopped recovering? I'd say you
> should make sure you know why and how, otherwise you run the risk of
> the same thing happening again.

I don't know exactly, but it could very well have been caused by an unfinished server re-config.

Thanks for your help!

--
Roberto Scattini
On Jan 23, 2008, at 2:18 PM, Roberto Scattini wrote:
> yeah, i agree. but now i have aprox 40GB of archive files in pg_xlog
> dir in the production server. :S

Watch your directory terminology. The WALs that have piled up should be in $PGDATA/pg_xlog/archive_status/, not $PGDATA/pg_xlog/. Since you are going to start from scratch with your standby, you're free to delete all of the WAL files in $PGDATA/pg_xlog/archive_status/, but leave any files directly under $PGDATA/pg_xlog/ alone.

Erik Jones
On Wed, 2008-01-23 at 18:18 -0200, Roberto Scattini wrote:
> the standby server had filled his disk with archives
> received but not proccesed

Sounds like your standby has fallen badly behind. You should always monitor the lag between primary and standby. You will need to take steps to ensure the lag is reduced, or you will continue to have problems with this technique.

All asynchronous replication systems have a potential for falling behind the master. Fully synchronous replication techniques don't: they force the master to slow down to a manageable pace.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
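One way to put a number on the lag Simon mentions is to compare WAL file names: the current segment on the primary (in 8.2, for example, `SELECT pg_xlogfile_name(pg_current_xlog_location());`) against the last segment the standby's restore_command fetched. The helper below is a hypothetical sketch, not a PostgreSQL tool. It assumes the default 16 MB segment size and the pre-9.3 numbering, in which each 4 GB "log" holds 255 segments because the last one (FF) is never used.

```python
# Assumption: default 16 MB segments and pre-9.3 naming, where the
# ...FF segment of each log is skipped, giving 255 segments per log.
SEGS_PER_LOG = 0xFFFFFFFF // (16 * 1024 * 1024)


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    # int(..., 16) also raises ValueError on non-hex characters.
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_lag(primary_wal, standby_wal):
    """How many WAL segments the standby is behind the primary."""
    tli_p, log_p, seg_p = parse_wal_name(primary_wal)
    tli_s, log_s, seg_s = parse_wal_name(standby_wal)
    if tli_p != tli_s:
        raise ValueError("segments are on different timelines")
    return (log_p - log_s) * SEGS_PER_LOG + (seg_p - seg_s)
```

For example, `segment_lag("000000010000000100000001", "0000000100000000000000FE")` returns 2, because the segment after `...00000000000000FE` is `...00000001 00000000`. Alerting when this number keeps growing is the kind of monitoring that would have caught the standby falling behind before its disk filled.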
On Wed, 23 Jan 2008, Roberto Scattini wrote:
> the problem that im having is that i have A LOT of
> archive files on pg_xlog dir, and thats because the archive_command
> keeps failing (the standby server had filled his disk with archives
> received but not proccesed), so now, i dont know how i can remove
> those files and start again...

Under normal operation the checkpoint process will look at the number of already created archive files, keep around up to (2*checkpoint_segments+1) of them for future use, and delete the rest of them. You never delete them yourself; the server will take care of that automatically once it gets to where it makes that decision. If you set checkpoint_segments to some very high number they can end up taking many GB worth of storage; that is one of at least two costs of increasing that parameter (the other being a longer recovery time).

Managing old archive logs on the backup server is your problem, and related tools like pg_standby help deal with that. Managing them on the primary server is that server's problem, and you shouldn't touch them. You can execute a manual CHECKPOINT at the psql prompt if you want to force this reclamation to happen (there has to have been some activity since the last checkpoint for this to work, which doesn't sound like a problem on your server).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
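The (2*checkpoint_segments+1) rule gives a quick ceiling on how much space pg_xlog should occupy in normal operation. The arithmetic is sketched below; the function names are hypothetical, and the 16 MB segment size is an assumption (it is the default, but it is fixed when the server is built).

```python
# Assumed WAL segment size: the 16 MB default.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def max_wal_segments(checkpoint_segments):
    """Normal upper bound on segment files kept in pg_xlog (2*N + 1)."""
    return 2 * checkpoint_segments + 1


def max_wal_bytes(checkpoint_segments):
    """Same bound expressed in bytes."""
    return max_wal_segments(checkpoint_segments) * WAL_SEGMENT_BYTES


def segments_for_bytes(n_bytes):
    """How many segments a given amount of pg_xlog space represents (rounded up)."""
    return -(-n_bytes // WAL_SEGMENT_BYTES)  # ceiling division
```

With checkpoint_segments at its 8.2 default of 3, `max_wal_bytes(3)` is 7 segments, or 112 MB. For comparison, `segments_for_bytes(40 * 1024**3)` is 2560, so the roughly 40GB reported in this thread would correspond to checkpoint_segments of about 1280 under the normal-operation rule.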