Обсуждение: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

Поиск
Список
Период
Сортировка

Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Joshua Boyd
Дата:
I am testing out point in time recovery from a hot physical backup in a disaster recovery situation - I turned on archiving of files, created a hot physical backup, then (after letting it run for a few days) issued a "DROP DATABASE".  The pg_log file shows the DROP DATABASE command was issued at '2014-11-28 10:20:00.010 PST'.  I shut down the server, moved the pgdata directory to pgdata_backup ... restored the files in the hot physical backup I made, copied the wal archive files from pgdata_backup to the (new) pgdata archive, cleared out the new pg_xlog dir and copied the files from the old pg_xlog into the new..  Set up a recovery.conf file as such:

restore_command = 'gunzip -c /home/pg2dev/joshtest/pgdata/archive/%f.gz > %p'
recovery_target_time = '2014-11-28 10:20:00.010 PST'
recovery_target_inclusive = false

then I started the server up.  the pg_log shows the following:

2014-11-28 14:22:55.059 PST     LOG:  database system was interrupted; last known up at 2014-11-24 11:34:14 PST
2014-11-28 14:22:55.060 PST     LOG:  starting point-in-time recovery to 2014-11-28 10:20:00.01-08
2014-11-28 14:22:55.239 PST     LOG:  restored log file "0000000100000000000000A0" from archive
2014-11-28 14:22:55.243 PST     LOG:  redo starts at 0/A0000080
2014-11-28 14:22:55.244 PST     LOG:  consistent recovery state reached at 0/A1000000
2014-11-28 14:22:55.412 PST     LOG:  restored log file "0000000100000000000000A1" from archive
2014-11-28 14:22:55.674 PST     LOG:  restored log file "0000000100000000000000A2" from archive
2014-11-28 14:22:55.777 PST     LOG:  recovery stopping before commit of transaction 235078, time 2014-11-28 10:20:00.179303-08
2014-11-28 14:22:55.777 PST     LOG:  redo done at 0/A2F0F4B0
2014-11-28 14:22:55.777 PST     LOG:  last completed transaction was at log time 2014-11-28 09:47:07.132608-08

Which looks fine and dandy.  I connect to database with psql and list the databases with \l ... the database that WAS dropped is listed, which looks fine and dandy.

I try to connect to the database and it gives:

psql: FATAL:  database "jasperserver_restore" does not exist
DETAIL:  The database subdirectory "base/907110" is missing.

And then I look in pgdata/base .. and sure enough, that directory is missing.  I examine my hot physical backup file and that directory exists within it.

So .... even though the recovery SAYS "recovery stopping before commit of transaction 235078" ... it doesn't appear that it's 100% accurate.  It didn't commit the transaction, clearly, because the database is still listed in the data dictionary ... however, the filesystem files are gone.  Please - am I doing something wrong, or would this be considered a bug?

--
Joshua Boyd

Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Adrian Klaver
Дата:
On 11/28/2014 02:29 PM, Joshua Boyd wrote:
> I am testing out point in time recovery from a hot physical backup in a
> disaster recovery situation - I turned on archiving of files, created a
> hot physical backup,

How did you take the backup?

Archiving how and to where?

then (after letting it run for a few days) issued a
> "DROP DATABASE".  The pg_log file shows the DROP DATABASE command was
> issued at '2014-11-28 10:20:00.010 PST'.  I shut down the server, moved
> the pgdata directory to pgdata_backup ... restored the files in the hot
> physical backup I made, copied the wal archive files from pgdata_backup
> to the (new) pgdata archive,

The above I do not understand.
You where archiving the WALs in your pgdata directory?

Restored the backup how?

  cleared out the new pg_xlog dir and copied
> the files from the old pg_xlog into the new..  Set up a recovery.conf

All the files or only the unarchived ones?

> file as such:
>
> restore_command = 'gunzip -c /home/pg2dev/joshtest/pgdata/archive/%f.gz
>  > %p'
> recovery_target_time = '2014-11-28 10:20:00.010 PST'
> recovery_target_inclusive = false
>
> then I started the server up.  the pg_log shows the following:

>
> And then I look in pgdata/base .. and sure enough, that directory is
> missing.  I examine my hot physical backup file and that directory
> exists within it.
>
> So .... even though the recovery SAYS "recovery stopping before commit
> of transaction 235078" ... it doesn't appear that it's 100% accurate.
> It didn't commit the transaction, clearly, because the database is still
> listed in the data dictionary ... however, the filesystem files are
> gone.  Please - am I doing something wrong, or would this be considered
> a bug?
>
> --
> Joshua Boyd


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Joshua Boyd
Дата:
Having continued my research, the problem I encountered is the exact same that's been recorded here: 

https://www.marshut.net/kstxxk/pitr-failing-to-stop-before-drop-database.html

I took the backup by following the procedures as laid forth in the continuous archiving document (http://www.postgresql.org/docs/9.2/static/continuous-archiving.html)

I configured the archiving to gzip the wal files to pgdata/archive
Then I started the backup process by issuing "select pg_start_backup('mylabel')"
next tarball'd the contents of pgdata, excluding the pgdata/archive dir and the pgdata/pg_xlog dir (although preserving the directory structure)
after that I issued "select pg_stop_backup()"
then I added the contents of pgdata/archive and pgdata/pg_xlog to the tarball above
then I gzipped the tarball.

The above is how I archived and backed up..

To restore the backup, I shut down the server, moved pgdata to pgdata_backup, untarballed the backup tarball, removed all the files in the new pgdata/pg_xlog dir, copied the files from pgdata_backup/archive and pgdata_backup/pg_xlog into the new pgdata dir, set up the recovery.conf file giving it a timestamp gathered from the pgdata_backup/pg_log/<> log files..  I copied ALL the pg_xlog files ... not simply the "unarchived ones". All of the unarchived ones should have been removed when I removed the contents of the pg_xlog dir after restoring the tarball..

I think I answered all the questions - please let me know if I missed some.  Based on the url I pasted at the top, though, it appears I'm not the only one who's encountered this problem.


On Tue, Dec 2, 2014 at 3:39 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/28/2014 02:29 PM, Joshua Boyd wrote:
I am testing out point in time recovery from a hot physical backup in a
disaster recovery situation - I turned on archiving of files, created a
hot physical backup,

How did you take the backup?

Archiving how and to where?

then (after letting it run for a few days) issued a
"DROP DATABASE".  The pg_log file shows the DROP DATABASE command was
issued at '2014-11-28 10:20:00.010 PST'.  I shut down the server, moved
the pgdata directory to pgdata_backup ... restored the files in the hot
physical backup I made, copied the wal archive files from pgdata_backup
to the (new) pgdata archive,

The above I do not understand.
You where archiving the WALs in your pgdata directory?

Restored the backup how?

 cleared out the new pg_xlog dir and copied
the files from the old pg_xlog into the new..  Set up a recovery.conf

All the files or only the unarchived ones?

file as such:

restore_command = 'gunzip -c /home/pg2dev/joshtest/pgdata/archive/%f.gz
 > %p'
recovery_target_time = '2014-11-28 10:20:00.010 PST'
recovery_target_inclusive = false

then I started the server up.  the pg_log shows the following:


And then I look in pgdata/base .. and sure enough, that directory is
missing.  I examine my hot physical backup file and that directory
exists within it.

So .... even though the recovery SAYS "recovery stopping before commit
of transaction 235078" ... it doesn't appear that it's 100% accurate.
It didn't commit the transaction, clearly, because the database is still
listed in the data dictionary ... however, the filesystem files are
gone.  Please - am I doing something wrong, or would this be considered
a bug?

--
Joshua Boyd


--
Adrian Klaver
adrian.klaver@aklaver.com



--
Joshua Boyd

Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Adrian Klaver
Дата:
On 12/02/2014 03:50 PM, Joshua Boyd wrote:
> Having continued my research, the problem I encountered is the exact
> same that's been recorded here:
>
> https://www.marshut.net/kstxxk/pitr-failing-to-stop-before-drop-database.html
>
>
>
> I took the backup by following the procedures as laid forth in the
> continuous archiving document
> (http://www.postgresql.org/docs/9.2/static/continuous-archiving.html)
>
> I configured the archiving to gzip the wal files to pgdata/archive
> Then I started the backup process by issuing "select
> pg_start_backup('mylabel')"
> next tarball'd the contents of pgdata, excluding the pgdata/archive dir
> and the pgdata/pg_xlog dir (although preserving the directory structure)
> after that I issued "select pg_stop_backup()"
> then I added the contents of pgdata/archive and pgdata/pg_xlog to the
> tarball above
> then I gzipped the tarball.

The above is not necessary, you are going to empty the pg_xlog directory
later anyway and you want the archive contents from the later date when
you do below.

>
> The above is how I archived and backed up..
>
> To restore the backup, I shut down the server, moved pgdata to
> pgdata_backup, untarballed the backup tarball, removed all the files in
> the new pgdata/pg_xlog dir,

See, you empty the pg_xlog dir you saved.

  copied the files from pgdata_backup/archive
> and pgdata_backup/pg_xlog into the new pgdata dir, set up the
> recovery.conf file giving it a timestamp gathered from the
> pgdata_backup/pg_log/<> log files..  I copied ALL the pg_xlog files ...
> not simply the "unarchived ones".


All of the unarchived ones should have
> been removed when I removed the contents of the pg_xlog dir after
> restoring the tarball..

Well those where the WALs from the time of the backup and should already
be archived, assuming sufficient time/activity between backup and recovery.

>
> I think I answered all the questions - please let me know if I missed
> some.  Based on the url I pasted at the top, though, it appears I'm not
> the only one who's encountered this problem.

Yes from the answers it would seem the solution is to push your
recovery_target_time further into the future. If I am following the
discussion DROP DATABASE removes the files from the file system first
and then clears the system catalog. So by ending the recovery at the
moment of DROP DATABASE you are getting to the point that it drops the
files but before it removes the entry. Try letting the recovery run a
little longer to see if that works. Also seems there is a patch in the
works to correct this:)

>
>

>
>         --
>         Joshua Boyd
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
>
>
> --
> Joshua Boyd


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Adrian Klaver
Дата:
On 12/02/2014 03:50 PM, Joshua Boyd wrote:
> Having continued my research, the problem I encountered is the exact
> same that's been recorded here:
>
> https://www.marshut.net/kstxxk/pitr-failing-to-stop-before-drop-database.html
>

>
> I think I answered all the questions - please let me know if I missed
> some.  Based on the url I pasted at the top, though, it appears I'm not
> the only one who's encountered this problem.
>

Re-read the initial post and realized you wanted the state of the
recovered cluster to include the database that was dropped. In that case
I would say stop the recovery just before the DROP DATABASE.

>
> --
> Joshua Boyd


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Adrian Klaver
Дата:
On 12/03/2014 03:52 PM, Joshua Boyd wrote:
> I tried that when I was testing .. if I stopped at the most recent
> insert/update/delete previous to the drop database (with telling it to
> include the change) it DIDN'T include the change (I assume because the
> commit timestamp was slightly after the transaction timestamp) .. and if
> I told it to stop a little later than that, it removed the files
> (because it got to the drop database statement and stopped before the
> commit, but still deleted the files). For some reason it ignored SELECT
> statements - I assume those are not actually written into the wal files,
> and that would be why.  But .. that's only a guess.  I'm not that
> educated with regard to the inner workings of Postgres..  :)
>
> For the meantime, until the patch is released, the method I have
> wrangled to "get around the issue" is to actually restore twice ... the
> first time using a timestamp and I record the xid reported in the pg_log
> that it stopped before commit of..  Then re-restore by using the xid.
> That works, keeps the most recent insert/update/delete, and DOESN'T
> delete files..  Kinda a pain, but it works.

Yea, DROP DATABASE is not transactional(or maybe semi-transactional) and
lives in a gray area.

>
> Anyway .. thanks for the assistance.  :)
>



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Postgres 9.2 PITR testing to before DROP DATABASE ends up removing file system files

От
Joshua Boyd
Дата:
I tried that when I was testing .. if I stopped at the most recent insert/update/delete previous to the drop database (with telling it to include the change) it DIDN'T include the change (I assume because the commit timestamp was slightly after the transaction timestamp) .. and if I told it to stop a little later than that, it removed the files (because it got to the drop database statement and stopped before the commit, but still deleted the files). For some reason it ignored SELECT statements - I assume those are not actually written into the wal files, and that would be why.  But .. that's only a guess.  I'm not that educated with regard to the inner workings of Postgres..  :)

For the meantime, until the patch is released, the method I have wrangled to "get around the issue" is to actually restore twice ... the first time using a timestamp and I record the xid reported in the pg_log that it stopped before commit of..  Then re-restore by using the xid. That works, keeps the most recent insert/update/delete, and DOESN'T delete files..  Kinda a pain, but it works.

Anyway .. thanks for the assistance.  :)

On Wed, Dec 3, 2014 at 3:37 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/02/2014 03:50 PM, Joshua Boyd wrote:
Having continued my research, the problem I encountered is the exact
same that's been recorded here:

https://www.marshut.net/kstxxk/pitr-failing-to-stop-before-drop-database.html



I think I answered all the questions - please let me know if I missed
some.  Based on the url I pasted at the top, though, it appears I'm not
the only one who's encountered this problem.


Re-read the initial post and realized you wanted the state of the recovered cluster to include the database that was dropped. In that case I would say stop the recovery just before the DROP DATABASE.


--
Joshua Boyd


--
Adrian Klaver
adrian.klaver@aklaver.com



--
Joshua Boyd