Обсуждение: Incomplete description of pg_start_backup?

Поиск
Список
Период
Сортировка

Incomplete description of pg_start_backup?

От
Dmitry Koterov
Дата:
<div dir="ltr">I think that at <a
href="http://www.postgresql.org/docs/current/static/functions-admin.html">http://www.postgresql.org/docs/current/static/functions-admin.html</a> and <a
href="http://www.postgresql.org/docs/current/static/continuous-archiving.html">http://www.postgresql.org/docs/current/static/continuous-archiving.html</a> two
importantpoints on how pg_start_backup() works are missing:<div style="style"><br /></div><div style="style">1. After
pg_start_backup()and till pg_stop_backup() VACUUM is denied (e.g. autovacuum is turned off), so the new data is always
appendedto data files, is never written at their middle. This allows to archive the data directory using any copying
tools(rsync, tar, cp etc.). If you forget to call pg_stop_backup() by accident, data files will grow forever. So
pg_start_backup()switches the database to "append-only mode" which is safe to backup without stopping (data files
temporarilybecome append-only, WAL logs are append-only always).</div><div style="style"><br /></div><div
style="style">2.After pg_start_backup() and till pg_stop_backup() full_page_writes is forced to be ON.</div><div
style="style"><br/></div><div style="style">BTW are these points fully correct? If yes, it would be great to update the
documentation,because in google there are a lot of questions on how exactly backup with pg_start_backup() works and why
cp,tar etc. are safe after pg_start_backup(), but no clear answers. If no, could you please give a couple of comments
onall these?</div></div> 

Re: Incomplete description of pg_start_backup?

От
Jeff Janes
Дата:
On Tue, Apr 30, 2013 at 3:24 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
I think that at http://www.postgresql.org/docs/current/static/functions-admin.html and http://www.postgresql.org/docs/current/static/continuous-archiving.html two important points on how pg_start_backup() works are missing:

1. After pg_start_backup() and till pg_stop_backup() VACUUM is denied (e.g. autovacuum is turned off), so the new data is always appended to data files, is never written at their middle.

This is not the case.  Autovacuum continues to run during the backup.

 
This allows to archive the data directory using any copying tools (rsync, tar, cp etc.). If you forget to call pg_stop_backup() by accident, data files will grow forever. So pg_start_backup() switches the database to "append-only mode" which is safe to backup without stopping (data files temporarily become append-only, WAL logs are append-only always).

No, it doesn't work that way.  I don't know why appending would be any safer than normal updates would be anyway.  WAL replay fixes up any problems that might arise.


2. After pg_start_backup() and till pg_stop_backup() full_page_writes is forced to be ON.

Effectively yes, this is documented in one of your links above (and is one of the reasons vacuuming during the backup is not a problem)

Cheers,

Jeff

Re: Incomplete description of pg_start_backup?

От
Dmitry Koterov
Дата:
Could you please provide a bit more detailed explanation on how it works? 

And how could postgres write at the middle of archiving files during an active pg_start_backup? if it could, here might be a case when a part of archived data file contains an overridden information "from the future", while wal files contain only information like "i want to write X to Z", not "i've overridden the following X with Y at the Z position". The appending is much better here, because unexpected appended data "from the future" may just be ignored.

On Wednesday, May 1, 2013, Jeff Janes wrote:
On Tue, Apr 30, 2013 at 3:24 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
I think that at http://www.postgresql.org/docs/current/static/functions-admin.html and http://www.postgresql.org/docs/current/static/continuous-archiving.html two important points on how pg_start_backup() works are missing:

1. After pg_start_backup() and till pg_stop_backup() VACUUM is denied (e.g. autovacuum is turned off), so the new data is always appended to data files, is never written at their middle.

This is not the case.  Autovacuum continues to run during the backup.

 
This allows to archive the data directory using any copying tools (rsync, tar, cp etc.). If you forget to call pg_stop_backup() by accident, data files will grow forever. So pg_start_backup() switches the database to "append-only mode" which is safe to backup without stopping (data files temporarily become append-only, WAL logs are append-only always).

No, it doesn't work that way.  I don't know why appending would be any safer than normal updates would be anyway.  WAL replay fixes up any problems that might arise.


2. After pg_start_backup() and till pg_stop_backup() full_page_writes is forced to be ON.

Effectively yes, this is documented in one of your links above (and is one of the reasons vacuuming during the backup is not a problem)

Cheers,

Jeff

Re: Incomplete description of pg_start_backup?

От
Jeff Janes
Дата:
On Mon, May 13, 2013 at 4:31 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
Could you please provide a bit more detailed explanation on how it works? 

And how could postgres write at the middle of archiving files during an active pg_start_backup? if it could, here might be a case when a part of archived data file contains an overridden information "from the future",

The data files cannot contain information from the future.  If the backup is restored, it must be restored to the time of pg_stop_backup (at least), which means the data would at that point be from the past/present, not the future.

Cheers,

Jeff

Re: Incomplete description of pg_start_backup?

От
Dmitry Koterov
Дата:
I don't get still.

Suppose we have a data file with blocks with important (non-empty) data:

A B C D

1. I call pg_start_backup().
2. Tar starts to copy A block to the destination archive...
3. During this copying, somebody removes data from a table which is situated in B block. So this data is a subject for vacuuming, and the block is marked as a free space.
4. Somebody writes data to a table, and this data is placed to a free space - to B block. This is also added to the WAL log (so the data is stored at 2 places: at B block and at WAL).
5. Tar (at last!) finishes copying of A block and begins to copy B block.
6. It finishes, then it copies C and D to the archive too.
7. Then we call pg_stop_backup() and also archive collected WAL (which contains the new data of B block as we saw above).

The question is - where is the OLD data of B block in this scheme? Seems it is NOT in the backup! So it cannot be restored. (And, in case when we never overwrite blocks between pg_start_backup...pg_stop_backup, but always append the new data, it is not a problem.) Seems to me this is not documented at all! That is what my initial e-mail about.

(I have one hypothesis on that, but I am not sure. Here is it: does vacuum saves ALL deleted data of B block to WAL on step 3 prior deletion? If yes, it is, of course, a part of the backup. But it wastes space a lot...)




On Tue, May 14, 2013 at 6:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, May 13, 2013 at 4:31 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
Could you please provide a bit more detailed explanation on how it works? 

And how could postgres write at the middle of archiving files during an active pg_start_backup? if it could, here might be a case when a part of archived data file contains an overridden information "from the future",

The data files cannot contain information from the future.  If the backup is restored, it must be restored to the time of pg_stop_backup (at least), which means the data would at that point be from the past/present, not the future.

Cheers,

Jeff

Re: Incomplete description of pg_start_backup?

От
Heikki Linnakangas
Дата:
On 24.05.2013 14:33, Dmitry Koterov wrote:
> I don't get still.
>
> Suppose we have a data file with blocks with important (non-empty) data:
>
> A B C D
>
> 1. I call pg_start_backup().
> 2. Tar starts to copy A block to the destination archive...
> 3. During this copying, somebody removes data from a table which is
> situated in B block. So this data is a subject for vacuuming, and the block
> is marked as a free space.
> 4. Somebody writes data to a table, and this data is placed to a free space
> - to B block. This is also added to the WAL log (so the data is stored at 2
> places: at B block and at WAL).
> 5. Tar (at last!) finishes copying of A block and begins to copy B block.
> 6. It finishes, then it copies C and D to the archive too.
> 7. Then we call pg_stop_backup() and also archive collected WAL (which
> contains the new data of B block as we saw above).
>
> The question is - *where is the OLD data of B block in this scheme?* Seems
> it is NOT in the backup!

Correct.

> So it cannot be restored.

Right, the old data on block B is gone. The backup is fine, the old data 
on block B is is not needed to recover the backup.

> (And, in case when we never overwrite blocks between
> pg_start_backup...pg_stop_backup, but always append the new data, it
> is not a problem.) Seems to me this is not documented at all! That is
> what my initial e-mail about.

When you restore the backup, the database is restored to the state it 
was when pg_stop_backup() was called. What did you expect?

> (I have one hypothesis on that, but I am not sure. Here is it: does vacuum
> saves ALL deleted data of B block to WAL on step 3 prior deletion? If yes,
> it is, of course, a part of the backup. But it wastes space a lot...)

Nope, it doesn't do that.

- Heikki