Thread: archiving question

archiving question

From:
"Zwettler Markus (OIZ)"
Date:

When there is a Postgres archiver stuck because of filled pg_xlog and archive directories…

… and the pg_xlog directory had been filled with dozens of GBs of xlogs…

…it takes ages until the archive_command had moved all xlogs from the pg_xlog directory to the archive directory afterwards…

… and you get crazy if you have a 8GB archive directory while the pg_xlog directory had been pumped up to 100GB :(

Any idea on this one?

Re: archiving question

From:
Stephen Frost
Date:
Greetings,

* Zwettler Markus (OIZ) (Markus.Zwettler@zuerich.ch) wrote:
> When there is a Postgres archiver stuck because of filled pg_xlog and archive directories...
>
> ... and the pg_xlog directory had been filled with dozens of GBs of xlogs...
>
> ...it takes ages until the archive_command had moved all xlogs from the pg_xlog directory to the archive directory afterwards...
>
> ... and you get crazy if you have a 8GB archive directory while the pg_xlog directory had been pumped up to 100GB :(
>
>
> Any idea on this one?

Parallelizing the archive-push operation can be quite helpful to address
this.
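
A minimal sketch of what that can look like. Stephen does not name a tool here, so the choice of pgBackRest (whose command is literally called archive-push and which supports asynchronous, multi-process archiving) is an assumption, as are the stanza name and paths:

    # postgresql.conf
    archive_command = 'pgbackrest --stanza=main archive-push %p'

    # pgbackrest.conf
    [global]
    archive-async=y                    # queue segments locally, push in background
    spool-path=/var/spool/pgbackrest   # local spool used by the async archiver

    [global:archive-push]
    process-max=4                      # parallel processes pushing to the repository

As far as I understand pgBackRest's asynchronous mode, the actual pushes to the repository are then done by a pool of background processes (process-max of them), so a large pg_xlog backlog drains in parallel rather than strictly one segment per archive_command call.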

Thanks,

Stephen


Re: archiving question

From:
"Zwettler Markus (OIZ)"
Date:
>
> Greetings,
>
> * Zwettler Markus (OIZ) (Markus.Zwettler@zuerich.ch) wrote:
> > When there is a Postgres archiver stuck because of filled pg_xlog and archive
> directories...
> >
> > ... and the pg_xlog directory had been filled with dozens of GBs of xlogs...
> >
> > ...it takes ages until the archive_command had moved all xlogs from the
> pg_xlog directory to the archive directory afterwards...
> >
> > ... and you get crazy if you have a 8GB archive directory while the
> > pg_xlog directory had been pumped up to 100GB :(
> >
> >
> > Any idea on this one?
>
> Parallelizing the archive-push operation can be quite helpful to address this.
>
> Thanks,
>
> Stephen


What do you mean here?

Afaik, Postgres runs the archive_command per log, meaning log by log by log.

How should we parallelize this?




Re: archiving question

From:
Michael Paquier
Date:
On Thu, Dec 05, 2019 at 03:04:55PM +0000, Zwettler Markus (OIZ) wrote:
> What do you mean here?
>
> Afaik, Postgres runs the archive_command per log, meaning log by log by log.
>
> How should we parallelize this?

You can, in theory, skip the archiving for a couple of segments and
then do the operation at once without the need to patch Postgres.
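
One way to read this, strictly as a sketch: let archive_command acknowledge each segment cheaply and only ship to the archive every N segments in a single operation. All names below are made up, and the caveat discussed later in the thread applies in full, because a segment that has been acknowledged but not yet copied may already be recycled by PostgreSQL:

    #!/bin/sh
    # hypothetical wrapper: archive_command = '/usr/local/bin/batch_archive.sh %p'
    # %p is relative to the data directory, which is also the cwd of archive_command
    QUEUE=/var/lib/postgresql/archive_queue   # list of segments not shipped yet
    ARCHIVE=/pg_archive                       # placeholder archive directory
    BATCH=16

    echo "$1" >> "$QUEUE" || exit 1           # remember the segment; PostgreSQL now
                                              # considers it archived
    if [ "$(wc -l < "$QUEUE")" -ge "$BATCH" ]; then
        # every BATCH segments, copy them all in one invocation instead of one by one
        xargs -a "$QUEUE" cp -t "$ARCHIVE"/ || exit 1
        : > "$QUEUE"
    fi
    exit 0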
--
Michael


Re: archiving question

From:
"Zwettler Markus (OIZ)"
Date:
> -----Original Message-----
> From: Michael Paquier <michael@paquier.xyz>
> Sent: Friday, December 6, 2019 02:43
> To: Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch>
> Cc: Stephen Frost <sfrost@snowman.net>; pgsql-general@lists.postgresql.org
> Subject: Re: archiving question
>
> On Thu, Dec 05, 2019 at 03:04:55PM +0000, Zwettler Markus (OIZ) wrote:
> > What do you mean here?
> >
> > Afaik, Postgres runs the archive_command per log, meaning log by log by log.
> >
> > How should we parallelize this?
>
> You can, in theory, skip the archiving for a couple of segments and then do the
> operation at once without the need to patch Postgres.
> --
> Michael


Sorry, I am still confused.

Do you mean I should move (mv * /backup_dir) the whole pg_xlog directory away and move it back (mv /backup_dir/* /pg_xlog) in case of recovery?

Markus








Re: archiving question

From:
Magnus Hagander
Date:
On Fri, Dec 6, 2019 at 10:50 AM Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch> wrote:
> -----Original Message-----
> From: Michael Paquier <michael@paquier.xyz>
> Sent: Friday, December 6, 2019 02:43
> To: Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch>
> Cc: Stephen Frost <sfrost@snowman.net>; pgsql-general@lists.postgresql.org
> Subject: Re: archiving question
>
> On Thu, Dec 05, 2019 at 03:04:55PM +0000, Zwettler Markus (OIZ) wrote:
> > What do you mean here?
> >
> > Afaik, Postgres runs the archive_command per log, meaning log by log by log.
> >
> > How should we parallelize this?
>
> You can, in theory, skip the archiving for a couple of segments and then do the
> operation at once without the need to patch Postgres.
> --
> Michael


Sorry, I am still confused.

Do you mean I should move (mv * /backup_dir) the whole pg_xlog directory away and move it back (mv /backup_dir/* /pg_xlog) in case of recovery?


No, *absolutely* not.

What you can do is have archive_command copy things one by one to a local directory (still sequentially), and then you can have a separate process that sends these to the archive -- and *this* process can be parallelized. 
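
A rough sketch of that split, with made-up hostnames and directories (and ignoring the fsync/durability details a real setup would need): the archive_command only does a cheap local copy, and a separate job drains the spool with several parallel workers.

    # postgresql.conf -- cheap local copy, still called one segment at a time
    archive_command = 'cp %p /wal_spool/%f'

    # separate pusher (cron job or loop): ship with parallel workers, and remove
    # each file from the spool only after it has reached the archive host
    cd /wal_spool &&
    ls -1 | xargs -P 8 -I{} \
        sh -c 'rsync -a "{}" archive-host:/pg_archive/"{}" && rm -- "{}"'

As Magnus notes further down, the moment cp returns, PostgreSQL treats the segment as archived, so the local spool is the only copy until the pusher has shipped it.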

//Magnus
 

Re: archiving question

From:
"Zwettler Markus (OIZ)"
Date:
> On Fri, Dec 6, 2019 at 10:50 AM Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch> wrote:
>> -----Original Message-----
>> From: Michael Paquier <michael@paquier.xyz>
>> Sent: Friday, December 6, 2019 02:43
>> To: Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch>
>> Cc: Stephen Frost <sfrost@snowman.net>; pgsql-general@lists.postgresql.org
>> Subject: Re: archiving question
>> 
>> On Thu, Dec 05, 2019 at 03:04:55PM +0000, Zwettler Markus (OIZ) wrote:
>> > What do you mean here?
>> >
>> > Afaik, Postgres runs the archive_command per log, meaning log by log by log.
>> >
>> > How should we parallelize this?
>> 
>> You can, in theory, skip the archiving for a couple of segments and then do the
>> operation at once without the need to patch Postgres.
>> --
>> Michael
>
>
>Sorry, I am still confused.
>
>Do you mean I should move (mv * /backup_dir) the whole pg_xlog directory away and move it back (mv /backup_dir/* /pg_xlog) in case of recovery?
 
>
>No, *absolutely* not.
>
>What you can do is have archive_command copy things one by one to a local directory (still sequentially), and then you can have a separate process that sends these to the archive -- and *this* process can be parallelized.
 
>
>//Magnus
 


That has been my initial question.

Is there a way to tune this sequential archive_command log by log copy in case I have tons of logs within the pg_xlog
directory?

Markus


Re: archiving question

From:
Magnus Hagander
Date:
On Fri, Dec 6, 2019 at 12:06 PM Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch> wrote:
> On Fri, Dec 6, 2019 at 10:50 AM Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch> wrote:
>> -----Original Message-----
>> From: Michael Paquier <michael@paquier.xyz>
>> Sent: Friday, December 6, 2019 02:43
>> To: Zwettler Markus (OIZ) <Markus.Zwettler@zuerich.ch>
>> Cc: Stephen Frost <sfrost@snowman.net>; pgsql-general@lists.postgresql.org
>> Subject: Re: archiving question
>>
>> On Thu, Dec 05, 2019 at 03:04:55PM +0000, Zwettler Markus (OIZ) wrote:
>> > What do you mean here?
>> >
>> > Afaik, Postgres runs the archive_command per log, meaning log by log by log.
>> >
>> > How should we parallelize this?
>>
>> You can, in theory, skip the archiving for a couple of segments and then do the
>> operation at once without the need to patch Postgres.
>> --
>> Michael
>
>
>Sorry, I am still confused.
>
>Do you mean I should move (mv * /backup_dir) the whole pg_xlog directory away and move it back (mv /backup_dir/* /pg_xlog) in case of recovery?
>
>No, *absolutely* not.
>
>What you can do is have archive_command copy things one by one to a local directory (still sequentially), and then you can have a separate process that sends these to the archive -- and *this* process can be parallelized. 
>
>//Magnus
 


That has been my initial question.

Is there a way to tune this sequential archive_command log by log copy in case I have tons of logs within the pg_xlog directory?

It will be called one by one, there is no changing that. What you *do* with that command is up to you, so you can certainly tune that. But as soon as your command has returned, PostgreSQL will have the "right" to remove the file if it thinks it's time. But you could, for example, have a daemon that opens a file handle to the file in response to your archive_command, thereby preventing it from actually being removed, and then archives it in private; in that case the archiving only has to wait for the daemon to acknowledge that the process has started, not finished.
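
Very roughly, and only as a sketch of the handshake Magnus describes (the FIFO paths, target directory and script names are invented, error handling is omitted, and the risk he describes next applies in full):

    # hypothetical daemon loop: read a segment path from a FIFO, open it so the
    # daemon can keep reading it even if the original file goes away, acknowledge
    # immediately, and ship the data in the background
    mkfifo -m 600 /tmp/wal_req /tmp/wal_ack 2>/dev/null
    while read -r path < /tmp/wal_req; do
        exec 3< "$path"                                    # hold the segment open
        echo OK > /tmp/wal_ack                             # archive_command may return now
        ( cat <&3 > "/pg_archive/$(basename "$path")" ) &  # copy from the open handle
        exec 3<&-                                          # close our copy; the child keeps its own fd 3
    done

    # matching archive_command = '/usr/local/bin/wal_notify.sh %p' (also hypothetical):
    #   echo "$1" > /tmp/wal_req && read -r ack < /tmp/wal_ack && [ "$ack" = OK ]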

There's always a risk involved in returning from archive_command before the file is safely stored on a different machine/storage somewhere. The more async you make it the bigger that risk is, but it increases your ability to parallelize.

--