Re: trying again to get incremental backup

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: trying again to get incremental backup
Дата
Msg-id CA+TgmoYUhrgcNin=qrY+J+S-f6kctf9DaUKoaD-e3cww_ox9vg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: trying again to get incremental backup  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Ответы Re: trying again to get incremental backup  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Re: trying again to get incremental backup  (Peter Eisentraut <peter@eisentraut.org>)
Re: trying again to get incremental backup  (Peter Eisentraut <peter@eisentraut.org>)
Re: trying again to get incremental backup  (Peter Eisentraut <peter@eisentraut.org>)
Список pgsql-hackers
On Fri, Dec 8, 2023 at 5:02 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> While we are at it, maybe around the below in PrepareForIncrementalBackup()
>
>                 if (tlep[i] == NULL)
>                         ereport(ERROR,
>
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                          errmsg("timeline %u found in
> manifest, but not in this server's history",
>                                                         range->tli)));
>
> we could add
>
>     errhint("You might need to start a new full backup instead of
> incremental one")
>
> ?

I can't exactly say that such a hint would be inaccurate, but I think
the impulse to add it here is misguided. One of my design goals for
this system is to make it so that you never have to take a new
incremental backup "just because," not even in case of an intervening
timeline switch. So, all of the errors in this function are warning
you that you've done something that you really should not have done.
In this particular case, you've either (1) manually removed the
timeline history file, and not just any timeline history file but the
one for a timeline for a backup that you still intend to use as the
basis for taking an incremental backup or (2) tried to use a full
backup taken from one server as the basis for an incremental backup on
a completely different server that happens to share the same system
identifier, e.g. because you promoted two standbys derived from the
same original primary and then tried to use a full backup taken on one
as the basis for an incremental backup taken on the other.

The scenario I was really concerned about when I wrote this test was
(2), because that could lead to a corrupt restore. This test isn't
strong enough to prevent that completely, because two unrelated
standbys can branch onto the same new timelines at the same LSNs, and
then these checks can't tell that something bad has happened. However,
they can detect a useful subset of problem cases. And the solution is
not so much "take a new full backup" as "keep straight which server is
which." Likewise, in case (1), the relevant hint would be "don't
manually remove timeline history files, and if you must, then at least
don't nuke timelines that you actually still care about."

> > I have a fix for this locally, but I'm going to hold off on publishing
> > a new version until either there's a few more things I can address all
> > at once, or until Thomas commits the ubsan fix.
> >
>
> Great, I cannot get it to fail again today, it had to be some dirty
> state of the testing env. BTW: Thomas has pushed that ubsan fix.

Huzzah, the cfbot likes the patch set now. Here's a new version with
the promised fix for your non-reproducible issue. Let's see whether
you and cfbot still like this version.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)