Re: Avoiding another needless ERROR during nbtree page deletion

From Peter Geoghegan
Subject Re: Avoiding another needless ERROR during nbtree page deletion
Date
Msg-id CAH2-Wzn5PjqCT5OyBUDE_zyhqxvDiRmh5F_1QhogfXL9Zf=F4g@mail.gmail.com
In response to Re: Avoiding another needless ERROR during nbtree page deletion  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Avoiding another needless ERROR during nbtree page deletion  (Peter Geoghegan <pg@bowt.ie>)
Re: Avoiding another needless ERROR during nbtree page deletion  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Any idea what might cause this corruption?

Not really, no. As far as I know the specific case that was brought to
my attention (that put me on the path to writing this patch) was just
an isolated incident. The interesting detail (if any) is that it was a
relatively recent version of Postgres (13), and that there were no
other known problems. This means that there is a plausible remaining
gap in the defensive checks in nbtree VACUUM on recent versions -- we
might have expected to avoid a hard ERROR in some other way, from one
of the earlier checks, but that didn't happen on at least one
occasion.

You can find several references to the "right sibling's left-link
doesn't match:" error message by googling. Most of them are clearly
from the page split ERROR. But there are some from VACUUM, too:


https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8

Granted, that was from a 9.2 database -- before your 9.4 work that
made this whole area much more robust.

> This comment notes that this is similar to what we did with the left
> sibling, but there isn't really any mention at the left sibling code
> about avoiding hard ERRORs. Feels a bit backwards. Maybe move the
> comment about avoiding the hard ERROR to where the left sibling is
> handled. Or explain it in the function comment and just have short
> "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment
> here.

Good point. Will do it that way.
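
To illustrate the "shouldn't happen, but avoid hard ERROR if the index is
corrupt" pattern, here is a minimal standalone sketch. It is not the actual
nbtree code: SketchPage, SketchBlock and sketch_right_sibling_ok() are
hypothetical names, and fprintf() stands in for whatever logging the real
code would use. The point is only that a mismatched left-link is reported
and the caller gives up on that one page, rather than the whole VACUUM
failing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the real nbtree page or BlockNumber types. */
typedef uint32_t SketchBlock;

typedef struct SketchPage
{
	SketchBlock blkno;	/* block number of this page */
	SketchBlock left;	/* left-link */
	SketchBlock right;	/* right-link */
} SketchPage;

/*
 * Decide whether the target page can be unlinked from its right sibling.
 * A mismatched left-link "shouldn't happen", but if the index is corrupt we
 * log the problem and report failure instead of raising a hard error.
 */
static bool
sketch_right_sibling_ok(const SketchPage *target, const SketchPage *rightsib)
{
	if (rightsib->left != target->blkno)
	{
		fprintf(stderr,
				"LOG: right sibling's left-link doesn't match: "
				"block %u links to %u instead of expected %u\n",
				(unsigned) rightsib->blkno, (unsigned) rightsib->left,
				(unsigned) target->blkno);
		return false;		/* caller skips this page and carries on */
	}
	return true;
}
```

A caller would treat a false return as "leave this page alone for now" and
continue with the rest of the index.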

> > Also attached is a bugfix for a minor issue in amcheck's
> > bt_index_parent_check() function, which I noticed in passing, while I
> > tested the first patch.

> You could check that the left sibling is indeed a half-dead page.

It's very hard to see, but...I think that we do. Sort of. Since
bt_recheck_sibling_links() is prepared to check that the left
sibling's right link points back to the target page.
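
Roughly, the two conditions being discussed look like this. This is a
standalone sketch under my own assumptions, not amcheck's code: the
SketchPage struct, its half_dead field and both function names are
hypothetical. The first function is the "right link points back to the
target" test; the second adds the explicit half-dead test that you
suggested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real nbtree or amcheck types. */
typedef uint32_t SketchBlock;

typedef struct SketchPage
{
	SketchBlock blkno;		/* block number of this page */
	SketchBlock right;		/* right-link */
	bool		half_dead;	/* assumed flag: page is marked half-dead */
} SketchPage;

/* Does the left sibling's right link point back to the target page? */
static bool
sketch_left_sibling_links_back(const SketchPage *leftsib,
							   const SketchPage *target)
{
	return leftsib->right == target->blkno;
}

/* Stricter variant: the left sibling must also be marked half-dead. */
static bool
sketch_left_sibling_is_half_dead(const SketchPage *leftsib,
								 const SketchPage *target)
{
	return sketch_left_sibling_links_back(leftsib, target) &&
		leftsib->half_dead;
}
```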

One problem with that is that it only happens in the AccessShareLock
case, whereas we're concerned with fixing an issue in the ShareLock
case. Another problem is that it's awkward and complicated to explain.
It's not obvious that it's worth trying to explain all this and/or
making sure that it happens in the ShareLock case, so that we have
everything covered. I'm unsure.

> ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode.

Agreed.

--
Peter Geoghegan


