Обсуждение: Avoiding another needless ERROR during nbtree page deletion

Поиск
Список
Период
Сортировка

Avoiding another needless ERROR during nbtree page deletion

От
Peter Geoghegan
Дата:
Work from commit 5b861baa (later backpatched as commit 43e409ce)
taught nbtree to press on with vacuuming an index when page deletion
fails to "re-find" a downlink in the target page's parent (or in some
page to the right of the parent) due to index corruption.

To recap, avoiding ERRORs during vacuuming (even those caused by index
corruption) is useful because there is no reason to expect the error
to go away on its own; we're relying on the DBA to notice the error
and REINDEX before wraparound/xidStopLimit kicks in. This is at least
the case on versions before 14, where the failsafe can eventually
kick-in and avoid catastrophe (though the failsafe can only be
expected to avoid the worst consequences).

It has come to my attention that there is a remaining issue of the
same general nature in nbtree VACUUM's page deletion code. Though this
remaining issue seems significantly less likely to come up in
practice, there is no reason to take any chances here. Attached patch
fixes it.

Also attached is a bugfix for a minor issue in amcheck's
bt_index_parent_check() function, which I noticed in passing, while I
tested the first patch. We assumed that we'd always land on the
leftmost page on each level first (the leftmost according to internal
pages one level up). That assumption is faulty because page deletion
of the leftmost page is quite possible. Page deletion can be
interrupted, leaving a half-dead leaf page (possibly the leftmost leaf
page) without any downlink one level up, while still leaving a left
sibling link on the leaf level (in the leaf page that isn't about to
become the leftmost, but won't until the interrupted page deletion can
be completed).

IMV this should be backpatched all the way. The issue in question is
rather unlikely to come up. But the fix that I've come up with is very
well targeted. It seems just about impossible for it to affect any
user that didn't already have a serious problem (without the fix).

-- 
Peter Geoghegan

Вложения

Re: Avoiding another needless ERROR during nbtree page deletion

От
Heikki Linnakangas
Дата:
On 20/05/2023 05:17, Peter Geoghegan wrote:
> It has come to my attention that there is a remaining issue of the
> same general nature in nbtree VACUUM's page deletion code. Though this
> remaining issue seems significantly less likely to come up in
> practice, there is no reason to take any chances here.

Yeah, let's be consistent. Any idea what might cause this corruption?

> Attached patch fixes it.
> +    /*
> +     * Validate target's right sibling page's left link points back to target.
> +     *
> +     * This is known to fail in the field in the presence of index corruption,
> +     * so we go to the trouble of avoiding a hard ERROR.  This is fairly close
> +     * to what we did earlier on when we located the target's left sibling
> +     * (iff target has a left sibling).
> +     */

This comment notes that this is similar to what we did with the left 
sibling, but there isn't really any mention at the left sibling code 
about avoiding hard ERRORs. Feels a bit backwards. Maybe move the 
comment about avoiding the hard ERROR to where the left sibling is 
handled. Or explain it in the function comment and just have short 
"shouldn't happen, but avoid hard ERROR if the index is corrupt" comment 
here.

> Also attached is a bugfix for a minor issue in amcheck's
> bt_index_parent_check() function, which I noticed in passing, while I
> tested the first patch. We assumed that we'd always land on the
> leftmost page on each level first (the leftmost according to internal
> pages one level up). That assumption is faulty because page deletion
> of the leftmost page is quite possible. Page deletion can be
> interrupted, leaving a half-dead leaf page (possibly the leftmost leaf
> page) without any downlink one level up, while still leaving a left
> sibling link on the leaf level (in the leaf page that isn't about to
> become the leftmost, but won't until the interrupted page deletion can
> be completed).

You could check that the left sibling is indeed a half-dead page.

>     ereport(DEBUG1,
>         (errcode(ERRCODE_NO_DATA),
>          errmsg("block %u is not leftmost in index \"%s\"",
>         current, RelationGetRelationName(state->rel))));

ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Avoiding another needless ERROR during nbtree page deletion

От
Peter Geoghegan
Дата:
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Any idea what might cause this corruption?

Not really, no. As far as I know the specific case that was brought to
my attention (that put me on the path to writing this patch) was just
an isolated incident. The interesting detail (if any) is that it was a
relatively recent version of Postgres (13), and that there were no
other known problems. This means that there is a plausible remaining
gap in the defensive checks in nbtree VACUUM on recent versions -- we
might have expected to avoid a hard ERROR in some other way, from one
of the earlier checks, but that didn't happen on at least one
occasion.

You can find several references to the "right sibling's left-link
doesn't match:" error message by googling. Most of them are clearly
from the page split ERROR. But there are some from VACUUM, too:


https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8

Granted, that was from a 9.2 database -- before your 9.4 work that
made this whole area much more robust.

> This comment notes that this is similar to what we did with the left
> sibling, but there isn't really any mention at the left sibling code
> about avoiding hard ERRORs. Feels a bit backwards. Maybe move the
> comment about avoiding the hard ERROR to where the left sibling is
> handled. Or explain it in the function comment and just have short
> "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment
> here.

Good point. Will do it that way.

> > Also attached is a bugfix for a minor issue in amcheck's
> > bt_index_parent_check() function, which I noticed in passing, while I
> > tested the first patch.

> You could check that the left sibling is indeed a half-dead page.

It's very hard to see, but...I think that we do. Sort of. Since
bt_recheck_sibling_links() is prepared to check that the left
sibling's right link points back to the target page.

One problem with that is that it only happens in the AccessShareLock
case, whereas we're concerned with fixing an issue in the ShareLock
case. Another problem is that it's awkward and complicated to explain.
It's not obvious that it's worth trying to explain all this and/or
making sure that it happens in the ShareLock case, so that we have
everything covered. I'm unsure.

> ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode.

Agreed.

--
Peter Geoghegan



Re: Avoiding another needless ERROR during nbtree page deletion

От
Peter Geoghegan
Дата:
On Mon, May 22, 2023 at 9:22 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > This comment notes that this is similar to what we did with the left
> > sibling, but there isn't really any mention at the left sibling code
> > about avoiding hard ERRORs. Feels a bit backwards. Maybe move the
> > comment about avoiding the hard ERROR to where the left sibling is
> > handled. Or explain it in the function comment and just have short
> > "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment
> > here.
>
> Good point. Will do it that way.

Attached is v2, which does it that way. It also adjusts the approach
taken to release locks and pins when the left sibling validation check
fails. This makes it simpler and more consistent with surrounding
code. I might not include this change in the backpatch.

Not including a revised amcheck patch here, since I'm not exactly sure
what to do with your feedback on that one just yet.

--
Peter Geoghegan

Вложения

Re: Avoiding another needless ERROR during nbtree page deletion

От
Peter Geoghegan
Дата:
On Mon, May 22, 2023 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v2, which does it that way. It also adjusts the approach
> taken to release locks and pins when the left sibling validation check
> fails.

I pushed this just now, backpatching all the way.

> Not including a revised amcheck patch here, since I'm not exactly sure
> what to do with your feedback on that one just yet.

I'd like to go with a minimal approach in my patch to address the
remaining issue in amcheck. Something similar to the patch that was
posted as part of v1. While it seems important to address the issue,
making sure that we have coverage of the leftmost page really being
half-dead (as opposed to something that would constitute corruption)
seems much less important. Ideally we'd have exhaustive coverage, but
it's not a priority for me right now.

--
Peter Geoghegan



Re: Avoiding another needless ERROR during nbtree page deletion

От
Greg Stark
Дата:


On Mon, May 22, 2023, 12:31 Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Any idea what might cause this corruption?

Not really, no. As far as I know the specific case that was brought to
my attention (that put me on the path to writing this patch) was just
an isolated incident. The interesting detail (if any) is that it was a
relatively recent version of Postgres (13), and that there were no
other known problems. This means that there is a plausible remaining
gap in the defensive checks in nbtree VACUUM on recent versions -- we
might have expected to avoid a hard ERROR in some other way, from one
of the earlier checks, but that didn't happen on at least one
occasion.

What error would one expect to see? I did have a case where vacuum was erroring on a btree in $previous_job. 

Re: Avoiding another needless ERROR during nbtree page deletion

От
Peter Geoghegan
Дата:
On Thu, Jun 1, 2023 at 6:47 AM Greg Stark <stark@mit.edu> wrote:
> What error would one expect to see? I did have a case where vacuum was erroring on a btree in $previous_job.

You mean in general? It's usually this one:

https://gitlab.com/gitlab-org/gitlab/-/issues/381443

In the case of this particular issue, the error is "right sibling's
left-link doesn't match". Per:


https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8

--
Peter Geoghegan