Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From Melanie Plageman
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date
Msg-id CAAKRu_awHB439Eajub_-bfhM48hLDUKaG9W-FQtcybpmvvT41g@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Bowen Shi <zxwsbg12138@gmail.com>)
List pgsql-bugs
On Tue, May 28, 2024 at 5:03 AM Bowen Shi <zxwsbg12138@gmail.com> wrote:
>
> On Thu, May 23, 2024 at 12:57 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
>>
>> One option is to add the logic in fix_hang_15.patch to master as well
>> (always remove tuples older than OldestXmin). This addresses your
>> concern about gaining confidence in a single solution.
>>
>> However, I can see how removing more tuples could be concerning. In
>> the case that the horizon moves backwards because of a standby
>> reconnecting, I think the worst case is that removing that tuple
>> causes a recovery conflict on the standby (depending on the value of
>> max_standby_streaming_delay et al).
>
>
> I'm confused about this part. The comment above OldestXmin says:
> /*
>  * OldestXmin is the Xid below which tuples deleted by any xact (that
>  * committed) should be considered DEAD, not just RECENTLY_DEAD.
>  */
> With the fix_hang_15 patch, why is there a risk here when we use OldestXmin to judge whether a tuple can be removed?

The risk is not so much about whether or not it is okay to remove the
tuple. At least for the case of a lagging standby reconnecting during
vacuum and moving maybe_needed backwards, there is no real danger in
removing tuples considered non-removable when compared to the new
value of maybe_needed. The standby will not replay the tuple removal
until the tuple is removable. If the horizon doesn't move forward on
the standby after some time, a recovery conflict will occur --
canceling the culprit holding back the horizon and allowing the vacuum
to proceed.

I have not, however, investigated the second way maybe_needed may go
backwards cited by Matthias in [1] to see if it is similarly okay to
remove tuples older than OldestXmin but newer than maybe_needed in
this case.

I think Noah's concern about using OldestXmin is more that it is a
different behavior than what we have on master. Picking one solution
and standardizing on it would reduce maintenance complexity across all
branches.

You make a good point about the comment though. If OldestXmin is only
used during vacuum for freezing tuples and determining full page
visibility status, then we should update the comment above it in
VacuumCutoffs and remove mention of HEAPTUPLE_DEAD.

I'll propose both this comment update and the one Noah suggested to
heap_vacuum_rel() against master in a separate thread. Then we can
backport the comments (as relevant) when we fix the back branches.

> I think the key point is: which of OldestXmin and VisTest is more accurate when we decide whether to remove the tuple?

GlobalVisState will have a potentially more up-to-date horizon value
than OldestXmin when determining whether or not to remove the tuple.
But I wouldn't call it incorrect to remove a tuple older than
OldestXmin but younger than maybe_needed.

- Melanie

[1]
https://www.postgresql.org/message-id/CAEze2WjMTh4KS0%3DQEQB-Jq%2BtDLPR%2B0%2BzVBMfVwSPK5A%3DWZa95Q%40mail.gmail.com
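
The cases discussed above (a deleted tuple older than both horizons, older
than OldestXmin but newer than maybe_needed, or the reverse) can be made
concrete with a small sketch. This is not PostgreSQL source. It assumes the
deleting transaction committed, compares XIDs as plain integers (no
wraparound handling), and collapses GlobalVisState to a single maybe_needed
value. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for TransactionId: plain integers, no wraparound. */
typedef uint32_t Xid;

typedef enum
{
    REMOVABLE_BY_BOTH,            /* xmax precedes both horizons */
    REMOVABLE_BY_OLDESTXMIN_ONLY, /* maybe_needed moved backwards past xmax */
    REMOVABLE_BY_VISTEST_ONLY,    /* maybe_needed is newer than OldestXmin */
    NOT_REMOVABLE                 /* xmax precedes neither horizon */
} RemovalClass;

/*
 * Classify a tuple whose deleter (xmax) is assumed to have committed.
 * oldest_xmin models the cutoff fixed at the start of vacuum;
 * maybe_needed models the horizon consulted later, which may be newer
 * or, in the cases discussed in this thread, older than oldest_xmin.
 */
static RemovalClass
classify_deleted_tuple(Xid xmax, Xid oldest_xmin, Xid maybe_needed)
{
    bool dead_by_oldest_xmin = xmax < oldest_xmin;
    bool dead_by_vistest = xmax < maybe_needed;

    if (dead_by_oldest_xmin && dead_by_vistest)
        return REMOVABLE_BY_BOTH;
    if (dead_by_oldest_xmin)
        return REMOVABLE_BY_OLDESTXMIN_ONLY;
    if (dead_by_vistest)
        return REMOVABLE_BY_VISTEST_ONLY;
    return NOT_REMOVABLE;
}
```

With OldestXmin = 100 and maybe_needed moved back to 80, a tuple deleted by
XID 90 falls into REMOVABLE_BY_OLDESTXMIN_ONLY, which is the case
fix_hang_15.patch would remove and the one the paragraph above argues is not
incorrect to remove.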


