Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From: Noah Misch
Subject: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Msg-id: 20240416193402.9a.nmisch@google.com
In reply to: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs

On Tue, Apr 16, 2024 at 11:01:08AM -0700, Andres Freund wrote:
> On 2024-04-15 20:58:25 -0700, Noah Misch wrote:
> > On Mon, Apr 15, 2024 at 02:10:20PM -0700, Andres Freund wrote:
> > > On 2024-04-15 13:52:04 -0700, Noah Misch wrote:
> > > > I have observed the infinite loop in production with v15.5, so that
> > > > non-reproduce outcome is a limitation in the test procedure.  (v14.2 added
> > > > those two commits.)
> > >
> > > How closely have you analyzed those production occurrences?  It's not too hard
> > > to imagine some form of corruption that leads to such a loop, but which isn't
> > > related to the horizon going backwards?  E.g. a corrupted HOT chain can lead
> > > to heap_page_prune() not acting on a DEAD tuple, but lazy_scan_prune() would
> > > then encounter a DEAD tuple.

I've not seen this recur for any one table, so I think we can rule out
corruption modes that would reach the loop every time.  (If a hypothesized
loop explanation calls for both corruption and horizon movement, that could
still apply.)

> > One occurrence had these facts:
> >
> > HeapTupleHeaderGetXmin             = 95271613
> > HeapTupleHeaderGetUpdateXid        = 95280147
> > vacrel->OldestXmin                 = 95317451
> > vacrel->vistest->definitely_needed = 95318928
> > vacrel->vistest->maybe_needed      = 93624425
> >
> > How compatible are those with the corruption vectors you have in view?
> 
> Do you have more information about the page this was on? E.g. pageinspect
> output? Or at least the infomasks of that tuple?

No, unfortunately.
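
For reference, here is a rough sketch of where the two tuple XIDs quoted above
fall relative to the reported horizons.  It is only a simplified model of the
three-way test that GlobalVisTestIsRemovableXid() applies (below maybe_needed,
at or above definitely_needed, or in between), not PostgreSQL code.  The names
classify(), Zone and summarize() are made up for illustration, and plain
integer comparison ignores XID wraparound and epochs.  It says nothing about
whether the updater committed, nor does it establish what happened on that
page.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical labels for the three outcomes of the horizon test."""
    REMOVABLE = "below maybe_needed"
    RECHECK = "between maybe_needed and definitely_needed"
    NEEDED = "at or above definitely_needed"


def classify(xid, maybe_needed, definitely_needed):
    """Simplified three-way test; assumes no XID wraparound."""
    if xid < maybe_needed:
        return Zone.REMOVABLE
    if xid >= definitely_needed:
        return Zone.NEEDED
    # In this range the answer depends on whether a refreshed horizon
    # moves maybe_needed past the XID.
    return Zone.RECHECK


# Values exactly as reported in the quoted occurrence.
XMIN = 95271613
UPDATE_XID = 95280147
OLDEST_XMIN = 95317451
DEFINITELY_NEEDED = 95318928
MAYBE_NEEDED = 93624425


def summarize():
    """Collect the comparisons relevant to the discussion."""
    return {
        "xmin_zone": classify(XMIN, MAYBE_NEEDED, DEFINITELY_NEEDED),
        "update_xid_zone": classify(UPDATE_XID, MAYBE_NEEDED, DEFINITELY_NEEDED),
        "update_xid_precedes_oldest_xmin": UPDATE_XID < OLDEST_XMIN,
        "oldest_xmin_minus_maybe_needed": OLDEST_XMIN - MAYBE_NEEDED,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, "=", value)
```

Under this model both XIDs precede OldestXmin yet sit between maybe_needed and
definitely_needed, so a test against OldestXmin and a test against vistest
need not agree unless a refresh advances maybe_needed.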

> I assume this was a normal
> data table (i.e. not a [shared|user] catalog table or temp table)?

Normal data table.

> Do you know what ComputeXidHorizonsResultLastXmin, RecentXmin were set to?

No.

> > I tried briefly to understand
> > https://postgr.es/m/flat/20240415173913.4zyyrwaftujxthf2@awork3.anarazel.de
> > but I felt verifying its argument was going to be a big job for me.  Would
> > those errors happen transiently, like the infinite loop, or would they
> > persist until something resets the tuple fields (e.g. ATRewriteTables())?
> 
> I think they'd be transient, because the visibility information during the
> next vacuum would presumably not be "skewed" anymore?

That is good.

> Of course it's possible
> you'd re-encounter the problem, if you constantly have horizons going back and
> forth. But I'd still classify that as transient.

Certainly.
