Re: Eager page freeze criteria clarification

From: Peter Geoghegan
Subject: Re: Eager page freeze criteria clarification
Msg-id: CAH2-WzmY_ywKHgVQ-0a7MVwq8MAmzztsDsSjgdns7OtMmbFhhQ@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification  (Andres Freund <andres@anarazel.de>)
Responses: Re: Eager page freeze criteria clarification  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, Sep 27, 2023 at 6:35 PM Andres Freund <andres@anarazel.de> wrote:
> >   if insert LSN - RedoRecPtr < insert LSN - page LSN
> >   page is older than the most recent checkpoint start, so freeze it
> >   regardless of whether or not it would emit an FPI
> >
> > What aggressiveness levels should there be? What should change at each
> > level? What criteria should pages have to meet to be subject to the
> > aggressiveness level?
>
> I'm thinking something very roughly along these lines could make sense:
>
> page_lsn_age = insert_lsn - page_lsn;
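
For reference, the quoted checkpoint test reduces to comparing the page LSN against RedoRecPtr, because the insert LSN appears on both sides of the inequality. Below is a minimal Python sketch of that test. It treats LSNs as plain integers; the function name is made up for illustration and is not a PostgreSQL API.

```python
def page_older_than_last_checkpoint(insert_lsn: int,
                                    redo_rec_ptr: int,
                                    page_lsn: int) -> bool:
    """Quoted test: insert LSN - RedoRecPtr < insert LSN - page LSN."""
    checkpoint_age = insert_lsn - redo_rec_ptr   # WAL written since checkpoint start
    page_lsn_age = insert_lsn - page_lsn         # WAL written since page last changed
    return checkpoint_age < page_lsn_age


# The insert LSN cancels out, so this is the same as page_lsn < redo_rec_ptr.
for page_lsn in (500, 800, 900):
    result = page_older_than_last_checkpoint(1000, 800, page_lsn)
    assert result == (page_lsn < 800)
    print(page_lsn, result)
```

A page whose LSN equals RedoRecPtr does not pass the strict inequality, so it is not treated as older than the checkpoint in this sketch.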

While there is no reason not to experiment here, I have my doubts
about what you've sketched out. Most importantly, it doesn't have
anything to say about the cost of not freezing -- just the cost of
freezing. But isn't the main problem *not* freezing when we could and
should have? (Of course the cost of freezing is very relevant, but
it's still secondary.)

But even leaving that aside, I just don't get why this will work with
the case that you yourself emphasized earlier on: a workload with
inserts plus "hot tail" updates. If you run TPC-C according to spec,
there is about 12 or 14 hours between the initial inserts into the
orders and order lines tables (by a new order transaction), and the
subsequent updates (from the delivery transaction). When I run the
benchmark, I usually don't stick with the spec (it's rather limiting
on modern hardware), so it's more like 2 - 4 hours before each new
order is delivered (meaning updated in those two big tables). Either
way, it's a fairly long time relative to everything else.

Won't the algorithm that you've sketched always think that
"unfreezing" pages doesn't affect recently frozen pages with such a
workload? Isn't the definition of "recently frozen" that emerges from
this algorithm entirely unrelated to the order delivery time, or
anything like that? You know, rather like vacuum_freeze_min_age.
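
Rough arithmetic makes the concern concrete. The Python sketch below is only an illustration: the WAL generation rate and the size of the "recently frozen" window are invented numbers, not values from this thread or from any patch, and the helper names are hypothetical. Only the delivery delays (2, 4, and 12 hours) come from the description above.

```python
# All rates and window sizes here are hypothetical assumptions.
MIB = 1024 * 1024
GIB = 1024 * MIB


def wal_bytes_between(wal_bytes_per_sec: int, hours: float) -> int:
    """LSN distance covered while WAL is generated at a steady rate."""
    return int(wal_bytes_per_sec * hours * 3600)


def counts_as_recently_frozen(lsn_since_freeze: int, recent_window: int) -> bool:
    """True if the freeze happened within `recent_window` bytes of WAL."""
    return lsn_since_freeze <= recent_window


assumed_wal_rate = 10 * MIB   # assumption: 10 MiB of WAL per second
assumed_window = 16 * GIB     # assumption: LSN distance treated as "recent"

for delay_hours in (2, 4, 12):
    distance = wal_bytes_between(assumed_wal_rate, delay_hours)
    recent = counts_as_recently_frozen(distance, assumed_window)
    print(delay_hours, distance // GIB, recent)
```

Under these assumed numbers, every delivery delay puts the update far outside the window, so the unfreezing would never register as undoing a recent freeze. A different window or WAL rate changes the outcome, but the window is measured in WAL bytes either way and has no built-in connection to the delivery delay.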

Separately, at one point you also said "Yes. If the ratio of
opportunistically frozen pages (which I'd define as pages that were
frozen not because they strictly needed to) vs the number of unfrozen
pages increases, we need to make opportunistic freezing less
aggressive and vice versa".

Can we expect a discount for freezing that happened to be very cheap
anyway, when that doesn't work out?

What about a page that we would have had to freeze anyway (based
on the conventional vacuum_freeze_min_age criteria) not too long after
it was frozen by this new mechanism, that nevertheless became unfrozen
some time later? That is, a page where "the unfreezing" cannot
reasonably be blamed on the initial so-called opportunistic freezing,
because really it was a total accident involving when VACUUM showed
up? You know, just like we'd expect with the TPC-C tables.
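
One way to picture this accounting question is the Python sketch below. It is a simplified illustration, not a proposal from the thread: the function and parameter names are hypothetical, XIDs are treated as plain integers with no wraparound, and it assumes a VACUUM would have shown up as soon as the page became eligible under vacuum_freeze_min_age (default 50 million).

```python
VACUUM_FREEZE_MIN_AGE = 50_000_000  # PostgreSQL default setting


def unfreeze_blamed_on_eager_freeze(oldest_xid_on_page: int,
                                    xid_at_unfreeze: int,
                                    freeze_min_age: int = VACUUM_FREEZE_MIN_AGE) -> bool:
    """Blame the eager freeze only if the page was unfrozen before the
    conventional criteria would have required freezing it anyway."""
    # XID counter value at which the page's oldest XID reaches freeze_min_age.
    xid_when_conventionally_eligible = oldest_xid_on_page + freeze_min_age
    return xid_at_unfreeze < xid_when_conventionally_eligible


# Unfrozen 10 million XIDs after the oldest XID: the eager freeze was wasted.
print(unfreeze_blamed_on_eager_freeze(1_000, 10_001_000))
# Unfrozen 60 million XIDs later: conventional freezing would have hit it too.
print(unfreeze_blamed_on_eager_freeze(1_000, 60_001_000))
```

Any feedback mechanism that counts unfrozen pages would need some rule like this to tell the two cases apart. The hard part in practice is that whether a VACUUM actually arrives in the window is largely a matter of timing.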

Aside: "unfrozen pages" seems to refer to pages that were frozen, and
became unfrozen. Not pages that simply aren't frozen. Lots of
opportunities for confusion here.

I'm not saying that it's wrong to freeze like this in the specific case of
TPC-C. But do you really need to invent all this complicated
infrastructure, just to avoid freezing the same pages again in a tight
loop?

On a positive note, I like that what you've laid out freezes eagerly
when an FPI won't result -- this much we can all agree on. I guess
that part is becoming uncontroversial.

--
Peter Geoghegan


