Re: Disabling Heap-Only Tuples

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Disabling Heap-Only Tuples
Дата
Msg-id CAEze2WjZCrey70=MWyyGRn51cGpDW43aUoWyAKXg5U-5h53Lpw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Disabling Heap-Only Tuples  (Thom Brown <thom@linux.com>)
Ответы Re: Disabling Heap-Only Tuples  (Thom Brown <thom@linux.com>)
Список pgsql-hackers
On Wed, 5 Jul 2023 at 14:39, Thom Brown <thom@linux.com> wrote:
>
> On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > On Wed, 5 Jul 2023 at 13:03, Thom Brown <thom@linux.com> wrote:
> > >
> > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> > > <boekewurm+postgres@gmail.com> wrote:
> > > >
> > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown <thom@linux.com> wrote:
> > > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > > comes with a caveat: it means that if we have lots of available space
> > > > > earlier on in the relation, it can only be used for new tuples or in
> > > > > cases where there's insufficient space on a page for an UPDATE to use
> > > > > HOT.
> > > > >
> > > > > This mechanism limits our options for condensing tables, forcing us to
> > > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > > tools like pg_repack. These either require exclusive locks (which will
> > > > > be a deal-breaker on large tables on a production system), or there's
> > > > > risks involved. Of course we can always flood pages with new versions
> > > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > > necessary.
> > > > >
> > > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > > superusers to disable HOT on tables. The intent is to trade some
> > > > > performance benefits for the ability to reduce the size of a table
> > > > > without the typical locking associated with it.
> > > >
> > > > Interesting use case, but I think that disabling HOT would be missing
> > > > the forest for the trees. I think that a feature that disables
> > > > block-local updates for pages > some offset would be a better solution
> > > > to your issue: Normal updates also prefer the new tuple to be stored
> > > > in the same pages as the old tuple if at all possible, so disabling
> > > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > > table - at least not while there is still empty space in those pages.
> > >
> > > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > > the same page that it relocates to the earlier available page.  So I
> > > guess I'm after whatever mechanism would allow that to happen reliably
> > > and predictably.
> > >
> > > So $subject should really be "Allow forcing UPDATEs off the same page".
> >
> > You'd probably want to do that only for a certain range of the table -
> > for a table with 1GB of data and 3GB of bloat there is no good reason
> > to force page-crossing updates in the first 1GB of the table - all
> > tuples of the table will eventually reside there, so why would you
> > take a performance penalty and move the tuples from inside that range
> > to inside that same range?
>
> I'm thinking more of a case of:
>
> <magic to stop UPDATES from landing on same page>
>
> UPDATE bigtable
> SET primary key = primary key
> WHERE ctid IN (
>     SELECT ctid
>     FROM bigtable
>     ORDER BY ctid DESC
>     LIMIT 100000);

So what were you thinking of? A session GUC? A table option?

The benefit of a table option is that it is retained across sessions
and thus allows tables that get enough updates to eventually get to a
cleaner state. The main downside of such a table option is that it
requires a temporary table-level lock to update the parameter.

The benefit of a session GUC is that you can set it without impacting
other sessions, but the downside is that you need to do the
maintenance in that session, and risk that cascading updates to other
tables (e.g. through AFTER UPDATE triggers) are also impacted by this
non-local update GUC.

> > Something else to note: Indexes would suffer some (large?) amount of
> > bloat in this process, as you would be updating a lot of tuples
> > without the HOT optimization, thus increasing the work to be done by
> > VACUUM.
> > This may result in more bloat in indexes than what you get back from
> > shrinking the table.
>
> This could be the case, but I guess indexes are expendable to an
> extent, unlike tables.

I don't think that's accurate - index rebuilds are quite expensive.
But, that's besides the point of this thread.

Somewhat related: did you consider using pg_repack instead of this
potential feature?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: logicalrep_message_type throws an error
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: logicalrep_message_type throws an error