Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Дата
Msg-id CAEze2Wj3=PwJzo8ux_4W_eEpMvd7ZMYSBQO4r7iJT01geCabaA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (Andres Freund <andres@anarazel.de>)
Ответы Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Thank you all for the feedback. Please find attached v2 of the
patchset, which contains updated comments and applies the suggested
changes.

On Sat, 12 Mar 2022 at 02:03, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-03-11 22:42:42 +0200, Heikki Linnakangas wrote:
> > Have you been able to create a test case for that? The largest record I can
> > think of is a commit record with a huge number of subtransactions, dropped
> > relations, and shared inval messages. I'm not sure if you can overflow a
> > uint32 with that, but exceeding MaxAllocSize seems possible.
>
> MaxAllocSize is pretty easy:
> SELECT pg_logical_emit_message(false, long, long) FROM repeat(repeat(' ', 1024), 1024*1023) as l(long);
>
> on a standby:
>
> 2022-03-11 16:41:59.336 PST [3639744][startup][1/0:0] LOG:  record length 2145386550 at 0/3000060 too long

Thanks for the reference. I was already playing around with 2PC log
records (which can theoretically contain >4GB of data); but your
example is much easier and takes significantly less time.

I'm not sure whether or not to include this in the test suite, though,
as this would require a machine with at least 1GB of memory available
for this test alone, and I don't know the current requirements for
running the test suite.

> > I wonder if these checks hurt performance. These are very cheap, but then
> > again, this codepath is very hot. It's probably fine, but it still worries
> > me a little. Maybe some of these could be Asserts.
>
> I wouldn't expect the added branch itself to hurt much in XLogRegisterData() -
> it should be statically predicted to be not taken with the unlikely. I don't
> think it's quite inner-loop enough for the instructions or the number of
> "concurrently out of order branches" to be a problem.
>
> FWIW, often the added elog()s are worse, because they require a decent amount
> of code and restrict the optimizer somewhat (e.g. no sibling calls, more local
> variables etc). They can't even be deduplicated because of the line-numbers
> embedded.
>
> So maybe just collapse the new elog() with the previous elog, with a common
> unlikely()?

Updated.

> > > @@ -734,6 +744,10 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
> > >             if (needs_data)
> > >             {
> > > +                   /* protect against overflow */
> > > +                   if (unlikely(regbuf->rdata_len > UINT16_MAX))
> > > +                           elog(ERROR, "too much WAL data for registered buffer");
> > > +
> > >                     /*
> > >                      * Link the caller-supplied rdata chain for this buffer to the
> > >                      * overall list.
>
> FWIW, this branch I'm a tad more concerned about - it's in a loop body where
> plausibly a lot of branches could be outstanding at the same time.
>
> ISTM that this could just be an assert?

This specific location has been replaced with an Assert, while
XLogRegisterBufData always does the unlikely()-ed bounds check.

Kind regards,

Matthias

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Allow async standbys wait for sync replication