RE: Logical Replica ReorderBuffer Size Accounting Issues

From: Wei Wang (Fujitsu)
Subject: RE: Logical Replica ReorderBuffer Size Accounting Issues
Msg-id: OSZPR01MB6278C3FCBCE47A42CCF05DE99E409@OSZPR01MB6278.jpnprd01.prod.outlook.com
In response to: Re: Logical Replica ReorderBuffer Size Accounting Issues (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Logical Replica ReorderBuffer Size Accounting Issues (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-bugs
On Thu, May 9, 2023 at 22:58 Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, May 9, 2023 at 6:06 PM Wei Wang (Fujitsu)
> > > I think there are two separate issues. One is a pure memory accounting
> > > issue: since the reorderbuffer accounts the memory usage by
> > > calculating the actual tuple size etc., it includes neither the chunk
> > > header size nor fragmentation within blocks. So I can understand why
> > > the output of MemoryContextStats(rb->context) could be two or three
> > > times higher than logical_decoding_work_mem and doesn't match rb->size
> > > in some cases.
> > >
> > > However it cannot explain the original issue that the memory usage
> > > (reported by MemoryContextStats(rb->context)) reached 5GB in spite of
> > > logical_decoding_work_mem being 256MB, which seems like a memory leak
> > > bug or a case where we ignore the memory limit.
> >
> > Yes, I agree that the chunk header size or fragmentation within blocks may
> > cause the allocated space to be larger than the accounted space. However, since
> > this overhead is very small (please refer to [1] and [2]), I also don't think
> > this is the cause of the original issue in this thread.
> >
> > I think that the cause of the original issue in this thread is the
> > implementation of the generation allocator.
> > Please consider the following user scenario:
> > The parallel execution of different transactions led to very fragmented and
> > mixed-up WAL records for those transactions. Later, when the walsender serially
> > decodes the WAL, chunks from different transactions are stored on a single block
> > in rb->tup_context. However, when a transaction ends, the chunks related to
> > this transaction on the block will be marked as free instead of being actually
> > released. The block will only be released when all chunks in the block are
> > free. In other words, the block will only be released when all transactions
> > occupying the block have ended. As a result, the chunks allocated by some
> > ended transactions are not released on many blocks for a long time, and then
> > this issue occurs. I think this also explains why parallel execution is more
> > likely to trigger this issue compared to serial execution of transactions.
> > Please also refer to the analysis details of code in [3].
> 
> After some investigation, I don't think the implementation of
> the generation allocator is problematic, but I agree that your scenario is
> likely to explain the original issue. In particular, the output of
> MemoryContextStats() shows:
> 
>           Tuples: 4311744512 total in 514 blocks (12858943 chunks);
> 6771224 free (12855411 chunks); 4304973288 used
> 
> First, since the total memory allocation was 4311744512 bytes in 514
> blocks, we can see there were no special blocks in the context (8MB *
> 514 = 4311744512 bytes). Second, it shows that most chunks were free
> (12855411 of 12858943 chunks) but most memory was used (4304973288 of
> 4311744512 bytes), which means that there were some in-use chunks at
> the tail of each block, i.e. most blocks were fragmented. I've
> attached another test to reproduce this
> behavior. In this test, the memory usage reaches up to almost 4GB.
> 
> One idea to deal with this issue is to choose the block sizes
> carefully while measuring the performance as the comment shows:
> 
>     /*
>      * XXX the allocation sizes used below pre-date generation context's block
>      * growing code.  These values should likely be benchmarked and set to
>      * more suitable values.
>      */
>     buffer->tup_context = GenerationContextCreate(new_ctx,
>                                                   "Tuples",
>                                                   SLAB_LARGE_BLOCK_SIZE,
>                                                   SLAB_LARGE_BLOCK_SIZE,
>                                                   SLAB_LARGE_BLOCK_SIZE);
> 
> For example, when I used SLAB_DEFAULT_BLOCK_SIZE (8kB), the maximum
> memory usage was about 17MB in the test.

Thanks for your idea.
I did some tests as you suggested. I think the modification mentioned above can
work around this issue in the test 002_rb_memory_2.pl in [1] (to make the
transactions reach the large-transaction size, I set logical_decoding_work_mem
to 1MB). However, the test repreduce.sh in [2] still reproduces this issue. It
seems that this modification fixes a subset of use cases, but the issue still
occurs for other use cases.
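
For reference, the change I tested is essentially the one suggested above; the
attached tmp-modification.patch is the exact version I used. Roughly, it shrinks
the block sizes of the "Tuples" context from SLAB_LARGE_BLOCK_SIZE (8MB) to
SLAB_DEFAULT_BLOCK_SIZE (8kB):

    buffer->tup_context = GenerationContextCreate(new_ctx,
                                                  "Tuples",
                                                  SLAB_DEFAULT_BLOCK_SIZE,
                                                  SLAB_DEFAULT_BLOCK_SIZE,
                                                  SLAB_DEFAULT_BLOCK_SIZE);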

I think that the block size affects how many transactions have changes stored on
a single block. For example, before the modification a block could hold changes
from 10 transactions, while after the modification a block may only hold changes
from 3 transactions. Once those three transactions are committed, the block can
actually be released, so the modification increases the probability that a block
is actually released. Additionally, I think that the parallelism of the test
repreduce.sh is higher than that of the test 002_rb_memory_2.pl, which is also
why this modification only fixes the issue in 002_rb_memory_2.pl.
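
To illustrate why fewer transactions per block helps, here is a simplified model
I wrote for this mail (it is not the actual generation.c code, and the names are
made up): a generation-context block is only given back once every chunk on it
has been freed, so the fewer transactions share a block, the sooner that
condition can be met.

    #include <stdlib.h>     /* for free() in this standalone sketch */

    /* Hypothetical, simplified model of one generation-context block. */
    typedef struct SketchBlock
    {
        int     nchunks;    /* chunks allocated on this block */
        int     nfree;      /* chunks already marked as free */
    } SketchBlock;

    /*
     * Freeing a chunk only marks it as free; the block itself is released
     * only when all of its chunks are free, i.e. when every transaction
     * that stored changes on this block has finished.
     */
    static void
    sketch_free_chunk(SketchBlock *block)
    {
        block->nfree++;
        if (block->nfree == block->nchunks)
            free(block);    /* only now is the memory actually returned */
    }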

Please let me know if I'm missing something.

Attached are the modification patch that I used (tmp-modification.patch) and the
two tests mentioned above.

[1] - https://www.postgresql.org/message-id/CAD21AoAa17DCruz4MuJ_5Q_-JOp5FmZGPLDa%3DM9d%2BQzzg8kiBw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/OS3PR01MB6275A7E5323601D59D18DB979EC29%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

