Re: BUG #17974: Walsenders memory usage suddenly spike to 80G+ causing OOM and server reboot

From: Andres Freund
Subject: Re: BUG #17974: Walsenders memory usage suddenly spike to 80G+ causing OOM and server reboot
Msg-id: 20230614221503.ux62wn25bhydnzjm@awork3.anarazel.de
In reply to: Re: BUG #17974: Walsenders memory usage suddenly spike to 80G+ causing OOM and server reboot  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #17974: Walsenders memory usage suddenly spike to 80G+ causing OOM and server reboot  (Michael Guissine <mguissine@gmail.com>)
List: pgsql-bugs

Hi,

On 2023-06-14 10:23:32 +0900, Michael Paquier wrote:
> On Wed, Jun 14, 2023 at 12:05:32AM +0000, PG Bug reporting form wrote:
> > We are running relatively large and busy Postgres database on RDS and using
> > logical replication extensively. We currently have 7 walsenders and while we
> > often see replication falls behind due to high transactional volume, we've
> > never experienced memory issues in 14.6 and below. After recent upgrade to
> > 14.8, we already had several incidents where walsender processes RES memory
> > would suddenly increase to over 80GB each causing freeable memory on the
> > instance to go down to zero.

When postgres knows it ran out of memory (instead of having gotten killed by
the OOM killer), it'll dump memory context information to the log. Could you
check whether there are related log entries?  They should precede an "out of
memory" ERROR.
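
A rough way to look for those entries, as a sketch only: the Python below scans log lines for the text "out of memory" and keeps the lines just before each hit, which is where the memory context dump would be expected. The function names, the window size, and the idea of reading a downloaded log file are assumptions for illustration, not anything specific to RDS or to postgres itself.

```python
from collections import deque


def find_oom_context(lines, before=5, needle="out of memory"):
    """Return (line_number, preceding_lines) for each line containing needle.

    The match is a case-insensitive substring test; `before` is the number
    of preceding lines to keep as context for each hit.
    """
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    hits = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if needle in text.lower():
            hits.append((lineno, list(window)))
        window.append(text)
    return hits


def scan_log(path, before=200):
    """Print every hit in the log file at `path` (hypothetical helper)."""
    with open(path, errors="replace") as f:
        for lineno, context in find_oom_context(f, before=before):
            print(f"--- match at line {lineno} ---")
            for ctx_line in context:
                print(ctx_line)
```

Calling `scan_log` with the path to a downloaded log file prints the lines leading up to each match. If nothing matches, the process was more likely killed from outside (for example by the OOM killer) than stopped by an allocation failure postgres itself noticed.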


> > Interesting that even after Instance reboot,
> > the memory used by walsender processes won't get released until we restart
> > the replication and drop the logical slots. The logical_decoding_work_mem
> > was set to 512MB in time of the last incident but we recently lowered it to
> > 128MB.

That seems very unlikely to be the case. If you restarted postgres or postgres
and the OS, there's nothing to have allocated the memory. What exactly do you
mean by "Instance reboot"?


> > Any known issues in pg 14.8 that would trigger this behaviour?
>
> Yes, there are known issues with memory handling in logical
> replication setups.  See for example this thread:
> https://www.postgresql.org/message-id/CAMnUB3oYugXCBLSkih+qNsWQPciEwos6g_AMbnz_peNoxfHwyw@mail.gmail.com

Why would 14.8 have made that problem worse?

Greetings,

Andres Freund


