Discussion: One PG process eating more than 40GB of RAM and getting killed by OOM


One PG process eating more than 40GB of RAM and getting killed by OOM

From: Jean-Christophe Boggio
Hello,

On my dev laptop, I have ~40GB free RAM. When launching a heavy 
calculation in PostgreSQL (within a stored procedure), it consumes as 
much memory as is available and then gets killed by OOM. There is only 
one connected session.

I have the following settings, which look reasonable (to me):

shared_buffers = 512MB                  # min 128kB
#huge_pages = try                       # on, off, or try
temp_buffers = 512MB                    # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
work_mem = 1GB                          # min 64kB
#hash_mem_multiplier = 1.0              # 1-1000.0 multiplier on hash table work_mem
maintenance_work_mem = 1GB              # min 1MB
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#logical_decoding_work_mem = 64MB       # min 64kB
#max_stack_depth = 2MB                  # min 100kB
#shared_memory_type = mmap              # the default is the first option
dynamic_shared_memory_type = posix      # the default is the first option
#temp_file_limit = -1                   # limits per-process temp file space

This is PostgreSQL 14.7 running on Ubuntu 23.04

What can I do to prevent the crash?

Thanks for your help,




Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: MichaelDBA
Turn off the OOM killer so you would get a nicer message in the PG log file instead of crashing the PG service.
vm.overcommit_memory=2


Jean-Christophe Boggio wrote on 10/13/2023 9:06 AM:
> On my dev laptop, I have ~40GB free RAM. When launching a heavy calculation in PostgreSQL (within a stored procedure), it consumes as much memory as is available and then gets killed by OOM. There is only one connected session.
>
> I have the following settings, which look reasonable (to me):
> [...]
> This is PostgreSQL 14.7 running on Ubuntu 23.04
>
> What can I do to prevent the crash?
>
> Thanks for your help,





Regards,

Michael Vitale
Michaeldba@sqlexec.com
703-600-9343


Attachments

Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Johannes Truschnigg
You will want to try decreasing work_mem to a sane number first, without
looking at anything else really.

Check out the official docs:
https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM

The gist is that work_mem is not a limit that applies per
session/connection/query, but per sort or hash node, of which there can be
many in complex queries. That is why 1GB of work_mem can end up consuming
several multiples of that if you are (un)lucky enough.
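
As a rough illustration (the table and column names here are invented, and
the actual plan shape depends entirely on your schema and statistics), a
single query like the following can end up with a hash join plus two sorts
in its plan, each entitled to its own work_mem:

SET work_mem = '128MB';   -- per-node budget, for this session only
EXPLAIN
SELECT c.region, o.customer_id, sum(o.amount) AS total
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
GROUP  BY c.region, o.customer_id
ORDER  BY total DESC;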

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +436502133337
xmpp:  johannes@truschnigg.info

Attachments

Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Michael Banck
Hi,

On Fri, Oct 13, 2023 at 03:06:57PM +0200, Jean-Christophe Boggio wrote:
> On my dev laptop, I have ~40GB free RAM. When launching a heavy calculation
> in PostgreSQL (within a stored procedure), it consumes as much memory as is
> available and then gets killed by OOM. There is only one connected session.
> 
> I have the following settings, which look reasonable (to me):
> 
> shared_buffers = 512MB                  # min 128kB

That's not a lot.

> work_mem = 1GB                          # min 64kB

On the other hand, that's a lot. So if that query is run in parallel and
does a lot of different things, it might use work_mem several times over,
resulting in a lot of memory being used.

So the first thing to try would be to lower work_mem to something much
lower, like 64MB or 128MB.
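
For example, either just for the session running the procedure, or
cluster-wide (the value is simply the one suggested above):

SET work_mem = '128MB';                  -- current session only

ALTER SYSTEM SET work_mem = '128MB';     -- written to postgresql.auto.conf
SELECT pg_reload_conf();                 -- make the new setting take effect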

> This is PostgreSQL 14.7 running on Ubuntu 23.04

The other thing you can try is to see whether turning jit off (or just
setting jit_inline_above_cost = -1) helps.
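
For example, in the session running the procedure:

SET jit = off;                      -- disable JIT compilation entirely
-- or keep JIT but skip inlining only:
SET jit_inline_above_cost = -1;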


Michael



Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Jean-Christophe Boggio
On 13/10/2023 at 15:12, MichaelDBA wrote:
> Turn off the OOM killer so you would get a nicer message in the PG log file instead of crashing the PG service.
> vm.overcommit_memory=2

Did this, and now the process dies much quicker (without seemingly consuming all the available memory)... Also, I cannot launch Thunderbird anymore with this setting...

Anyway, I also reduced work_mem to 128MB.

You can find the corresponding logs attached.

Thanks for your help,

JC

--
Jean-Christophe Boggio
Independent consultant and developer
cat@thefreecat.org
Office : +33 2 46 65 56 96
Mobile : +33 6 60 70 83 70
104bis rue Avisseau
37000 Tours
France

Attachments

Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Tom Lane
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> You can find the corresponding logs attached.

Your query seems to be creating TupleSort contexts and then leaking
them, which is surely a bug.  Can you make a self-contained test
case that reproduces this?

            regards, tom lane



Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Jeff Janes
On Fri, Oct 13, 2023 at 10:53 AM Jean-Christophe Boggio <postgresql@thefreecat.org> wrote:
> On 13/10/2023 at 15:12, MichaelDBA wrote:
> > Turn off the OOM killer so you would get a nicer message in the PG log file instead of crashing the PG service.
> > vm.overcommit_memory=2
>
> Did this, and now the process dies much quicker (without seemingly consuming all the available memory)... Also, I cannot launch Thunderbird anymore with this setting...


Yes, turning off overcommit doesn't play well with graphical environments, in my experience. But a production database probably shouldn't be running on a system like that. On non-prod systems, you can either turn it off temporarily, or you could try to catch the problem before it becomes fatal and get the log with pg_log_backend_memory_contexts (see the example below).
 

> Anyway, I also reduced work_mem to 128MB.
>
> You can find the corresponding logs attached.

We can see what the problem is (over 137,000 concurrent tuple sorts), but we can't tell what is ultimately causing it.  You will need to dig into, or disclose, the contents of the procedure.
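
For the pg_log_backend_memory_contexts route, a minimal sketch would be to
run something like this from a second session while the procedure is eating
memory (the WHERE clause is only one way to single out the busy backend;
adjust it, or pass the backend's pid directly):

SELECT pg_log_backend_memory_contexts(pid)
FROM   pg_stat_activity
WHERE  state = 'active' AND pid <> pg_backend_pid();

The memory context dump then appears in the server log for that backend.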

Cheers,

Jeff

Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Jean-Christophe Boggio
On 13/10/2023 at 18:48, Jeff Janes wrote:
> Yes, turning off overcommit doesn't play with graphical environments, 
> in my experience. But a production database probably shouldn't be 
> running on a system like that. On non-prod systems, you can either 
> turn it off temporarily, or you could try to catch the problem before 
> it becomes fatal and get the log with pg_log_backend_memory_contexts.

As I said, this is my dev laptop and no, I would never waste precious 
RAM this way on a production server ;-)

>  We can see what the problem is (over 137,000 concurrent tuple sorts), 
> but we can't tell what is ultimately causing it.  You will need to dig 
> into, or disclose, the contents of the procedure.

I have no problem disclosing this code and data to the PG dev team (this
is client data though, so please keep it to yourselves). Where can I
send you a link to the dump?

Best,

JC




Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Tom Lane
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> Le 13/10/2023 à 18:48, Jeff Janes a écrit :
>> We can see what the problem is (over 137,000 concurrent tuple sorts), 
>> but we can't tell what is ultimately causing it.  You will need to dig 
>> into, or disclose, the contents of the procedure.

> I have no problem disclosing this code and data to the PG dev team (this 
> is client data though so please keep it for yourselves). Where can I 
> send you a link to the dump?

I'm interested in taking a look; you can send me the link privately.

            regards, tom lane



Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Tom Lane
Jean-Christophe Boggio <postgresql@thefreecat.org> writes:
> I have no problem disclosing this code and data to the PG dev team (this
> is client data though so please keep it for yourselves). Where can I
> send you a link to the dump?

Thanks for sending the data.  I'm not observing any leak on current
Postgres, and after checking the commit log I realized that your
symptoms look mighty like this previous report:

https://www.postgresql.org/message-id/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at

which was fixed here:

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Branch: master [98640f960] 2023-07-02 20:03:30 +0200
Branch: REL_16_STABLE Release: REL_16_0 [9ae7b5d1f] 2023-07-02 20:04:16 +0200
Branch: REL_15_STABLE Release: REL_15_4 [0c5fe4ff6] 2023-07-02 20:04:40 +0200
Branch: REL_14_STABLE Release: REL_14_9 [c1affa38c] 2023-07-02 20:05:14 +0200
Branch: REL_13_STABLE Release: REL_13_12 [3ce761d5c] 2023-07-02 20:05:35 +0200

    Fix memory leak in Incremental Sort rescans

    The Incremental Sort had a couple issues, resulting in leaking memory
    during rescans, possibly triggering OOM. The code had a couple of
    related flaws:

    1. During rescans, the sort states were reset but then also set to NULL
       (despite the comment saying otherwise). ExecIncrementalSort then
       sees NULL and initializes a new sort state, leaking the memory used
       by the old one.

    2. Initializing the sort state also automatically rebuilt the info about
       presorted keys, leaking the already initialized info. presorted_keys
       was also unnecessarily reset to NULL.

    Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane.
    Backpatch to 13, where Incremental Sort was introduced.

    Author: James Coleman, Laurenz Albe, Tom Lane
    Reported-by: Laurenz Albe, Zu-Ming Jiang
    Backpatch-through: 13
    Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at
    Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch


So I think the answer for you is "update to Postgres 14.9".
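
To double-check which minor version the server is actually running:

SHOW server_version;
-- or, with build details:
SELECT version();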

            regards, tom lane



Re: One PG process eating more than 40GB of RAM and getting killed by OOM

From: Jean-Christophe Boggio
Hello Tom,

On 14/10/2023 at 00:39, Tom Lane wrote:
> Thanks for sending the data.  I'm not observing any leak on current
> Postgres, and after checking the commit log I realized that your
> symptoms look mighty like this previous report:
> [...]
> So I think the answer for you is "update to Postgres 14.9".

You are right. I thought I always used the latest version, but updates
were deactivated...

Sorry for wasting your time; after the upgrade, everything is working perfectly.

Have a nice day,

JC