Re: Memory leak from ExecutorState context?
From | Konstantin Knizhnik
Subject | Re: Memory leak from ExecutorState context?
Date |
Msg-id | 8338f4ee-47f0-f622-e37f-b1ce1dbfdf38@garret.ru
In reply to | Re: Memory leak from ExecutorState context? (Melanie Plageman <melanieplageman@gmail.com>)
List | pgsql-hackers
On 21.04.2023 1:51 AM, Melanie Plageman wrote:
> On Thu, Apr 20, 2023 at 12:42 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>> On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
>>> On Sat, 8 Apr 2023 02:01:19 +0200
>>> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>>>
>>>> On Fri, 31 Mar 2023 14:06:11 +0200
>>>> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>>>>
>>>> [...]
>>>>
>>>> After rebasing Tomas' memory balancing patch, I did some memory measurements
>>>> to answer some of my questions. Please find attached the resulting
>>>> charts "HJ-HEAD.png" and "balancing-v3.png", comparing memory consumption
>>>> between HEAD and Tomas' patch. They show an alternation of numbers
>>>> before/after calling ExecHashIncreaseNumBatches (see the debug patch). I
>>>> didn't try to find the exact last total peak of memory consumption during the
>>>> join phase and before all the BufFiles are destroyed, so the last number
>>>> might be underestimated.
>>> I did some more analysis of the total memory consumption in filecxt on HEAD
>>> and with the v3 and v4 patches. My previous debug numbers only print memory
>>> metrics during batch increments or hash table destruction. That means:
>>>
>>> * for HEAD: we miss the batches consumed during the outer scan
>>> * for v3: nbatch is added twice in spaceUsed, which is a rough estimation
>>> * for v4: batches are tracked in spaceUsed, so they are reflected in spacePeak
>>>
>>> Using a breakpoint in ExecHashJoinSaveTuple to print "filecxt->mem_allocated"
>>> from there, here is the maximum memory allocated for the BufFile context on each
>>> branch:
>>>
>>>           batches  max bufFiles total  spaceAllowed  rise
>>>   HEAD    16384    199966960           ~194MB
>>>   v3       4096     65419456            ~78MB
>>>   v4(*3)   2048     34273280             48MB        nbatch*sizeof(PGAlignedBlock)*3
>>>   v4(*4)   1024     17170160           60.6MB        nbatch*sizeof(PGAlignedBlock)*4
>>>   v4(*5)   2048     34273280           42.5MB        nbatch*sizeof(PGAlignedBlock)*5
>>>
>>> It seems accounting for BufFiles in spaceUsed allows better memory balancing and
>>> management. The precise factor by which to raise spaceAllowed is yet to be defined. *3 or
>>> *4 looks good, but this is based on a single artificial test case.
>>>
>>> Also, note that HEAD is currently reporting ~4MB of memory usage. This is far
>>> from reality. So even if we don't commit the memory balancing
>>> patch in v16, maybe we could account for filecxt in spaceUsed as a bugfix?
>>>
>>> Regards,
>> Thank you for the patch.
>> I faced the same problem (OOM caused by hash join).
>> I tried to create the simplest test reproducing the problem:
>>
>> create table t(pk int, val int);
>> insert into t values (generate_series(1,100000000),0);
>> set work_mem='64kB';
>> explain (analyze,buffers) select count(*) from t t1 join t t2 on
>> (t1.pk=t2.pk);
>>
>> There are three workers and the size of each exceeds 1.3GB.
>>
>> The plan is the following:
>>
>>  Finalize Aggregate  (cost=355905977972.87..355905977972.88 rows=1 width=8) (actual time=212961.033..226097.513 rows=1 loops=1)
>>    Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>    ->  Gather  (cost=355905977972.65..355905977972.86 rows=2 width=8) (actual time=212943.505..226097.497 rows=3 loops=1)
>>          Workers Planned: 2
>>          Workers Launched: 2
>>          Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>          ->  Partial Aggregate  (cost=355905976972.65..355905976972.66 rows=1 width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
>>                Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>                ->  Parallel Hash Join  (cost=1542739.26..303822614472.65 rows=20833345000002 width=0) (actual time=163268.274..207829.524 rows=33333333 loops=3)
>>                      Hash Cond: (t1.pk = t2.pk)
>>                      Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>                      ->  Parallel Seq Scan on t t1  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.045..30828.051 rows=33333333 loops=3)
>>                            Buffers: shared hit=16389 read=426089 written=87
>>                      ->  Parallel Hash  (cost=859144.78..859144.78 rows=41666678 width=4) (actual time=82202.445..82202.447 rows=33333333 loops=3)
>>                            Buckets: 4096 (originally 4096)  Batches: 32768 (originally 8192)  Memory Usage: 192kB
>>                            Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287, temp read=267898 written=737164
>>                            ->  Parallel Seq Scan on t t2  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.054..12647.534 rows=33333333 loops=3)
>>                                  Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287
>>  Planning:
>>    Buffers: shared hit=69 read=38
>>  Planning Time: 2.819 ms
>>  Execution Time: 226113.292 ms
>> (22 rows)
>>
>> -----------------------------
>>
>> So the number of batches has been increased to 32k.
>> I applied your patches 0001-0004, but unfortunately they have not reduced
>> memory consumption - the size of each backend is still more than 1.3GB.
> Is this EXPLAIN ANALYZE run on an instance with Jehan-Guillaume's
> patchset applied or without?
>
> I'm asking because the fourth patch in the series updates spaceUsed with
> the size of the BufFile->buffer, but I notice in your EXPLAIN ANALYZE,
> Memory Usage for the hashtable is reported as 192 kB, which, while
> larger than the 64kB work_mem you set, isn't as large as I might expect.
>
> - Melanie

Yes, this is EXPLAIN ANALYZE for the Postgres version with the following 4 patches applied:

0001-v4-Describe-hybrid-hash-join-implementation.patch
0002-v4-Allocate-hash-batches-related-BufFile-in-a-dedicated.patch
0003-v4-Add-some-debug-and-metrics.patch
0004-v4-Limit-BufFile-memory-explosion-with-bad-HashJoin.patch

Just as a workaround I tried the attached patch - it prevents the backends' memory footprint from growing by limiting the number of created batches.
I am not sure that it is the right solution, because in any case we allocate more memory than specified by work_mem.
The alternative is to prohibit the hash join plan in this case. But that is also not such a good solution, because merge join tends to be much slower.
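For readers who want the shape of that workaround without opening the attachment, the fragment below is a minimal illustrative sketch, not the attached patch itself: it shows one way a cap on batch growth could be expressed against nodeHash.c. The helper name batch_growth_allowed and the overhead factor of 4 are assumptions made purely for this example.

/*
 * Illustrative sketch only, NOT the attached patch: a guard that
 * ExecHashIncreaseNumBatches() in nodeHash.c could consult before
 * doubling nbatch.  The helper name and the factor of 4 are assumed
 * for this example.
 */
#include "postgres.h"

#include "executor/hashjoin.h"

#define BATCH_FILE_OVERHEAD_FACTOR 4    /* assumed tuning knob */

static bool
batch_growth_allowed(HashJoinTable hashtable)
{
    int         newnbatch = hashtable->nbatch * 2;

    /*
     * Every batch keeps a BLCKSZ-sized BufFile buffer for both its inner
     * and outer side, so this overhead grows with nbatch regardless of
     * work_mem.  Refuse to double nbatch once that overhead alone would
     * exceed a multiple of the hash table's memory budget; the join then
     * spills more per batch instead of inflating the backend's footprint.
     */
    return (double) newnbatch * BLCKSZ * 2 <=
           (double) hashtable->spaceAllowed * BATCH_FILE_OVERHEAD_FACTOR;
}

Under this sketch, ExecHashIncreaseNumBatches() would simply return without repartitioning when the check fails, accepting that the existing batches exceed work_mem rather than letting tens of thousands of BufFile buffers dominate memory.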
Attachments