Hi,
On 2/20/24 15:05, Alexey Ermakov wrote:
> On 2024-02-16 20:40, Andrei Lepikhov wrote:
>> Interesting. It correlates with a performance issue I have been
>> trying to catch for 3 months already. Could you provide a way to
>> reproduce that behavior?
>>
> Yes, I'm still trying to make a reproducer, it will take some time. Thanks.
>
I wonder if this might be yet another manifestation of the hashjoin
batch explosion issue we have. The plan has a hash join, and the fact
that it succeeds with a bit more memory would be consistent with that too.

The hashjoin batch explosion happens when we find a batch that's too
large to fit into work_mem, and increasing the number of batches does
not really make it smaller (e.g. because there are a lot of rows with
exactly the same key). We end up doubling the number of batches, but
each batch needs an 8kB file buffer, so the buffers alone can consume a
lot of memory as the batch count grows. Chances are the DSA allocation
fails simply because the system hits the overcommit limit, or something
like that.
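
A toy Python model of the mechanism described above, purely as an illustration and not how the executor is implemented: it assumes rows are assigned to batches by the low bits of a hash, that work_mem holds a fixed number of rows, and an arbitrary cap on the batch count (the function names, row counts and cap are all hypothetical; the real executor has additional safeguards not modelled here). With uniformly distributed keys the doubling stops quickly, but when one key has more duplicates than fit into work_mem, no amount of doubling shrinks that batch, while the buffer memory grows with every doubling.

```python
from collections import Counter

# Per-batch file buffer size, as described above (8kB). Whether the real
# executor keeps one or two such buffers per batch is not modelled here.
BUFFER_BYTES = 8 * 1024


def batch_of(key: int, nbatch: int) -> int:
    # Toy stand-in for choosing a batch from hash bits; nbatch is a power of two.
    return hash(key) & (nbatch - 1)


def largest_batch(keys: list[int], nbatch: int) -> int:
    # Number of rows in the fullest batch for a given batch count.
    return max(Counter(batch_of(k, nbatch) for k in keys).values())


def simulate(keys: list[int], rows_per_work_mem: int,
             max_nbatch: int = 1 << 16) -> tuple[int, int]:
    # Keep doubling the batch count while the fullest batch still exceeds
    # work_mem; max_nbatch is an arbitrary cap for this sketch only.
    nbatch = 1
    while largest_batch(keys, nbatch) > rows_per_work_mem and nbatch < max_nbatch:
        nbatch *= 2
    return nbatch, nbatch * BUFFER_BYTES


if __name__ == "__main__":
    uniform = list(range(10_000))
    skewed = [42] * 5_000 + list(range(5_000))
    print(simulate(uniform, 1_000))  # (16, 131072): 16 batches, 128 kB of buffers
    print(simulate(skewed, 1_000))   # (65536, 536870912): hits the cap, 512 MiB
```

In the skewed case the batch holding key 42 always keeps 5001 rows, so the loop only stops at the artificial cap.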

It's a bit weird that it needs 1.8GB of memory, but perhaps that's also
linked to the number of batches somehow?
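
As back-of-envelope arithmetic only: if the 1.8GB were entirely 8kB batch buffers (an assumption, not something established here), and reading 1.8GB as GiB, the implied batch count would be roughly as follows. The helper name and the buffers_per_batch parameter are hypothetical.

```python
import math

BUFFER_BYTES = 8 * 1024  # 8kB per batch buffer, as described above


def batches_for(memory_bytes: float, buffers_per_batch: int = 1) -> float:
    # How many batches would be needed for their file buffers alone to
    # account for memory_bytes? buffers_per_batch (1 or 2) is an assumption.
    return memory_bytes / (BUFFER_BYTES * buffers_per_batch)


if __name__ == "__main__":
    observed = 1.8 * 2**30  # the ~1.8GB figure, read as GiB (an assumption)
    for per_batch in (1, 2):
        n = batches_for(observed, per_batch)
        lo = 2 ** math.floor(math.log2(n))
        print(f"{per_batch} buffer(s)/batch: ~{n:,.0f} batches "
              f"(between {lo:,} and {2 * lo:,})")
```

With one buffer per batch this lands between 2^17 and 2^18 batches; with two buffers per batch, between 2^16 and 2^17.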

Anyway, if you could set a breakpoint on the error and see how many
batches the hash join has, that'd be helpful. I'd probably try doing
that with a non-parallel query, as that makes it easier to debug, and it
may even report the number of batches if it completes.
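
If the non-parallel query does complete (for example with max_parallel_workers_per_gather set to 0 for the session), text-format EXPLAIN ANALYZE prints the batch counts on the Hash node. A small hypothetical helper to pull them out of that output, assuming the "Batches: N (originally M)" wording; the sample string is made up for illustration and is not output from the reported query.

```python
import re

# Matches "Batches: N" or "Batches: N (originally M)" on a Hash node line
# in text-format EXPLAIN ANALYZE output.
BATCHES_RE = re.compile(r"Batches: (\d+)(?: \(originally (\d+)\))?")


def hash_batches(explain_text: str) -> list[tuple[int, int]]:
    # Returns (final, original) batch counts for every Hash node found;
    # when no "originally" value is printed, the count did not change.
    result = []
    for m in BATCHES_RE.finditer(explain_text):
        final = int(m.group(1))
        original = int(m.group(2)) if m.group(2) else final
        result.append((final, original))
    return result


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = ("Buckets: 131072 (originally 131072)  "
              "Batches: 262144 (originally 16)  Memory Usage: 1234kB")
    print(hash_batches(sample))  # [(262144, 16)]
```

A large gap between the final and original counts would point toward the batch explosion described above.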
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company