Re: BUG #17791: Assert on procarray.c

From: Andres Freund
Subject: Re: BUG #17791: Assert on procarray.c
Msg-id: 20230215050612.po5rjq6zd7oq7cu6@awork3.anarazel.de
In reply to: Re: BUG #17791: Assert on procarray.c  (Robins Tharakan <tharakan@gmail.com>)
Replies: Re: BUG #17791: Assert on procarray.c  (Robins Tharakan <tharakan@gmail.com>)
List: pgsql-bugs
Hi,

On 2023-02-15 14:46:13 +1030, Robins Tharakan wrote:
> Thanks for taking a look and possibly you're correct with your
> assumption. I mean I see a ton of FATALs but let me know if I am
> mistaken in assuming them to be harmless (since they just convey that
> the client's gone away)?

Those are indeed not very interesting - although it'd be interesting to know
what caused the clients to go away.


> Nonetheless, I have provided error logs going back till Oct 22 just in
> case the engine can recover from some of those scenarios. Two things
> about the test scenario that may be relevant:
> 
> 1. Since performance was the least of my worries, the postgres server
> and the client workload are on the same box. Add dblink / FDW to this
> mix, and it is easy to end up with a ton of loopback connections
> (think SELECT dblink_connect() FROM pg_catalog.pg_class) - IMO
> noteworthy, since there are a ton of "Broken pipe"s and one instance
> of 'too many file descriptors'.

I think the "too many file descriptors" bit might be the interesting part.

I suspect the reason you're not seeing this on newer versions is that 13+ has

commit 3d475515a15f70a4a3f36fbbba93db6877ff8346
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2020-02-24 17:28:33 -0500

    Account explicitly for long-lived FDs that are allocated outside fd.c.


But I can't yet explain precisely why that causes the assertion failures. A
vague guess is that we fail to write 2PC state files due to the lack of FD
accounting, throw an error because of that, and then hit that assert while
handling the error.


It might be worth trying to reproduce the issue with a much lower
ulimit -S -n, to reach the problematic state more quickly. A reproducer
would be very useful.


> 2. All versions are subjected to similar workload and it is possible
> that v13+ has generally improved in this area, and thus this possibly
> crashes less? Unsure.

What range of versions / commits are you testing this workload on?

Are you testing 11 as well? Because I don't see why we'd have the issue on 12,
but not 11.

Greetings,

Andres Freund


