Re: pgsql: Add kqueue(2) support to the WaitEventSet API.

From: Thomas Munro
Subject: Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
Msg-id CA+hUKGLh5NSUhPmipF+3n2-y_ncuVWu3DSg7vKhvY2HwxEBiNQ@mail.gmail.com
In reply to: Re: pgsql: Add kqueue(2) support to the WaitEventSet API.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pgsql: Add kqueue(2) support to the WaitEventSet API.  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Fri, Feb 21, 2020 at 8:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > It seems fairly obvious now that I look at it, but: the epoll and kqueue
> > variants of CreateWaitEventSet are both *fundamentally* unsafe, because
> > they assume that they can always get a FD when they want one, which is
> > not a property that we generally want backend code to have.  The only
> > reason we've not seen this before with epoll is a lack of testing
> > under lots-of-FDs stress.
> > The fact that they'll likely leak those FDs on subsequent failures is
> > another not-very-nice property.
>
> Hmmm ... actually, there's a third problem, which is the implicit
> assumption that we can have as many concurrently-active WaitEventSets
> as we like.  We can't, if they depend on FDs --- that's a precious
> resource.  It doesn't look like we actually ever have more than about
> two per process right now, but I'm concerned about what will happen
> as the usage of the feature increases.

One thing I've been planning to do for PostgreSQL 13 is to get rid of all the
temporary create/destroy WaitEventSets from the main backend loops.
My goal was cutting down on stupid system calls, but this puts a new
spin on it.  I have a patch set to do a bunch of that[1], but now I'm
thinking that perhaps I need to be even more aggressive about it and
set up the 'common' long-lived WES up front at backend startup, rather
than doing it on demand, so that there is no chance of failure due to
lack of fds once you've started up.  I also recently figured out how
to handle some more places with the common WES.  I'll post a new patch
set over on that thread shortly.
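
To illustrate the point about up-front creation, here is a minimal sketch, not PostgreSQL code. It assumes a POSIX system with /dev/null, and it uses a plain open() as a stand-in for kqueue() or epoll_create1(). All demo_* names are hypothetical. The idea is that a descriptor obtained at startup stays usable after the process runs out of fds, whereas an on-demand attempt at that point fails with EMFILE.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical stand-in for the 'common' long-lived WES: it only owns one fd. */
static int demo_common_fd = -1;

/* Lower the soft fd limit to imitate a small "ulimit -n". */
int demo_limit_fds(rlim_t soft_limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && soft_limit > rl.rlim_max)
        soft_limit = rl.rlim_max;
    rl.rlim_cur = soft_limit;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* "Backend startup": take the descriptor while one is surely available. */
int demo_startup(void)
{
    if (demo_common_fd < 0)
        demo_common_fd = open("/dev/null", O_RDONLY);
    return demo_common_fd >= 0 ? 0 : -1;
}

/* Report whether the descriptor taken at startup is still valid. */
int demo_common_fd_is_open(void)
{
    return demo_common_fd >= 0 && fcntl(demo_common_fd, F_GETFD) != -1;
}

/* Consume descriptors until the kernel refuses; returns how many we took. */
int demo_exhaust_fds(int *fds, int max)
{
    int n = 0;

    assert(demo_common_fd >= 0);
    while (n < max)
    {
        int fd = dup(demo_common_fd);

        if (fd < 0)
            break;
        fds[n++] = fd;
    }
    return n;
}

/* "On demand": what a temporary create/destroy wait set would have to do. */
int demo_create_on_demand(void)
{
    return open("/dev/null", O_RDONLY);
}
```

In this model, calling demo_startup() first and demo_exhaust_fds() afterwards leaves demo_common_fd valid, while demo_create_on_demand() returns -1. A real backend would of course create the kernel event queue rather than open /dev/null.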

That wouldn't mean that the postgres_fdw.sql test can't fail on a
system with ulimit -n = 128, though; it might just mean that it's
postgres_fdw's socket() call that hits EMFILE rather than WES's kqueue()
call while running that test.  I suppose there are two kinds of system: those
where ulimit -n is higher than max_files_per_process (defaults, on
Linux: 1024 vs 1000) so you have more allowance for sockets and the
like, and those where it isn't, like coypu, where NUM_RESERVED_FDS is
the only thing ensuring we have some spare fds.  I don't know the
history but it looks like NUM_RESERVED_FDS was deliberately or
accidentally tuned to be just enough to be able to run the regression
tests (the interesting ones being the ones that use sockets, like
postgres_fdw.sql), but with a new long-lived kqueue() fd in the
picture, it might have to be increased to cover it, no?
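
The arithmetic behind those two kinds of system can be sketched as follows. This is a simplified model and not the logic in fd.c: it ignores descriptors that are already open at startup, the demo_* names are hypothetical, and the reserve of 10 used in the usage note is an assumed value for illustration only.

```c
#include <assert.h>

/*
 * Simplified model: the descriptors that the file-access layer allows
 * itself are the smaller of the two limits, minus the reserve.
 */
int demo_vfd_cap(int ulimit_n, int max_files_per_process, int reserved)
{
    int cap;

    assert(reserved >= 0);
    cap = ulimit_n < max_files_per_process ? ulimit_n : max_files_per_process;
    cap -= reserved;
    return cap > 0 ? cap : 0;
}

/* Descriptors left over for sockets, kqueue() fds and the like. */
int demo_spare_fds(int ulimit_n, int max_files_per_process, int reserved)
{
    return ulimit_n - demo_vfd_cap(ulimit_n, max_files_per_process, reserved);
}
```

With the assumed reserve of 10, demo_spare_fds(1024, 1000, 10) gives 34, while demo_spare_fds(128, 1000, 10) gives 10. In the second case the reserve is the whole allowance, so a new long-lived fd takes one of those 10.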

About the potential for leaks, Horiguchi-san realised this hazard and
posted a patch[2] to allow WaitEventSets to be cleaned up by the
resource owner machinery.  That's useful for the temporary
WaitEventSet objects that we'd genuinely need in the patch set that
it's part of, which creates a query-lifetime WES to manage N
connections to remote shards; that WES needs to be cleaned up on
failure.  For the temporary ones created by WaitLatch(), I suspect
they don't really belong in a resource owner: instead we should get
rid of them using my WaitMyLatch() patch set, and if there are any
places where we can't for some reason (I hope not), perhaps a
try/catch block should be used to fix that.
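
As a rough illustration of the try/catch idea, here is a sketch in plain C that uses setjmp()/longjmp() as a stand-in for an error-handling mechanism. It is not PostgreSQL code: the demo_* names are hypothetical, open() on /dev/null stands in for creating a temporary wait set, and a real backend would use its own error-handling macros or a resource owner instead.

```c
#include <fcntl.h>
#include <setjmp.h>
#include <unistd.h>

/* Hypothetical stand-in for the error-handling context. */
static jmp_buf demo_error_ctx;

/* Hypothetical "throw": jump back to the most recent setjmp(). */
static void demo_raise_error(void)
{
    longjmp(demo_error_ctx, 1);
}

/*
 * Create a temporary descriptor, do some work, and close the descriptor
 * even when the work fails.  Returns 0 on success and -1 on failure.
 * *fd_seen receives the descriptor number so that a caller can check
 * afterwards that it was not leaked.
 */
int demo_with_temp_waitset(int should_fail, int *fd_seen)
{
    /* volatile: the value is changed after setjmp() and read after longjmp() */
    volatile int fd = -1;

    if (setjmp(demo_error_ctx) == 0)
    {
        /* "try" part */
        fd = open("/dev/null", O_RDONLY);
        if (fd < 0)
            return -1;
        *fd_seen = fd;
        if (should_fail)
            demo_raise_error();     /* simulated failure while waiting */
        close(fd);
        return 0;
    }

    /* "catch" part: release the descriptor before reporting the failure */
    if (fd >= 0)
        close(fd);
    return -1;
}
```

Calling demo_with_temp_waitset(1, &fd) returns -1 with the descriptor already closed, which is the property the temporary wait sets lack today if an error is raised between creation and destruction.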

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20191206.171211.1119526746053895900.horikyota.ntt%40gmail.com


