Re: Performance degradation in commit ac1d794

From: Andres Freund
Subject: Re: Performance degradation in commit ac1d794
Msg-id: 20160317020426.uz5oobr6b72mpcdw@alap3.anarazel.de
In reply to: Re: Performance degradation in commit ac1d794  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Performance degradation in commit ac1d794  (Amit Kapila <amit.kapila16@gmail.com>)
           Re: Performance degradation in commit ac1d794  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2016-03-16 15:41:28 -0400, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote:
> >> > - Given current users we don't need a large amount of events, so having
> >> >   to iterate through the registered events doesn't seem bothersome. We
> >> >   could however change the api to be something like
> >> >
> >> >   int WaitLatchEventSet(LatchEventSet *set, OccurredEvents *, int nevents, long timeout);
> >> >
> >> >   which would return the number of events that happened, and would
> >> >   basically "fill" one of the (usually stack allocated) OccurredEvent
> >> >   structures with what happened.
> >>
> >> I definitely think something along these lines is useful.  I want to
> >> be able to have an Append node with 100 ForeignScans under it and kick
> >> off all the scans asynchronously and wait for all of the FDs at once.
> >
> > So you'd like to get only an event for the FD with data back? Or are you
> > ok with iterating through hundred elements in an array, to see which are
> > ready?
>
> I'd like to get an event back for the FD with data.  Iterating sounds
> like it could be really slow.  Say you get lots of little packets back
> from the same connection, while the others are idle.  Now you've got
> to keep iterating through them all over and over again.  Blech.

I've a (working) WIP version that works like I think you want.  It's
implemented in patches 05 (rework with just poll() support) and (add
epoll support).  It's based on patches posted here earlier, but those
aren't relevant to this discussion.
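
For context, the property that makes the epoll variant cheap here is that
epoll_wait() hands back only the descriptors that are actually ready, so a
caller never has to rescan idle connections.  A minimal standalone sketch of
that mechanism (plain epoll, not code from the patches):

#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

/*
 * Wait until at least one of nfds descriptors is readable, and report only
 * the ready ones; idle descriptors are never visited.
 */
static int
wait_for_readable(const int *fds, int nfds)
{
    struct epoll_event ready[16];
    int     epfd = epoll_create1(0);
    int     i;
    int     n;

    if (epfd < 0)
        return -1;

    for (i = 0; i < nfds; i++)
    {
        struct epoll_event ev = {0};

        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0)
        {
            close(epfd);
            return -1;
        }
    }

    /* blocks until something is ready, returns only the ready entries */
    n = epoll_wait(epfd, ready, 16, -1);
    for (i = 0; i < n; i++)
        printf("fd %d is readable\n", ready[i].data.fd);

    close(epfd);
    return n;
}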

The API is now:

typedef struct WaitEventSet WaitEventSet;

typedef struct WaitEvent
{
    int        pos;        /* position in the event data structure */
    uint32    events;        /* tripped events */
    int        fd;            /* fd associated with event */
} WaitEvent;

/*
 * Create a WaitEventSet with space for nevents different events to wait for.
 */
extern WaitEventSet *CreateWaitEventSet(int nevents);

/* ---
 * Add an event to the set. Possible events are:
 * - WL_LATCH_SET: Wait for the latch to be set
 * - WL_POSTMASTER_DEATH: Wait for postmaster to die
 * - WL_SOCKET_READABLE: Wait for socket to become readable
 *   can be combined in one event with WL_SOCKET_WRITEABLE
 * - WL_SOCKET_WRITEABLE: Wait for socket to become writeable
 *   can be combined with WL_SOCKET_READABLE
 *
 * latch may be NULL.
 *
 * Returns the offset in WaitEventSet->events (starting from 0), which can be
 * used to modify previously added wait events.
 */
extern int AddWaitEventToSet(WaitEventSet *set, uint32 events, int fd, Latch *latch);

/*
 * Change the event mask and, if applicable, the associated latch of a
 * WaitEvent.
 */
extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);

/*
 * Wait for events added to the set to happen, or until the timeout is
 * reached.  At most nevents occurred events are returned.
 *
 * Returns the number of events that occurred, or 0 if the timeout was reached.
 */
extern int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent* occurred_events, int nevents);
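
To make the intended calling convention concrete, here's a rough usage
sketch (not code from the patches; the client_fd variable, the use of
MyLatch, passing -1 as "no fd", and treating a -1 timeout as "block
indefinitely" are assumptions for illustration):

    WaitEventSet *set;
    WaitEvent   occurred;
    int         sock_pos;
    int         nevents;

    set = CreateWaitEventSet(2);
    AddWaitEventToSet(set, WL_LATCH_SET, -1 /* no fd */, MyLatch);
    sock_pos = AddWaitEventToSet(set, WL_SOCKET_READABLE, client_fd, NULL);

    /* later, if we need to wait for the socket to become writeable instead */
    ModifyWaitEvent(set, sock_pos, WL_SOCKET_WRITEABLE, NULL);

    nevents = WaitEventSetWait(set, -1 /* timeout */, &occurred, 1);
    if (nevents > 0 && (occurred.events & WL_SOCKET_WRITEABLE))
    {
        /* occurred.fd (== client_fd) is ready, write to it */
    }
    else if (nevents > 0 && (occurred.events & WL_LATCH_SET))
    {
        /* the latch was set, handle interrupts etc. */
    }

The returned WaitEvent's pos field would match sock_pos here, so the caller
can tell which registered event fired without scanning all of them.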


I've for now left the old latch API in place, and only converted
be-secure.c to the new style.  I'd appreciate some feedback before I go
around and convert and polish everything.

Questions:
* I'm kinda inclined to merge the win32 and unix latch
  implementations. There's already a fair bit in common, and this is
  just going to increase that amount.

* Right now the caller has to allocate the WaitEvents it's waiting for
  locally (likely on the stack), but we could also allocate them as part
  of the WaitEventSet (see the sketch after this list).  Not sure whether
  that'd be a benefit.

* I can do a blind rewrite of the windows implementation, but I'm
  obviously not going to get that entirely right. So I need some help
  from a windows person to test this.

* This approach, with a 'stateful' wait event data structure, will
  actually allow us to fix a couple of lingering bugs we have in the
  windows port. Cf. http://www.postgresql.org/message-id/4351.1336927207@sss.pgh.pa.us
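
To illustrate the allocation question above, the two conventions would
differ roughly like this (WaitEventSetWaitOwned is a made-up name, purely
to show what a set-owned result buffer could look like):

    /* current approach: caller-provided (typically stack) result array */
    WaitEvent   occurred[4];
    int         n = WaitEventSetWait(set, -1, occurred, lengthof(occurred));

    /* hypothetical alternative: results stored inside the WaitEventSet */
    WaitEvent  *ready;
    int         m = WaitEventSetWaitOwned(set, -1, &ready);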

- Andres
