Thread: Weird glitches for N messages sent to list simultaneously

Weird glitches for N messages sent to list simultaneously

From
Tom Lane
Date:
I've been noticing this for quite a while: if I commit to all active
branches, that causes N (currently 6) messages to be sent to
pgsql-committers at once.  Very often, only N-1 or N-2 of those messages
come back promptly; the rest are delayed for periods of minutes.
(It seems like they will come through if someone else posts a message
to some PG list, though I wouldn't want to swear there is a connection.
Otherwise it can take quite a few minutes.)

I noticed this just now with respect to f64340e7436d0f84 et al, and
thought to look at the mail archives page for pgsql-committers, where
I see just four messages, same as what came back to me.  So it's not
just me; there is something odd going on in message submission.
Anybody care to speculate what?

[ Waits a bit before posting ... the missing two messages did show up,
8 minutes after the first. ]
        regards, tom lane



Re: Weird glitches for N messages sent to list simultaneously

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I've been noticing this for quite a while: if I commit to all active
> branches, that causes N (currently 6) messages to be sent to
> pgsql-committers at once.  Very often, only N-1 or N-2 of those messages
> come back promptly; the rest are delayed for periods of minutes.
> (It seems like they will come through if someone else posts a message
> to some PG list, though I wouldn't want to swear there is a connection.
> Otherwise it can take quite a few minutes.)
>
> I noticed this just now with respect to f64340e7436d0f84 et al, and
> thought to look at the mail archives page for pgsql-committers, where
> I see just four messages, same as what came back to me.  So it's not
> just me; there is something odd going on in message submission.
> Anybody care to speculate what?

Unfortunately, this appears to be "something odd happening in
Majordomo."  I've checked the exim logs, and the messages are all
delivered to majordomo at the same time, and I see them show up in mj2's
"enqueue" debug log all at the same time.

From the exim logs, the outbound messages for the delayed ones are
received by exim after the delay, so it's clearly something in mj2 land.

What it looks like is that, when a message comes in, mj2 does a queue
run, which will happily process everything in the queue that it sees
when it starts up, but if messages come in while the queue is being run
they seem to get missed until the next queue run, which isn't happening
until the next email arrives.

I've got basically no idea why it isn't immediately re-running the
queue when it finishes, or why there isn't some other process to kick it
when there are messages in the queue to be processed.
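
To make the suspected race concrete, here is a minimal sketch in Python.
It is only a toy model of the behaviour described above, not Majordomo's
actual code; the function names (snapshot_run, draining_run) and the
message labels are hypothetical and chosen purely for illustration.

```python
from collections import deque


def snapshot_run(queue, arrivals_during_run):
    """Model of the suspected bug: the run only handles what it saw at startup."""
    snapshot = list(queue)  # messages visible when the run starts
    queue.clear()
    # Messages that arrive mid-run are enqueued but are not in the snapshot,
    # so they sit there until something triggers another run.
    queue.extend(arrivals_during_run)
    return snapshot


def draining_run(queue, arrivals_during_run):
    """Model of one possible fix: keep going until the queue is really empty."""
    delivered = []
    incoming = deque(arrivals_during_run)
    while queue or incoming:
        # Simulate one new message arriving while the run is in progress.
        if incoming:
            queue.append(incoming.popleft())
        delivered.append(queue.popleft())
    return delivered


if __name__ == "__main__":
    q = deque(["m1", "m2", "m3", "m4"])
    print("delivered:", snapshot_run(q, ["m5", "m6"]))
    print("left waiting:", list(q))
```

With four messages queued and two arriving during the run, the snapshot
version delivers four and leaves two waiting for the next trigger, which
matches the 4-of-6 pattern Tom saw.  The draining version delivers all six.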

Looks like Alvaro is taking a look at it also, perhaps he'll have more
insight into what's happening inside mj2.

Thanks!

Stephen

Re: Weird glitches for N messages sent to list simultaneously

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I've been noticing this for quite a while: if I commit to all active
> branches, that causes N (currently 6) messages to be sent to
> pgsql-committers at once.  Very often, only N-1 or N-2 of those messages
> come back promptly; the rest are delayed for periods of minutes.
> (It seems like they will come through if someone else posts a message
> to some PG list, though I wouldn't want to swear there is a connection.
> Otherwise it can take quite a few minutes.)
> 
> I noticed this just now with respect to f64340e7436d0f84 et al, and
> thought to look at the mail archives page for pgsql-committers, where
> I see just four messages, same as what came back to me.  So it's not
> just me; there is something odd going on in message submission.
> Anybody care to speculate what?

I looked into the exim logs and all the servers (gemulon the git master,
mahout an intermediate step, magus/makus the SMTP dispatchers) resend
the messages quite promptly -- the 6 messages are in the hands of malur
(the majordomo2 server) in about 7 seconds total.  Malur accepts the
first 3 messages within 1 second, and the other three take about 4
seconds total to accept.  From there, malur seems to take its sweet time
to deliver the messages, without any trace to explain the delay.  I
checked the system load munin charts too, and it doesn't appear that
anything goes through the roof after those pgsql-committers postings.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Weird glitches for N messages sent to list simultaneously

From
Alvaro Herrera
Date:
Stephen Frost wrote:

> What it looks like is that, when a message comes in, mj2 does a queue
> run, which will happily process everything in the queue that it sees
> when it starts up, but if messages come in while the queue is being run
> they seem to get missed until the next queue run, which isn't happening
> until the next email arrives.
> 
> I've got basically no idea why it isn't immediately re-running the
> queue when it finishes, or why there isn't some other process to kick it
> when there are messages in the queue to be processed.
> 
> Looks like Alvaro is taking a look at it also, perhaps he'll have more
> insight into what's happening inside mj2.

Thanks for the vote of confidence, but I don't actually know what is
happening inside Mj2 because the trace logs are essentially useless.
The theory of the race condition sounds like a good guess.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services