Обсуждение: hot_standby_feedback doesn't work on busy servers in 9.3+

Поиск
Список
Период
Сортировка

hot_standby_feedback doesn't work on busy servers in 9.3+

От
Andres Freund
Дата:
Hello,

In 9.3+, when the primary is busy hot_standby_feedback doesn't properly
work with symptoms like a) spurious query cancels because no feedback
messages have been sent yet, b) bloat on the primary because we don't
regularly send feedback leading to stale xmins on the primary.

The relevant code is in walreceiver.c's WalReceiverMain():

len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
if (len != 0)
{
...
    /* Let the master know that we received some data. */
    XLogWalRcvSendReply(false, false);

    /*
     * If we've written some records, flush them to disk and
     * let the startup process and primary server know about
     * them.
     */
    XLogWalRcvFlush(false);
}
else
{
...
    XLogWalRcvSendReply(requestReply, requestReply);
    XLogWalRcvSendHSFeedback(false);
}

So, when the connection always has data, we'll not send feedback. That's
pretty easy to demonstrate by running pgbench -jc 8 or so against the
primary.

Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().

To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Amit Kapila
Дата:
On Wed, Jan 15, 2014 at 6:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hello,
>
> In 9.3+, when the primary is busy hot_standby_feedback doesn't properly
> work with symptoms like a) spurious query cancels because no feedback
> messages have been sent yet, b) bloat on the primary because we don't
> regularly send feedback leading to stale xmins on the primary.
>
> The relevant code is in walreceiver.c's WalReceiverMain():
>
> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> if (len != 0)
> {
> ...
>     /* Let the master know that we received some data. */
>     XLogWalRcvSendReply(false, false);
>
>     /*
>      * If we've written some records, flush them to disk and
>      * let the startup process and primary server know about
>      * them.
>      */
>     XLogWalRcvFlush(false);
> }
> else
> {
> ...
>     XLogWalRcvSendReply(requestReply, requestReply);
>     XLogWalRcvSendHSFeedback(false);
> }
>
> So, when the connection always has data, we'll not send feedback. That's
> pretty easy to demonstrate by running pgbench -jc 8 or so against the
> primary.
>
> Looking into this I also noticed that the busy path is odd, because a)
> why are we sending a reply before flushing things to disk? b)
> XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().

   I think call to reply in XLogWalRcvFlush() might not actually send
   reply because of time difference of last Reply message which we
   sent before flush call.

   The only point that occurs to me for having such a code is that
   incase flush call fails due to disk space or some other such issue,
   it can atleast send the correct write position to primary.

>
> To a good part that seems to have been introduced in
> 6f60fdd7015b032bf49273c99f80913d57eac284.

   I don't see that above commit has introduced this behaviour, as
   the code in question seems to be there without commit as well,
   means that HS feedback is only sent when we don't receive data in WAL
   receiver.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Andres Freund
Дата:
Hi,

On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
> > Looking into this I also noticed that the busy path is odd, because a)
> > why are we sending a reply before flushing things to disk? b)
> > XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().
>
>    I think call to reply in XLogWalRcvFlush() might not actually send
>    reply because of time difference of last Reply message which we
>    sent before flush call.

Yes, but most of the time that will lead to only outdated data to be
sent since that's the first call. Hardly a severe issue, odd nonetheless.

>    The only point that occurs to me for having such a code is that
>    incase flush call fails due to disk space or some other such issue,
>    it can atleast send the correct write position to primary.

Unconvinced.

That's really a separate issue though.

> > To a good part that seems to have been introduced in
> > 6f60fdd7015b032bf49273c99f80913d57eac284.
>
>    I don't see that above commit has introduced this behaviour, as
>    the code in question seems to be there without commit as well,
>    means that HS feedback is only sent when we don't receive data in WAL
>    receiver.

Part of that commit is the following hunk:

@@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)

        /* Also let the master know that we made some progress */
        if (!dying)
-       {
-           XLogWalRcvSendReply();
-           XLogWalRcvSendHSFeedback();
-       }
+           XLogWalRcvSendReply(false, false);
    }
 }

Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
hot standby feedback. After that only the idle path does so.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Amit Kapila
Дата:
On Thu, Jan 16, 2014 at 2:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
>
>> > To a good part that seems to have been introduced in
>> > 6f60fdd7015b032bf49273c99f80913d57eac284.
>>
>>    I don't see that above commit has introduced this behaviour, as
>>    the code in question seems to be there without commit as well,
>>    means that HS feedback is only sent when we don't receive data in WAL
>>    receiver.
>
> Part of that commit is the following hunk:
>
> @@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)
>
>         /* Also let the master know that we made some progress */
>         if (!dying)
> -       {
> -           XLogWalRcvSendReply();
> -           XLogWalRcvSendHSFeedback();
> -       }
> +           XLogWalRcvSendReply(false, false);
>     }
>  }
>
> Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
> hot standby feedback. After that only the idle path does so.

Okay, Thanks for pointing out.
This is clearly a problem and I think that part of patch should be reverted.
I have figured out how this has happened, actually during the patch
(Improve replication connection timeouts) development, there was an idea
to merge Reply and Feedback messages during which this call got removed
but later on that doesn't appear to be good way of proceeding, so we decide
not merge the messages. While kicking out merge of messages part, it seems
to me that we forgot to add this back.

Heikki, do you remember any other reason for this problem?
I had referred below thread and the mails following it to find this out:
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853645A@szxeml509-mbs

Attached patch should fix this problem.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Heikki Linnakangas
Дата:
On 01/16/2014 12:34 PM, Amit Kapila wrote:
> On Thu, Jan 16, 2014 at 2:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
>>
>>>> To a good part that seems to have been introduced in
>>>> 6f60fdd7015b032bf49273c99f80913d57eac284.
>>>
>>>     I don't see that above commit has introduced this behaviour, as
>>>     the code in question seems to be there without commit as well,
>>>     means that HS feedback is only sent when we don't receive data in WAL
>>>     receiver.
>>
>> Part of that commit is the following hunk:
>>
>> @@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)
>>
>>          /* Also let the master know that we made some progress */
>>          if (!dying)
>> -       {
>> -           XLogWalRcvSendReply();
>> -           XLogWalRcvSendHSFeedback();
>> -       }
>> +           XLogWalRcvSendReply(false, false);
>>      }
>>   }
>>
>> Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
>> hot standby feedback. After that only the idle path does so.
>
> Okay, Thanks for pointing out.
> This is clearly a problem and I think that part of patch should be reverted.
> I have figured out how this has happened, actually during the patch
> (Improve replication connection timeouts) development, there was an idea
> to merge Reply and Feedback messages during which this call got removed
> but later on that doesn't appear to be good way of proceeding, so we decide
> not merge the messages. While kicking out merge of messages part, it seems
> to me that we forgot to add this back.
>
> Heikki, do you remember any other reason for this problem?
> I had referred below thread and the mails following it to find this out:
> http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853645A@szxeml509-mbs
>
> Attached patch should fix this problem.

Yeah, that's probably what happened. Applied, thanks!

- Heikki

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Heikki Linnakangas
Дата:
On 01/16/2014 10:44 AM, Andres Freund wrote:
> On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
>>> Looking into this I also noticed that the busy path is odd, because a)
>>> why are we sending a reply before flushing things to disk? b)
>>> XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().
>>
>>     I think call to reply in XLogWalRcvFlush() might not actually send
>>     reply because of time difference of last Reply message which we
>>     sent before flush call.
>
> Yes, but most of the time that will lead to only outdated data to be
> sent since that's the first call. Hardly a severe issue, odd nonetheless.
>
>>     The only point that occurs to me for having such a code is that
>>     incase flush call fails due to disk space or some other such issue,
>>     it can atleast send the correct write position to primary.
>
> Unconvinced.

The reply message contains a pointers for how far the WAL has been
applied, written, and flushed. There can be a significant delay between
the write and flush steps, so we send a separate reply after writing,
and after flushing. (if we didn't, the flush and write pointers sent to
the master would always be the equal).

- Heikki

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Andres Freund
Дата:
On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 10:44 AM, Andres Freund wrote:
> >On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
> >>>Looking into this I also noticed that the busy path is odd, because a)
> >>>why are we sending a reply before flushing things to disk? b)
> >>>XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().
> >>
> >>    I think call to reply in XLogWalRcvFlush() might not actually send
> >>    reply because of time difference of last Reply message which we
> >>    sent before flush call.
> >
> >Yes, but most of the time that will lead to only outdated data to be
> >sent since that's the first call. Hardly a severe issue, odd nonetheless.
> >
> >>    The only point that occurs to me for having such a code is that
> >>    incase flush call fails due to disk space or some other such issue,
> >>    it can atleast send the correct write position to primary.
> >
> >Unconvinced.
>
> The reply message contains a pointers for how far the WAL has been applied,
> written, and flushed. There can be a significant delay between the write and
> flush steps, so we send a separate reply after writing, and after flushing.
> (if we didn't, the flush and write pointers sent to the master would always
> be the equal).

If we'd always send a message, I'd be convinced by that argument, but
we're only sending messages after a timeout. This means that if syncrep
is on, we will more frequently have explicitly ask the receiver to send
a confirmation since the reply before the flush will have reset the
timers causing the reply after the flush to only infrequently send
something.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Heikki Linnakangas
Дата:
On 01/16/2014 11:55 PM, Andres Freund wrote:
> On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
>> The reply message contains a pointers for how far the WAL has been applied,
>> written, and flushed. There can be a significant delay between the write and
>> flush steps, so we send a separate reply after writing, and after flushing.
>> (if we didn't, the flush and write pointers sent to the master would always
>> be the equal).
>
> If we'd always send a message, I'd be convinced by that argument, but
> we're only sending messages after a timeout. This means that if syncrep
> is on, we will more frequently have explicitly ask the receiver to send
> a confirmation since the reply before the flush will have reset the
> timers causing the reply after the flush to only infrequently send
> something.

No, XLogWalRcvSendReply also sends the reply if the flush or write
pointers have advanced since last reply, even if the timeout hasn't expired.

- Heikki

Re: hot_standby_feedback doesn't work on busy servers in 9.3+

От
Andres Freund
Дата:
On 2014-01-17 10:50:29 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 11:55 PM, Andres Freund wrote:
> >On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
> >>The reply message contains a pointers for how far the WAL has been applied,
> >>written, and flushed. There can be a significant delay between the write and
> >>flush steps, so we send a separate reply after writing, and after flushing.
> >>(if we didn't, the flush and write pointers sent to the master would always
> >>be the equal).
> >
> >If we'd always send a message, I'd be convinced by that argument, but
> >we're only sending messages after a timeout. This means that if syncrep
> >is on, we will more frequently have explicitly ask the receiver to send
> >a confirmation since the reply before the flush will have reset the
> >timers causing the reply after the flush to only infrequently send
> >something.
>
> No, XLogWalRcvSendReply also sends the reply if the flush or write pointers
> have advanced since last reply, even if the timeout hasn't expired.

Uh. Yes. I mis-remembered that we weren't doing that, but that's just
the apply value...

Sorry for the noise,
'
Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services