On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
>> On Fri, Nov 14, 2014 at 7:22 PM, <furuyao@pm.nttdata.co.jp> wrote:
>> > "pg_ctl stop" doesn't work properly if the --slot option is specified and WAL is flushed only when the WAL file is switched.
>> > These processes still continue running even after the postmaster failed: pg_receivexlog, walsender and logger.
>>
>> I could reproduce this problem. At normal shutdown, walsender keeps waiting
>> for the last WAL record to be replicated and flushed in pg_receivexlog. But
>> pg_receivexlog issues sync command only when WAL file is switched. Thus,
>> since pg_receivexlog may never flush the last WAL record, walsender may have
>> to keep waiting infinitely.
>
> Right.
It is surprising that nobody complained about this before;
pg_receivexlog was released two years ago.
>> pg_recvlogical handles this problem by calling fsync() when it receives a
>> request for an immediate reply from the server. That is, at shutdown, walsender
>> sends the request, pg_receivexlog receives it, flushes the last WAL record,
>> and sends the flush location back to the server. Since walsender can see that
>> the last WAL record is successfully flushed in pg_receivexlog, it can
>> exit cleanly.
>>
>> One idea to solve the problem is to introduce the same logic that
>> pg_recvlogical has into pg_receivexlog. Thoughts?
>
> Sounds sane to me. Are you looking into doing that?
Yep, sounds like a good thing to do if the master requested an answer
from the client in the keepalive message. Something like the attached
patch would do the trick.
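
For reference, the behavior discussed above can be sketched roughly as
follows. This is only a minimal illustration and not the attached patch:
handle_keepalive() and its parameters are hypothetical names, and a
negative file descriptor is assumed to mean "no WAL segment is currently
open". Only the keepalive layout ('k', 8-byte WAL end, 8-byte server
clock, 1-byte reply-requested flag) comes from the streaming replication
protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Primary keepalive message: 'k', int64 WAL end, int64 server clock,
 * one byte that is 1 when the server wants an immediate reply. */
#define KEEPALIVE_LEN (1 + 8 + 8 + 1)

/*
 * Hypothetical helper, not an actual pg_receivexlog function.
 *
 * Returns 1 if the caller should send a status reply right away,
 * 0 if no reply was requested, and -1 on a malformed message or
 * an fsync() failure.
 */
static int
handle_keepalive(const unsigned char *msg, size_t len, int walfd,
                 uint64_t written_pos, uint64_t *flushed_pos)
{
    assert(flushed_pos != NULL);

    if (len < KEEPALIVE_LEN || msg[0] != 'k')
        return -1;

    if (msg[KEEPALIVE_LEN - 1] == 0)
        return 0;               /* server did not ask for a reply */

    /* Flush whatever has been written but not yet synced. */
    if (*flushed_pos < written_pos)
    {
        /* Assumption: walfd < 0 means no segment is open, so there
         * is nothing left to sync. */
        if (walfd >= 0 && fsync(walfd) != 0)
            return -1;
        *flushed_pos = written_pos;
    }

    return 1;                   /* caller reports *flushed_pos now */
}
```

When the function returns 1, the caller would send a standby status
update carrying *flushed_pos as the flush location, which is what lets
walsender see that the last WAL record has been flushed and exit.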
--
Michael