Discussion: WALWriter active during recovery
Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work.

Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions.

At present this is a WIP patch, for code comments only. Don't bother with anything other than code questions at this stage.

Implementation questions are
* How should we wake WALReceiver, since it waits on a poll(). Should we use SIGUSR1, which is already used for latch waits, or another signal?
* Should we introduce some pacing delays if the WALreceiver gets too far ahead of apply?
* Other questions you may have?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
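A minimal Python sketch of the proposed division of labour, assuming that two threads in one process can stand in for WALReceiver and WALWriter: one side only calls write() and advances a "written" position, the other only calls fsync() and advances a "flushed" position. `WalFile`, `written`, `flushed` and `demo` are hypothetical names; the patch itself coordinates separate processes through shared memory, which this does not model.

```python
import os
import tempfile
import threading


class WalFile:
    """Hypothetical stand-in for a WAL segment shared by a receiver and a flusher."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        self.cond = threading.Condition()
        self.written = 0   # bytes handed to the kernel with write()
        self.flushed = 0   # bytes known to be durable after fsync()
        self.done = False

    def receive(self, chunk):
        # Receiver side: write() only, never fsync(), then poke the flusher.
        os.write(self.fd, chunk)
        with self.cond:
            self.written += len(chunk)
            self.cond.notify_all()

    def finish(self):
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def flusher_loop(self):
        # Flusher side: sleep until there is unflushed data, then fsync() it.
        while True:
            with self.cond:
                while self.flushed == self.written and not self.done:
                    self.cond.wait()
                if self.flushed == self.written:  # done and nothing left to flush
                    return
                target = self.written
            os.fsync(self.fd)  # may be slow; the receiver keeps writing meanwhile
            with self.cond:
                self.flushed = target
                self.cond.notify_all()


def demo(chunks=100, chunk_size=8192):
    with tempfile.TemporaryDirectory() as d:
        wal = WalFile(os.path.join(d, "segment"))
        flusher = threading.Thread(target=wal.flusher_loop)
        flusher.start()
        for _ in range(chunks):
            wal.receive(b"x" * chunk_size)  # never blocks on fsync()
        wal.finish()
        flusher.join()
        os.close(wal.fd)
        return wal.written, wal.flushed
```

Because the flusher records the `written` value it read before calling fsync(), `flushed` may lag behind what fsync() actually made durable, but it never gets ahead of it.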
Hi, On 2014-12-15 18:51:44 +0000, Simon Riggs wrote: > Currently, WALReceiver writes and fsyncs data it receives. Clearly, > while we are waiting for an fsync we aren't doing any other useful > work. Well, it can still buffer data on the network level, but there are definitely limits to that. So I can see this as being useful. > Following patch starts WALWriter during recovery and makes it > responsible for fsyncing data, allowing WALReceiver to progress other > useful actions. > > At present this is a WIP patch, for code comments only. Don't bother > with anything other than code questions at this stage. > > Implementation questions are > > * How should we wake WALReceiver, since it waits on a poll(). Should > we use SIGUSR1, which is already used for latch waits, or another > signal? It's not entirely trivial, but also not hard, to make it use the latch code for waiting. It'd probably end up requiring less code because then we could just scratch libpqwalreceiver.c:libpq_select(). > * Should we introduce some pacing delays if the WALreceiver gets too > far ahead of apply? Hm. Why don't we simply start fsyncing in the receiver itself at regular intervals? If already synced that's cheap, if not, it'll pace us. Greetings, Andres Freund
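Waiting on a latch instead of a bare poll() can be illustrated with a self-pipe: the waiter polls on both the replication socket and the read end of a pipe, and whoever wants to wake it writes one byte into the pipe. PostgreSQL's Unix latch implementation of this period uses a self-pipe in a similar way, with SIGUSR1 used to reach a latch owned by another process whose signal handler then writes the byte. This is a single-process Python sketch under that assumption; `Latch`, `wait_latch_or_socket` and `demo` are hypothetical names, not PostgreSQL's latch API.

```python
import os
import select
import socket


class Latch:
    """Hypothetical self-pipe latch: set() makes a concurrent poll() return."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)
        os.set_blocking(self.wfd, False)

    def set(self):
        # In a multi-process server, a signal handler would do this write.
        try:
            os.write(self.wfd, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    def reset(self):
        # Drain the pipe so that the next wait blocks again.
        try:
            while os.read(self.rfd, 512):
                pass
        except BlockingIOError:
            pass

    def close(self):
        os.close(self.rfd)
        os.close(self.wfd)


def wait_latch_or_socket(latch, sock, timeout_ms):
    """Block until the latch is set, the socket is readable, or timeout_ms elapses."""
    poller = select.poll()
    poller.register(latch.rfd, select.POLLIN)
    poller.register(sock.fileno(), select.POLLIN)
    ready = dict(poller.poll(timeout_ms))
    return {"latch": latch.rfd in ready, "socket": sock.fileno() in ready}


def demo():
    latch = Latch()
    receiver_end, sender_end = socket.socketpair()
    try:
        idle = wait_latch_or_socket(latch, receiver_end, 0)      # nothing pending
        latch.set()                                              # e.g. flush position advanced
        woken = wait_latch_or_socket(latch, receiver_end, 1000)
        latch.reset()
        sender_end.sendall(b"WAL")                               # primary sends more data
        data = wait_latch_or_socket(latch, receiver_end, 1000)
        return idle, woken, data
    finally:
        receiver_end.close()
        sender_end.close()
        latch.close()
```

The same wait serves both wakeup sources, which is why replacing a socket-only select()/poll() loop with a latch wait removes the need for a separate wakeup mechanism.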
On 12/15/2014 08:51 PM, Simon Riggs wrote: > Currently, WALReceiver writes and fsyncs data it receives. Clearly, > while we are waiting for an fsync we aren't doing any other useful > work. > > Following patch starts WALWriter during recovery and makes it > responsible for fsyncing data, allowing WALReceiver to progress other > useful actions. What other useful actions can WAL receiver do while it's waiting? It doesn't do much else than receive WAL, and fsync it to disk. - Heikki
On 2014-12-16 16:12:40 +0200, Heikki Linnakangas wrote: > On 12/15/2014 08:51 PM, Simon Riggs wrote: > >Currently, WALReceiver writes and fsyncs data it receives. Clearly, > >while we are waiting for an fsync we aren't doing any other useful > >work. > > > >Following patch starts WALWriter during recovery and makes it > >responsible for fsyncing data, allowing WALReceiver to progress other > >useful actions. > > What other useful actions can WAL receiver do while it's waiting? It doesn't > do much else than receive WAL, and fsync it to disk. It can actually receive further data from the network and write it to disk? On a relatively low latency network the buffers aren't that large. Right now we generate quite a bursty IO pattern with the disks alternating between idle and fully busy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 16 December 2014 at 14:12, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 12/15/2014 08:51 PM, Simon Riggs wrote: >> >> Currently, WALReceiver writes and fsyncs data it receives. Clearly, >> while we are waiting for an fsync we aren't doing any other useful >> work. >> >> Following patch starts WALWriter during recovery and makes it >> responsible for fsyncing data, allowing WALReceiver to progress other >> useful actions. > > > What other useful actions can WAL receiver do while it's waiting? It doesn't > do much else than receive WAL, and fsync it to disk. So now it will only need to do one of those two things. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On Tue, Dec 16, 2014 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 16 December 2014 at 14:12, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 12/15/2014 08:51 PM, Simon Riggs wrote: >>> >>> Currently, WALReceiver writes and fsyncs data it receives. Clearly, >>> while we are waiting for an fsync we aren't doing any other useful >>> work. >>> >>> Following patch starts WALWriter during recovery and makes it >>> responsible for fsyncing data, allowing WALReceiver to progress other >>> useful actions.
On many Linux systems it may not do that much (2.6.32 and 3.2 are bad, 3.13 is better but still it slows the fsync).
If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.
From 'stracing' a test program simulating this pattern (two processes: one writes to a file, the second fsyncs it):
20279 11:51:24.037108 fsync(5 <unfinished ...>
20278 11:51:24.053524 <... nanosleep resumed> NULL) = 0 <0.020281>
20278 11:51:24.053691 lseek(3, 1383612416, SEEK_SET) = 1383612416 <0.000119>
20278 11:51:24.053965 write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192 <0.000111>
20278 11:51:24.054190 nanosleep({0, 20000000}, NULL) = 0 <0.020243>
....
20278 11:51:24.404386 lseek(3, 194772992, SEEK_SET <unfinished ...>
20279 11:51:24.754123 <... fsync resumed> ) = 0 <0.716971>
20279 11:51:24.754202 close(5 <unfinished ...>
20278 11:51:24.754232 <... lseek resumed> ) = 194772992 <0.349825>
Yes, that's a 300ms lseek...
>> >> >> What other useful actions can WAL receiver do while it's waiting? It doesn't >> do much else than receive WAL, and fsync it to disk. > > So now it will only need to do one of those two things. >
Regards Didier
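For anyone wanting to try this on their own kernel and filesystem, here is a rough Python analogue of the pattern straced above. It is not the attached test program: it uses two threads sharing one file descriptor rather than two processes, and the file size is an assumption; the 8192-byte writes and ~20 ms sleeps mirror the trace. `write_while_fsyncing` and `demo` are hypothetical names.

```python
import os
import random
import tempfile
import threading
import time

BLOCK_SIZE = 8192                # matches the 8192-byte writes in the strace
FILE_SIZE = 64 * 1024 * 1024     # assumption; the original file size is unknown


def write_while_fsyncing(path, iterations=50, sleep_s=0.02):
    """pwrite() 8 KB blocks at random offsets while another thread loops on fsync().

    Returns the latency of each pwrite() call in seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, FILE_SIZE)
    stop = threading.Event()

    def fsync_loop():
        while not stop.is_set():
            os.fsync(fd)

    syncer = threading.Thread(target=fsync_loop)
    syncer.start()
    payload = b"a" * BLOCK_SIZE
    latencies = []
    try:
        for _ in range(iterations):
            offset = random.randrange(FILE_SIZE // BLOCK_SIZE) * BLOCK_SIZE
            start = time.perf_counter()
            os.pwrite(fd, payload, offset)
            latencies.append(time.perf_counter() - start)
            time.sleep(sleep_s)  # the strace shows ~20 ms sleeps between writes
    finally:
        stop.set()
        syncer.join()
        os.close(fd)
    return latencies


def demo(iterations=10, sleep_s=0.001):
    with tempfile.TemporaryDirectory() as d:
        latencies = write_while_fsyncing(os.path.join(d, "testfile"), iterations, sleep_s)
    return len(latencies), max(latencies)
```

On an affected kernel the worst-case write latency should show stalls that line up with fsync() calls; on tmpfs or a fast device it may show nothing at all, so results are only meaningful on the filesystem under test.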
On 17 December 2014 at 11:27, didier <did447@gmail.com> wrote: > If there's a fsync in progress WALReceiver will: > 1- slow the fsync because its writes to the same file are grabbed by the fsync > 2- stall until the end of fsync. PostgreSQL already fsyncs files while they are being written to. Are you saying we should stop doing that? It would be possible to synchronize processes so that we don't write to a file while it is being fsynced. fsyncs are also made once the whole 16MB has been written, so in those cases there is no simultaneous action. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
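The point that fsyncs are also issued once a whole 16MB segment has been written can be made concrete with a little segment arithmetic. This Python sketch assumes WAL positions are plain 64-bit byte offsets and the default 16MB segment size, and ignores page headers; `split_by_segment` is a hypothetical helper, not PostgreSQL's XLByteToSeg macro.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (16 MB)


def split_by_segment(start, nbytes, seg_size=WAL_SEGMENT_SIZE):
    """Split a write of nbytes at byte position start into per-segment pieces.

    Returns a list of (segno, offset_in_segment, length, completes_segment).
    Once a piece completes its segment, that file receives no further writes,
    so an fsync() issued at that point cannot overlap a write to the same file.
    """
    pieces = []
    pos, remaining = start, nbytes
    while remaining > 0:
        segno, offset = divmod(pos, seg_size)
        length = min(remaining, seg_size - offset)
        pieces.append((segno, offset, length, offset + length == seg_size))
        pos += length
        remaining -= length
    return pieces
```

For example, an 8192-byte write starting 4096 bytes before the end of segment 0 yields one 4096-byte piece that completes segment 0 and one 4096-byte piece at the start of segment 1; only the first segment is then safe to fsync without racing further writes.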
didier wrote: > On many Linux systems it may not do that much (2.6.32 and 3.2 are bad, > 3.13 is better but still it slows the fsync). > > If there's a fsync in progress WALReceiver will: > 1- slow the fsync because its writes to the same file are grabbed by the fsync > 2- stall until the end of fsync. Is this behavior filesystem-dependent? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi On Wed, Dec 17, 2014 at 2:39 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > didier wrote: > >> On many Linux systems it may not do that much (2.6.32 and 3.2 are bad, >> 3.13 is better but still it slows the fsync). >> >> If there's a fsync in progress WALReceiver will: >> 1- slow the fsync because its writes to the same file are grabbed by the fsync >> 2- stall until the end of fsync. > > Is this behavior filesystem-dependent? I don't know; I only tested ext4. Attached is the trivial code I used; there's a lot of junk in it. Didier
Attachments
On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Currently, WALReceiver writes and fsyncs data it receives. Clearly, > while we are waiting for an fsync we aren't doing any other useful > work. > > Following patch starts WALWriter during recovery and makes it > responsible for fsyncing data, allowing WALReceiver to progress other > useful actions. +1 > At present this is a WIP patch, for code comments only. Don't bother > with anything other than code questions at this stage. > > Implementation questions are > > * How should we wake WALReceiver, since it waits on a poll(). Should > we use SIGUSR1, which is already used for latch waits, or another > signal? Probably we need to change libpqwalreceiver so that it uses the latch. This is useful even for the startup process to report the replay location to the walreceiver in real time. > * Should we introduce some pacing delays if the WALreceiver gets too > far ahead of apply? I don't think so for now. Instead, we can support synchronous_commit = replay, and the users can use that new mode if they are worried about the delay of WAL replay. > * Other questions you may have? Who should wake the startup process so that it reads and replays the WAL data? Currently, the walreceiver does. But if walwriter is responsible for fsyncing WAL data, walwriter should probably do that, because the startup process should not replay WAL data that has not been fsync'd yet. Regards, -- Fujii Masao
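The rule that the startup process must not replay WAL that has not been fsync'd, and must be woken by whoever does the fsync, can be sketched as a shared "flushed up to" position guarded by a condition variable. This Python model is an assumption-laden illustration: `FlushPosition` is a hypothetical name, threads stand in for walwriter and the startup process, and `threading.Condition` stands in for a latch wakeup.

```python
import threading


class FlushPosition:
    """Hypothetical shared 'flushed up to' pointer, advanced only by the flusher."""

    def __init__(self):
        self._cond = threading.Condition()
        self._flushed = 0

    def advance(self, lsn):
        # Called by the flusher (walwriter in the proposal) after fsync() returns.
        with self._cond:
            if lsn > self._flushed:
                self._flushed = lsn
                self._cond.notify_all()  # wake the replayer (startup process)

    def wait_for(self, lsn, timeout=None):
        # Called by the replayer: block until WAL up to lsn is durable.
        with self._cond:
            return self._cond.wait_for(lambda: self._flushed >= lsn, timeout)

    @property
    def flushed(self):
        with self._cond:
            return self._flushed


def demo():
    pos = FlushPosition()
    replayed = []

    def replayer():
        if pos.wait_for(3 * 8192, timeout=5):
            replayed.append(pos.flushed)

    t = threading.Thread(target=replayer)
    t.start()
    for lsn in (8192, 2 * 8192, 3 * 8192):  # flusher fsyncs in three steps
        pos.advance(lsn)
    t.join()
    return replayed
```

The replayer only proceeds once the flushed position covers what it wants to apply, so it never reads ahead of durable WAL regardless of how far the receiver's writes have progressed.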
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Currently, WALReceiver writes and fsyncs data it receives. Clearly, >> while we are waiting for an fsync we aren't doing any other useful >> work. >> >> Following patch starts WALWriter during recovery and makes it >> responsible for fsyncing data, allowing WALReceiver to progress other >> useful actions. > > +1 > >> At present this is a WIP patch, for code comments only. Don't bother >> with anything other than code questions at this stage. >> >> Implementation questions are >> >> * How should we wake WALReceiver, since it waits on a poll(). Should >> we use SIGUSR1, which is already used for latch waits, or another >> signal? > > Probably we need to change libpqwalreceiver so that it uses the latch. > This is useful even for the startup process to report the replay location to > the walreceiver in real time. > >> * Should we introduce some pacing delays if the WALreceiver gets too >> far ahead of apply? > > I don't think so for now. Instead, we can support synchronous_commit = replay, > and the users can use that new mode if they are worried about the delay of > WAL replay. > >> * Other questions you may have? > > Who should wake the startup process so that it reads and replays the WAL data? > Currently, the walreceiver does. But if walwriter is responsible for fsyncing WAL data, > walwriter should probably do that, because the startup process should not replay > WAL data that has not been fsync'd yet. Moved this patch to CF 2015-02 to not lose track of it and because it did not get any reviews. -- Michael
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Currently, WALReceiver writes and fsyncs data it receives. Clearly, >> while we are waiting for an fsync we aren't doing any other useful >> work. >> >> Following patch starts WALWriter during recovery and makes it >> responsible for fsyncing data, allowing WALReceiver to progress other >> useful actions. With the patch, replication didn't work on my machine. I started the standby server after removing all the WAL files from the standby. ISTM that the patch doesn't handle that case. That is, in that case, the standby tries to start up walreceiver and replication to retrieve the REDO-starting checkpoint record *before* starting up walwriter (IOW, before reaching the consistent point). Then, since walreceiver runs without walwriter, none of the received WAL data can be fsync'd on the standby, so replication cannot advance any further. I think that walwriter needs to start before walreceiver starts. I just marked this patch as Waiting on Author. Regards, -- Fujii Masao
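The failure described above is an ordering problem: if the receiver relies on another process to fsync and that process is not running yet, the flushed position never advances. A toy Python model under that assumption follows; `start_standby` and its state dictionary are hypothetical, threads stand in for processes, and the "fsync" is only a counter update.

```python
import threading


def start_standby(start_flusher_first, timeout=0.5):
    """Return True if one received chunk becomes 'durable' within timeout seconds."""
    cond = threading.Condition()
    state = {"written": 0, "flushed": 0, "stop": False}
    flusher_ready = threading.Event()

    def flusher():
        with cond:
            flusher_ready.set()
            while not state["stop"]:
                if state["flushed"] < state["written"]:
                    state["flushed"] = state["written"]  # placeholder for fsync()
                    cond.notify_all()
                cond.wait()

    t = None
    if start_flusher_first:
        t = threading.Thread(target=flusher)
        t.start()
        flusher_ready.wait()  # do not start receiving until the flusher runs

    # Receiver: write one 8 KB chunk, then wait for it to become durable.
    with cond:
        state["written"] += 8192
        cond.notify_all()
        ok = cond.wait_for(lambda: state["flushed"] >= state["written"], timeout)

    if t is not None:
        with cond:
            state["stop"] = True
            cond.notify_all()
        t.join()
    return ok
```

With the flusher started first the chunk is reported durable; without it the receiver's wait simply times out, which is the stall the report describes.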
On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Currently, WALReceiver writes and fsyncs data it receives. Clearly, >>> while we are waiting for an fsync we aren't doing any other useful >>> work. >>> >>> Following patch starts WALWriter during recovery and makes it >>> responsible for fsyncing data, allowing WALReceiver to progress other >>> useful actions. > > With the patch, replication didn't work on my machine. I started > the standby server after removing all the WAL files from the standby. > ISTM that the patch doesn't handle that case. That is, in that case, > the standby tries to start up walreceiver and replication to retrieve > the REDO-starting checkpoint record *before* starting up walwriter > (IOW, before reaching the consistent point). Then, since walreceiver runs > without walwriter, none of the received WAL data can be fsync'd on the standby, > so replication cannot advance any further. I think that walwriter needs > to start before walreceiver starts. > > I just marked this patch as Waiting on Author. This patch was moved to the current CF with the status "Needs review". But there are already some review comments which have not been addressed yet, so I marked the patch as "Waiting on Author" again. Regards, -- Fujii Masao
On 2 July 2015 at 14:31, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>>>> while we are waiting for an fsync we aren't doing any other useful
>>>> work.
>>>>
>>>> Following patch starts WALWriter during recovery and makes it
>>>> responsible for fsyncing data, allowing WALReceiver to progress other
>>>> useful actions.
>>
>> With the patch, replication didn't work on my machine. I started
>> the standby server after removing all the WAL files from the standby.
>> ISTM that the patch doesn't handle that case. That is, in that case,
>> the standby tries to start up walreceiver and replication to retrieve
>> the REDO-starting checkpoint record *before* starting up walwriter
>> (IOW, before reaching the consistent point). Then, since walreceiver runs
>> without walwriter, none of the received WAL data can be fsync'd on the standby,
>> so replication cannot advance any further. I think that walwriter needs
>> to start before walreceiver starts.
>>
>> I just marked this patch as Waiting on Author.
>
> This patch was moved to the current CF with the status "Needs review".
> But there are already some review comments which have not been addressed yet,
> so I marked the patch as "Waiting on Author" again.

This was pushed back from last CF and I haven't worked on it at all, nor will I.
Pushing back again.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-07-02 14:34:48 +0100, Simon Riggs wrote: > This was pushed back from last CF and I haven't worked on it at all, nor > will I. > > Pushing back again. Let's mark it "returned with feedback" then, not "moved". Moving entries along that aren't expected to receive updates anytime soon isn't a good idea; there are more than enough entries each CF.
On 2 July 2015 at 14:38, Andres Freund <andres@anarazel.de> wrote:
> On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:
>> This was pushed back from last CF and I haven't worked on it at all, nor
>> will I.
>>
>> Pushing back again.
>
> Let's mark it "returned with feedback" then, not "moved". Moving entries
> along that aren't expected to receive updates anytime soon isn't a good
> idea; there are more than enough entries each CF.

Although I agree, the interface won't let me do that, so I will leave it as-is.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services