On Tue, Jan 12, 2016 at 7:24 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> > Why can't we do it at larger intervals (relative to total amount of writes)?
> > To explain, what I have in mind, let us assume that checkpoint interval
> > is longer (10 mins) and in the mean time all the writes are being done
> > by bgwriter
>
> But that's not the scenario with the regression here, so I'm not sure
> why you're bringing it up?
>
> And if we're flushing significant portion of the writes, how does that
> avoid the performance problem pointed out two messages upthread? Where
> sorting leads to flushing highly contended buffers together, leading to
> excessive wal flushing?
>
I think it will avoid that problem, because what I am suggesting is not to
sort the buffers before writing, but rather to sort the flush requests. If
I remember correctly, Fabien's initial patch didn't sort at the buffer
level, yet he was still able to see benefits in many cases.
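
To make that concrete, here is a minimal sketch (purely illustrative, not
from any posted patch; FlushRequest, flush_request_cmp and
issue_flush_hints are invented names) of sorting the accumulated flush
requests, rather than the buffers, and then issuing kernel writeback hints
via sync_file_range() on Linux:

/*
 * Hypothetical sketch: collect one flush request per buffer write,
 * then sort the requests by file and offset before issuing hints.
 */
#define _GNU_SOURCE
#include <fcntl.h>              /* sync_file_range() */
#include <stdlib.h>             /* qsort() */

typedef struct FlushRequest
{
    int     fd;                 /* file the dirty block was written to */
    off_t   offset;             /* byte offset of the block in the file */
    off_t   nbytes;             /* length of the dirty range */
} FlushRequest;

static int
flush_request_cmp(const void *a, const void *b)
{
    const FlushRequest *ra = a;
    const FlushRequest *rb = b;

    if (ra->fd != rb->fd)
        return (ra->fd < rb->fd) ? -1 : 1;
    if (ra->offset != rb->offset)
        return (ra->offset < rb->offset) ? -1 : 1;
    return 0;
}

static void
issue_flush_hints(FlushRequest *reqs, int nreqs)
{
    qsort(reqs, nreqs, sizeof(FlushRequest), flush_request_cmp);

    /*
     * SYNC_FILE_RANGE_WRITE only initiates writeback; it does not wait
     * for completion, so it is much cheaper than an fsync().
     */
    for (int i = 0; i < nreqs; i++)
        (void) sync_file_range(reqs[i].fd, reqs[i].offset,
                               reqs[i].nbytes, SYNC_FILE_RANGE_WRITE);
}

The buffers themselves would still be written in whatever order the
checkpointer or bgwriter chooses; only the flush hints get sorted.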
>
> But more importantly, unless you also want to delay the writes
> themselves, leaving that many dirty buffers in the kernel page cache
> will bring back exactly the type of stalls (where the kernel flushes all
> the pending dirty data in a short amount of time) we're trying to avoid
> with the forced flushing. So doing flushes in large batches is
> something we really fundamentally do *not* want!
>
Could that be because of random I/O?
> > which it registers in shared memory so that later checkpoint
> > can perform corresponding fsync's, now when the request queue
> > becomes threshold size (let us say 1/3rd) full, then we can perform
> > sorting and merging and issue flush hints.
>
> Which means that a significant portion of the writes won't be able to be
> collapsed, since only a random 1/3 of the buffers is sorted together.
>
> > Basically, I think this can lead to lesser merging of neighbouring
> > writes, but might not hurt if the sync_file_range() API is cheap.
>
> The cost of writing out data does correspond heavily with the number of
> random writes - which is what you get if you reduce the number of
> neighbouring writes.
>
Yeah, that's right, but I am not sure how much difference it would
make if we sort everything in one shot versus sorting in batches.
In any case, I am just trying to think out loud to see if we can
find some solution to the regression you have seen above without
disabling sorting altogether for certain cases.
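
To illustrate the trade-off with batching (again a hypothetical
continuation of the sketch upthread, reusing the invented FlushRequest
type), neighbouring requests can still be merged within one sorted batch,
but never across batch boundaries:

/*
 * Coalesce ranges that are contiguous in the same file, assuming the
 * batch has already been sorted by (fd, offset).  Returns the new
 * number of requests.  Contiguous blocks that fall into *different*
 * batches (e.g. with a 1/3-of-the-queue trigger) can never be merged
 * and end up as separate, effectively random, flush hints.
 */
static int
merge_flush_requests(FlushRequest *reqs, int nreqs)
{
    int     merged = 0;

    if (nreqs == 0)
        return 0;

    for (int i = 1; i < nreqs; i++)
    {
        FlushRequest *prev = &reqs[merged];

        if (reqs[i].fd == prev->fd &&
            reqs[i].offset == prev->offset + prev->nbytes)
            prev->nbytes += reqs[i].nbytes;     /* extend previous range */
        else
            reqs[++merged] = reqs[i];
    }
    return merged + 1;
}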