Greg Smith <gsmith@gregsmith.com> writes:
> On Fri, 8 Dec 2006, Takayuki Tsunakawa wrote:
>> Though I'm not sure, isn't it the key to use O_SYNC so that write()s
>> transfer data to disk?
> If disk writes near checkpoint time aren't happening fast enough now, I
> doubt forcing a sync after every write will make that better.

I think the idea would be to force the writes to actually occur, rather
than just being scheduled (and then forced en masse by an fsync at
checkpoint time). Since the point of the bgwriter is to try to force
writes to occur *outside* checkpoint times, this seems to make sense.
I share your doubts about the value of slowing down checkpoints --- but
to the extent that bgwriter-issued writes are delayed by the kernel
until the next checkpoint, we are certainly not getting the desired
effect of leveling the write load.
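
To make the distinction concrete, here is a minimal sketch (not PostgreSQL
code) of the two behaviors under discussion: buffered write()s that the
kernel may hold until a later fsync, versus writes on a descriptor opened
with O_SYNC, where each write returns only after the data has been pushed
to the storage device. It is in Python for brevity and assumes a Unix-like
system; PAGE_SIZE and both function names are hypothetical.

```python
import os

PAGE_SIZE = 8192  # assumed page size, for illustration only


def write_pages_deferred(path, pages):
    """Buffered writes: the kernel may delay them until the final fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for page_no, data in pages:
            os.pwrite(fd, data, page_no * PAGE_SIZE)  # only scheduled
        os.fsync(fd)  # everything is forced out here, all at once
    finally:
        os.close(fd)


def write_pages_sync(path, pages):
    """O_SYNC writes: each pwrite() blocks until that page is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        for page_no, data in pages:
            os.pwrite(fd, data, page_no * PAGE_SIZE)  # forced immediately
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file; what differs is when the
physical I/O happens, which is exactly the question for load leveling.
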
>> To reduce the number of I/Os, pages that are adjacent on disk and
>> also adjacent in memory must be written with one write().
> Sorting out which pages are next to one another on disk is one of the jobs
> the file system cache does; bypassing it will then make all that
> complicated sorting logic the job of the database engine.

Indeed --- the knowledge that we don't know the physical layout has
always been the strongest argument against using O_SYNC in this way.
But I don't think anyone's made any serious tests. A properly tuned
bgwriter should be eating only a "background" level of I/O effort
between checkpoints, so maybe it doesn't matter too much if it's not
optimally scheduled.
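
For illustration, a minimal sketch of the coalescing idea quoted above:
group dirty pages whose block numbers are consecutive and issue one write
per run. The names (coalesce_runs, write_runs, PAGE_SIZE) are hypothetical
and this is not how PostgreSQL does it. Note that it can only see
adjacency in file offsets, which is at best a proxy for adjacency on disk;
that is the physical-layout caveat in a nutshell.

```python
import os

PAGE_SIZE = 8192  # assumed page size, for illustration only


def coalesce_runs(dirty_pages):
    """Turn {block_no: bytes} into [(start_block, joined_bytes), ...],
    merging blocks whose numbers are consecutive within the file."""
    runs = []  # each entry: (start_block, [page_bytes, ...])
    for block_no in sorted(dirty_pages):
        if runs and runs[-1][0] + len(runs[-1][1]) == block_no:
            runs[-1][1].append(dirty_pages[block_no])  # extends current run
        else:
            runs.append((block_no, [dirty_pages[block_no]]))  # new run
    return [(start, b"".join(chunks)) for start, chunks in runs]


def write_runs(fd, runs):
    """Issue one pwrite() per run; return the number of writes issued."""
    for start, data in runs:
        os.pwrite(fd, data, start * PAGE_SIZE)
    return len(runs)
```

With dirty blocks 1, 2, 3 and 7, this issues two writes instead of four.
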
regards, tom lane