Обсуждение: Separating bgwriter and checkpointer

Поиск
Список
Период
Сортировка

Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
As discussed previously...

Currently the bgwriter process performs both background writing,
checkpointing and some other duties. This means that we can't perform
the final checkpoint fsync without stopping background writing, so
there is a negative performance effect from doing both things in one
process.

Additionally, our aim in 9.2 is to replace polling loops with latches
for power reduction. The complexity of the bgwriter loops is high and
it seems unlikely to come up with a clean approach using latches.

This patch splits bgwriter into 2 processes: checkpointer and
bgwriter, seeking to avoid contentious changes. Additional changes are
expected in this release to build upon these changes for both new
processes, though this patch stands on its own as both a performance
vehicle and in some ways a refcatoring to simplify the code.

Checkpointer does the important things, "new bgwriter" just does
background writing and so is much less important than before.

Current patch has a bug at shutdown I've not located yet, but seems
likely is a simple error. That is mainly because for personal reasons
I've not been able to work on the patch recently. I expect to be able
to fix that later in the CF.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Separating bgwriter and checkpointer

От
Fujii Masao
Дата:
On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> This patch splits bgwriter into 2 processes: checkpointer and
> bgwriter, seeking to avoid contentious changes. Additional changes are
> expected in this release to build upon these changes for both new
> processes, though this patch stands on its own as both a performance
> vehicle and in some ways a refcatoring to simplify the code.

I like this idea to simplify the code. How much performance gain can we
expect by this patch?

> Current patch has a bug at shutdown I've not located yet, but seems
> likely is a simple error. That is mainly because for personal reasons
> I've not been able to work on the patch recently. I expect to be able
> to fix that later in the CF.

You seem to have forgotten to include checkpointor.c and .h in the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This patch splits bgwriter into 2 processes: checkpointer and
>> bgwriter, seeking to avoid contentious changes. Additional changes are
>> expected in this release to build upon these changes for both new
>> processes, though this patch stands on its own as both a performance
>> vehicle and in some ways a refcatoring to simplify the code.
>
> I like this idea to simplify the code. How much performance gain can we
> expect by this patch?

On heavily I/O bound systems, this is likely to make a noticeable
difference, since bgwriter reduces I/O in user processes.

The overhead of sending signals between processes is much less than I
had previously thought, so I expect no problems there, even on highly
loaded systems.


>> Current patch has a bug at shutdown I've not located yet, but seems
>> likely is a simple error. That is mainly because for personal reasons
>> I've not been able to work on the patch recently. I expect to be able
>> to fix that later in the CF.
>
> You seem to have forgotten to include checkpointor.c and .h in the patch.

I confirm this error. I'll repost full patch later in the week when I
have more time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 20.09.2011 10:48, Simon Riggs wrote:
> On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>>> This patch splits bgwriter into 2 processes: checkpointer and
>>> bgwriter, seeking to avoid contentious changes. Additional changes are
>>> expected in this release to build upon these changes for both new
>>> processes, though this patch stands on its own as both a performance
>>> vehicle and in some ways a refcatoring to simplify the code.
>>
>> I like this idea to simplify the code. How much performance gain can we
>> expect by this patch?
>
> On heavily I/O bound systems, this is likely to make a noticeable
> difference, since bgwriter reduces I/O in user processes.

Hmm. If the system is I/O bound, it doesn't matter which process 
performs the I/O. It's still the same amount of I/O in total, and in an 
I/O bound system, that's what determines the overall throughput.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Tue, Sep 20, 2011 at 9:06 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 20.09.2011 10:48, Simon Riggs wrote:
>>
>> On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao<masao.fujii@gmail.com>
>>  wrote:
>>>
>>> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs<simon@2ndquadrant.com>
>>>  wrote:
>>>>
>>>> This patch splits bgwriter into 2 processes: checkpointer and
>>>> bgwriter, seeking to avoid contentious changes. Additional changes are
>>>> expected in this release to build upon these changes for both new
>>>> processes, though this patch stands on its own as both a performance
>>>> vehicle and in some ways a refcatoring to simplify the code.
>>>
>>> I like this idea to simplify the code. How much performance gain can we
>>> expect by this patch?
>>
>> On heavily I/O bound systems, this is likely to make a noticeable
>> difference, since bgwriter reduces I/O in user processes.
>
> Hmm. If the system is I/O bound, it doesn't matter which process performs
> the I/O. It's still the same amount of I/O in total, and in an I/O bound
> system, that's what determines the overall throughput.

That's true, but not relevant.

The bgwriter avoids I/O, if it is operating correctly. This patch
ensures it continues to operate even during heavy checkpoints. So it
helps avoid extra I/O during a period of very high I/O activity.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 20.09.2011 11:18, Simon Riggs wrote:
> The bgwriter avoids I/O, if it is operating correctly. This patch
> ensures it continues to operate even during heavy checkpoints. So it
> helps avoid extra I/O during a period of very high I/O activity.

I don't see what difference it makes which process does the I/O. If a 
write() by checkpointer process blocks, any write()s by the separate 
bgwriter process at that time will block too. If the I/O is not 
saturated, and the checkpoint write()s don't block, then even without 
this patch, the bgwriter process can handle its usual bgwriter duties 
during checkpoint just fine. (And if the I/O is not saturated, it's not 
an I/O bound system anyway.)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Tue, Sep 20, 2011 at 10:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 20.09.2011 11:18, Simon Riggs wrote:
>>
>> The bgwriter avoids I/O, if it is operating correctly. This patch
>> ensures it continues to operate even during heavy checkpoints. So it
>> helps avoid extra I/O during a period of very high I/O activity.
>
> I don't see what difference it makes which process does the I/O. If a
> write() by checkpointer process blocks, any write()s by the separate
> bgwriter process at that time will block too. If the I/O is not saturated,
> and the checkpoint write()s don't block, then even without this patch, the
> bgwriter process can handle its usual bgwriter duties during checkpoint just
> fine. (And if the I/O is not saturated, it's not an I/O bound system
> anyway.)

Whatever value you assign to the bgwriter, then this patch makes sure
that happens during heavy fsyncs.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Greg Stark
Дата:
On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I don't see what difference it makes which process does the I/O. If a
>> write() by checkpointer process blocks, any write()s by the separate
>> bgwriter process at that time will block too. If the I/O is not saturated,
>> and the checkpoint write()s don't block, then even without this patch, the
>> bgwriter process can handle its usual bgwriter duties during checkpoint just
>> fine. (And if the I/O is not saturated, it's not an I/O bound system
>> anyway.)
>
> Whatever value you assign to the bgwriter, then this patch makes sure
> that happens during heavy fsyncs.

I think his point is that it doesn't because if the heavy fsyncs cause
the system to be i/o bound it then bgwriter will just block issuing
the writes instead of the fsyncs.

I'm not actually convinced. Writes will only block if the kernel
decides to block. We don't really know how the kernel makes this
decision but it's entirely possible that having pending physical i/o
issued due to an fsync doesn't influence the decision if there is
still a reasonable number of dirty pages in the buffer cache.  In a
sense, "I/O bound" means different things for write and fsync. Or to
put it another way fsync is latency sensitive but write is only
bandwidth sensitive.

All that said my question is which way is the code more legible and
easier to follow?


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 20.09.2011 16:29, Greg Stark wrote:
> On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>>> I don't see what difference it makes which process does the I/O. If a
>>> write() by checkpointer process blocks, any write()s by the separate
>>> bgwriter process at that time will block too. If the I/O is not saturated,
>>> and the checkpoint write()s don't block, then even without this patch, the
>>> bgwriter process can handle its usual bgwriter duties during checkpoint just
>>> fine. (And if the I/O is not saturated, it's not an I/O bound system
>>> anyway.)
>>
>> Whatever value you assign to the bgwriter, then this patch makes sure
>> that happens during heavy fsyncs.
>
> I think his point is that it doesn't because if the heavy fsyncs cause
> the system to be i/o bound it then bgwriter will just block issuing
> the writes instead of the fsyncs.
>
> I'm not actually convinced. Writes will only block if the kernel
> decides to block. We don't really know how the kernel makes this
> decision but it's entirely possible that having pending physical i/o
> issued due to an fsync doesn't influence the decision if there is
> still a reasonable number of dirty pages in the buffer cache.  In a
> sense, "I/O bound" means different things for write and fsync. Or to
> put it another way fsync is latency sensitive but write is only
> bandwidth sensitive.

Yeah, I was thinking of write()s, not fsyncs. I agree this might have 
some effect during fsync phase.

> All that said my question is which way is the code more legible and
> easier to follow?

Hear hear. If we're going to give the bgwriter more responsibilities, 
this might make sense even if it has no effect on performance.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Magnus Hagander
Дата:
On Tue, Sep 20, 2011 at 15:35, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 20.09.2011 16:29, Greg Stark wrote:
>>
>> On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs<simon@2ndquadrant.com>
>>  wrote:
>>>>
>>>> I don't see what difference it makes which process does the I/O. If a
>>>> write() by checkpointer process blocks, any write()s by the separate
>>>> bgwriter process at that time will block too. If the I/O is not
>>>> saturated,
>>>> and the checkpoint write()s don't block, then even without this patch,
>>>> the
>>>> bgwriter process can handle its usual bgwriter duties during checkpoint
>>>> just
>>>> fine. (And if the I/O is not saturated, it's not an I/O bound system
>>>> anyway.)
>>>
>>> Whatever value you assign to the bgwriter, then this patch makes sure
>>> that happens during heavy fsyncs.
>>
>> I think his point is that it doesn't because if the heavy fsyncs cause
>> the system to be i/o bound it then bgwriter will just block issuing
>> the writes instead of the fsyncs.
>>
>> I'm not actually convinced. Writes will only block if the kernel
>> decides to block. We don't really know how the kernel makes this
>> decision but it's entirely possible that having pending physical i/o
>> issued due to an fsync doesn't influence the decision if there is
>> still a reasonable number of dirty pages in the buffer cache.  In a
>> sense, "I/O bound" means different things for write and fsync. Or to
>> put it another way fsync is latency sensitive but write is only
>> bandwidth sensitive.
>
> Yeah, I was thinking of write()s, not fsyncs. I agree this might have some
> effect during fsync phase.
>
>> All that said my question is which way is the code more legible and
>> easier to follow?
>
> Hear hear. If we're going to give the bgwriter more responsibilities, this
> might make sense even if it has no effect on performance.

Isn't there also the advantage of that work put in two different
processes can use two different CPU cores? Or is that likely to never
ever come in play here?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 20.09.2011 16:49, Magnus Hagander wrote:
> Isn't there also the advantage of that work put in two different
> processes can use two different CPU cores? Or is that likely to never
> ever come in play here?

You would need one helluva I/O system to saturate even a single CPU, 
just by doing write+fsync.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Cédric Villemain
Дата:
2011/9/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> On 20.09.2011 16:49, Magnus Hagander wrote:
>>
>> Isn't there also the advantage of that work put in two different
>> processes can use two different CPU cores? Or is that likely to never
>> ever come in play here?
>
> You would need one helluva I/O system to saturate even a single CPU, just by
> doing write+fsync.

The point of Magnus is valid. There are possible throttling done by
linux per node, per process/task.
Since ..2.6.37 (32 ?) I believe .. there are more temptation to have
have per cgroup io/sec limits, and there exists some promising work
done to have a better IO bandwith throttling per process.

IMO, splitting the type of IO workload per process allows the
administrators to have more control on the IO limits they want to have
(and it may help the kernels() to have a better strategy ?)

>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 20.09.2011 17:31, Cédric Villemain wrote:
> 2011/9/20 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>:
>> On 20.09.2011 16:49, Magnus Hagander wrote:
>>>
>>> Isn't there also the advantage of that work put in two different
>>> processes can use two different CPU cores? Or is that likely to never
>>> ever come in play here?
>>
>> You would need one helluva I/O system to saturate even a single CPU, just by
>> doing write+fsync.
>
> The point of Magnus is valid. There are possible throttling done by
> linux per node, per process/task.
> Since ..2.6.37 (32 ?) I believe .. there are more temptation to have
> have per cgroup io/sec limits, and there exists some promising work
> done to have a better IO bandwith throttling per process.
>
> IMO, splitting the type of IO workload per process allows the
> administrators to have more control on the IO limits they want to have
> (and it may help the kernels() to have a better strategy ?)

That is a separate issue from being able to use different CPU cores. But 
cool! I didn't know Linux can do that nowadays. That could be highly 
useful, if you can put e.g autovacuum on a different cgroup from regular 
backends.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Robert Haas
Дата:
On Tue, Sep 20, 2011 at 9:35 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> All that said my question is which way is the code more legible and
>> easier to follow?
>
> Hear hear. If we're going to give the bgwriter more responsibilities, this
> might make sense even if it has no effect on performance.

I agree.  I don't think this change needs to be justified on
performance grounds; there are enough collateral benefits to make it
worthwhile.  If the checkpoint process handles all the stuff with
highly variable latency (i.e. fsyncs), then the background writer work
will happen more regularly and predictably.  The code will also be
simpler, which I think will open up opportunities for additional
optimizations such as (perhaps) making the background writer only wake
up when there are dirty buffers to write, which ties in to
longstanding concerns about power consumption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Separating bgwriter and checkpointer

От
Marti Raudsepp
Дата:
On Fri, Sep 16, 2011 at 01:53, Simon Riggs <simon@2ndquadrant.com> wrote:
> This patch splits bgwriter into 2 processes: checkpointer and
> bgwriter, seeking to avoid contentious changes. Additional changes are
> expected in this release to build upon these changes for both new
> processes, though this patch stands on its own as both a performance
> vehicle and in some ways a refcatoring to simplify the code.

While you're already splitting up bgwriter, could there be any benefit
to spawning a separate bgwriter process for each tablespace?

If your database has one tablespace on a fast I/O system and another
on a slow one, the slow tablespace would also bog down background
writing for the fast tablespace. But I don't know whether that's
really a problem or not.

Regards,
Marti


Re: Separating bgwriter and checkpointer

От
Robert Haas
Дата:
On Tue, Sep 20, 2011 at 11:01 AM, Marti Raudsepp <marti@juffo.org> wrote:
> On Fri, Sep 16, 2011 at 01:53, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This patch splits bgwriter into 2 processes: checkpointer and
>> bgwriter, seeking to avoid contentious changes. Additional changes are
>> expected in this release to build upon these changes for both new
>> processes, though this patch stands on its own as both a performance
>> vehicle and in some ways a refcatoring to simplify the code.
>
> While you're already splitting up bgwriter, could there be any benefit
> to spawning a separate bgwriter process for each tablespace?
>
> If your database has one tablespace on a fast I/O system and another
> on a slow one, the slow tablespace would also bog down background
> writing for the fast tablespace. But I don't know whether that's
> really a problem or not.

I doubt it.  Most of the time the writes are going to be absorbed by
the OS write cache anyway.

I think there's probably more performance to be squeezed out of the
background writer, but maybe not that exact thing, and in any case it
seems like material for a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Separating bgwriter and checkpointer

От
Greg Smith
Дата:
On 09/20/2011 09:35 AM, Heikki Linnakangas wrote:
> Yeah, I was thinking of write()s, not fsyncs. I agree this might have 
> some effect during fsync phase.

Right; that's where the most serious problems seem to pop up at anyway 
now.  All the testing I did earlier this year suggested Linux at least 
is happy to do a granular fsync, and it can also use things like 
barriers when appropriate to schedule I/O.  The hope here is that the 
background writer work to clean ahead of the strategy point is helpful 
to backends, and that should keep going even during the sync 
phase--which currently doesn't pause for anything else once it's 
started.  The cleaner writes should all queue up into RAM in a lazy way 
rather than block the true I/O, which is being driven by sync calls.

There is some risk here that the cleaner writes happen faster than the 
true rate at which backends really need buffers, since it has a 
predictive component it can be wrong about.  Those could in theory 
result in the write cache filling faster than it would in the current 
environment, such that writes truly block that would have been cached in 
the current code.  If you're that close to the edge though, backends 
should really benefit from the cleaner--that same write done by a client 
would turn into a serious stall.  From that perspective, when things 
have completely filled the write cache, any writes the cleaner can get 
out of the way in advance of when a backend needs it should be the 
biggest win most of the time.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This patch splits bgwriter into 2 processes: checkpointer and
>> bgwriter, seeking to avoid contentious changes. Additional changes are
>> expected in this release to build upon these changes for both new
>> processes, though this patch stands on its own as both a performance
>> vehicle and in some ways a refcatoring to simplify the code.
>
> I like this idea to simplify the code. How much performance gain can we
> expect by this patch?
>
>> Current patch has a bug at shutdown I've not located yet, but seems
>> likely is a simple error. That is mainly because for personal reasons
>> I've not been able to work on the patch recently. I expect to be able
>> to fix that later in the CF.
>
> You seem to have forgotten to include checkpointor.c and .h in the patch.

Original patch included here.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Thu, Sep 15, 2011 at 11:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Current patch has a bug at shutdown I've not located yet, but seems
> likely is a simple error. That is mainly because for personal reasons
> I've not been able to work on the patch recently. I expect to be able
> to fix that later in the CF.

Full patch, with bug fixed. (v2)

I'm now free to take review comments and make changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Separating bgwriter and checkpointer

От
"Dickson S. Guedes"
Дата:
2011/10/2 Simon Riggs <simon@2ndquadrant.com>:
> On Thu, Sep 15, 2011 at 11:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Current patch has a bug at shutdown I've not located yet, but seems
>> likely is a simple error. That is mainly because for personal reasons
>> I've not been able to work on the patch recently. I expect to be able
>> to fix that later in the CF.
>
> Full patch, with bug fixed. (v2)
>
> I'm now free to take review comments and make changes.


Hi Simon,

I'm trying your patch, it was applied cleanly to master and compiled
ok. But since I started postgres I'm seeing a  99% of CPU usage:

guedes@betelgeuse:/srv/postgres/bgwriter_split$ ps -ef | grep postgres
guedes   14878     1  0 19:37 pts/0    00:00:00
/srv/postgres/bgwriter_split/bin/postgres -D data
guedes   14880 14878  0 19:37 ?        00:00:00 postgres: writer
process
guedes   14881 14878 99 19:37 ?        00:03:07 postgres: checkpointer
process
guedes   14882 14878  0 19:37 ?        00:00:00 postgres: wal writer
process
guedes   14883 14878  0 19:37 ?        00:00:00 postgres: autovacuum
launcher process
guedes   14884 14878  0 19:37 ?        00:00:00 postgres: stats
collector process

Best regards.
-- 
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote:

> I'm trying your patch, it was applied cleanly to master and compiled
> ok. But since I started postgres I'm seeing a  99% of CPU usage:

Oh, thanks. I see what happened. I was toying with the idea of going
straight to a WaitLatch implementation for the loop but decided to
leave it out for a later patch, and then skipped the sleep as well.

New version attached.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Separating bgwriter and checkpointer

От
"Dickson S. Guedes"
Дата:
2011/10/3 Simon Riggs <simon@2ndquadrant.com>:
> On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote:
>> I'm trying your patch, it was applied cleanly to master and compiled
>> ok. But since I started postgres I'm seeing a  99% of CPU usage:
>
> Oh, thanks. I see what happened. I was toying with the idea of going
> straight to a WaitLatch implementation for the loop but decided to
> leave it out for a later patch, and then skipped the sleep as well.
>
> New version attached.

Working now but even passing all tests for make check, the
regress_database's postmaster doesn't shutdown properly.

$ make check
...
...
============== creating temporary installation        ==============
============== initializing database system           ==============
============== starting postmaster                    ==============
running on port 57432 with PID 20094
============== creating database "regression"         ==============
...
============== shutting down postmaster               ==============
pg_ctl: server does not shut down
pg_regress: could not stop postmaster: exit code was 256

$ uname -a
Linux betelgeuse 2.6.38-11-generic-pae #50-Ubuntu SMP Mon Sep 12
22:21:04 UTC 2011 i686 i686 i386 GNU/Linux

$ grep "$ ./configure" config.log $ ./configure --enable-debug --enable-cassert
--prefix=/srv/postgres/bgwriter_split

Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Tue, Oct 4, 2011 at 2:51 AM, Dickson S. Guedes <listas@guedesoft.net> wrote:
> 2011/10/3 Simon Riggs <simon@2ndquadrant.com>:
>> On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote:
>>> I'm trying your patch, it was applied cleanly to master and compiled
>>> ok. But since I started postgres I'm seeing a  99% of CPU usage:
>>
>> Oh, thanks. I see what happened. I was toying with the idea of going
>> straight to a WaitLatch implementation for the loop but decided to
>> leave it out for a later patch, and then skipped the sleep as well.
>>
>> New version attached.
>
> Working now but even passing all tests for make check, the
> regress_database's postmaster doesn't shutdown properly.
>
> $ make check
> ...
> ...
> ============== creating temporary installation        ==============
> ============== initializing database system           ==============
> ============== starting postmaster                    ==============
> running on port 57432 with PID 20094
> ============== creating database "regression"         ==============
> ...
> ============== shutting down postmaster               ==============
> pg_ctl: server does not shut down
> pg_regress: could not stop postmaster: exit code was 256
>
> $ uname -a
> Linux betelgeuse 2.6.38-11-generic-pae #50-Ubuntu SMP Mon Sep 12
> 22:21:04 UTC 2011 i686 i686 i386 GNU/Linux
>
> $ grep "$ ./configure" config.log
>  $ ./configure --enable-debug --enable-cassert
> --prefix=/srv/postgres/bgwriter_split


Yes, I see this also. At the same time, pg_ctl start and stop seem to
work fine in every mode, which is what I tested. Which seems a little
weird.

I seem to be having problems with HEAD as well.

Investigating further.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Tue, Oct 4, 2011 at 10:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

>> ============== shutting down postmaster               ==============
>> pg_ctl: server does not shut down
>> pg_regress: could not stop postmaster: exit code was 256
>>
>
> Yes, I see this also. At the same time, pg_ctl start and stop seem to
> work fine in every mode, which is what I tested. Which seems a little
> weird.
>
> I seem to be having problems with HEAD as well.
>
> Investigating further.


Doh.

The problem is the *same* one I fixed in v2, yet now I see I managed
to somehow exclude that fix from the earlier patch. Slap. Anyway,
fixed again now.

Problem observed in head was because of this bug causing later make
checks to fail on port 57432, so it looked like a problem in head at
first. Nothing actual bug there at all.

Thanks for your patience.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Separating bgwriter and checkpointer

От
"Dickson S. Guedes"
Дата:
2011/10/4 Simon Riggs <simon@2ndquadrant.com>:
> The problem is the *same* one I fixed in v2, yet now I see I managed
> to somehow exclude that fix from the earlier patch. Slap. Anyway,
> fixed again now.

Ah ok! I started reviewing the v4 patch version, this is my comments:

Submission review
===============

1. The patch applies cleanly to current master (fa56a0c3e) but isn't
in context diff format;

Feature test
==========

1. Since I patched and make installed it, I can see the expected
processes: writer and checkpointer;

2. I did the following tests with the following results:

2.1 Running a long time pgbench didn't emit any assertion failure or
crash and the checkpoint works as before patch:
 LOG:  checkpoint starting: xlog LOG:  checkpoint complete: wrote 300 buffers (9.8%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=26.103 s, sync=6.492
s, total=34.000 s; sync files=13, longest=4.684 s, average=0.499 s LOG:  checkpoint starting: time LOG:  checkpoint
complete:wrote 257 buffers (8.4%); 0 transaction 
log file(s) added, 0 removed, 3 recycled; write=21.819 s, sync=9.610
s, total=32.076 s; sync files=7, longest=6.452 s, average=1.372 s

2.2 Forcing a checkpoint when filesystem has enough free space works
fine while pgbench is running:
 LOG:  checkpoint starting: immediate force wait LOG:  checkpoint complete: wrote 1605 buffers (52.2%); 0 transaction
log file(s) added, 0 removed, 2 recycled; write=0.344 s, sync=22.750
s, total=23.700 s; sync files=10, longest=15.586 s, average=2.275 s

2.3 Forcing a checkpoint when filesystem are full, works as expected:
 LOG:  checkpoint starting: immediate force wait time ERROR:  could not write to file "pg_xlog/xlogtemp.4380": Não há
espaço disponível no dispositivo ERROR:  checkpoint request failed HINT:  Consult recent messages in the server log for
details.STATEMENT:  CHECKPOINT ; ... ERROR:  could not extend file "base/16384/16405": wrote only 4096 of 
8192 bytes at block 10 HINT:  Check free disk space. STATEMENT:  INSERT INTO pgbench_history (tid, bid, aid, delta,
mtime) VALUES (69, 3, 609672, -3063,  CURRENT_TIMESTAMP); PANIC:  could not write to file "pg_xlog/xlogtemp.4528": Não
há
espaço disponível no dispositivo STATEMENT:  END; LOG:  server process (PID 4528) was terminated by signal 6: Aborted

Then I freed some space and started it again and the server ran properly:
  LOG:  database system was shut down at 2011-10-05 00:46:33 BRT  LOG:  database system is ready to accept connections
LOG: autovacuum launcher started  ...  LOG:  checkpoint starting: immediate force wait  LOG:  checkpoint complete:
wrote0 buffers (0.0%); 0 transaction 
log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.181 s; sync files=0, longest=0.000 s, average=0.000 s

2.2 Running a pgbench and interrupting postmaster a few seconds later,
seems to work as expected, returning the output:

... cut ...
LOG:  statement: SELECT abalance FROM pgbench_accounts WHERE aid = 148253;
^CLOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance +
934 WHERE tid = 85;
DEBUG:  postmaster received signal 2
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
... cut ...
LOG:  disconnection: session time: 0:00:14.917 user=guedes
database=bench host=[local]
LOG:  disconnection: session time: 0:00:14.773 user=guedes
database=bench host=[local]
DEBUG:  server process (PID 1910) exited with exit code 1
DEBUG:  server process (PID 1941) exited with exit code 1
LOG:  shutting down
LOG:  checkpoint starting: shutdown immediate
DEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0000
DEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0000
DEBUG:  checkpoint sync: number=1 file=base/16384/16398 time=1075.227 msec
DEBUG:  checkpoint sync: number=2 file=base/16384/16406 time=16.832 msec
DEBUG:  checkpoint sync: number=3 file=base/16384/16397 time=12306.204 msec
DEBUG:  checkpoint sync: number=4 file=base/16384/16397.1 time=2122.141 msec
DEBUG:  checkpoint sync: number=5 file=base/16384/16406_fsm time=32.278 msec
DEBUG:  checkpoint sync: number=6 file=base/16384/16385_fsm time=11.248 msec
DEBUG:  checkpoint sync: number=7 file=base/16384/16388 time=11.083 msec
DEBUG:  checkpoint sync: number=8 file=base/16384/11712 time=11.314 msec
DEBUG:  checkpoint sync: number=9 file=base/16384/16397_vm time=11.103 msec
DEBUG:  checkpoint sync: number=10 file=base/16384/16385 time=19.308 msec
DEBUG:  attempting to remove WAL segments older than log file
000000010000000000000000
DEBUG:  SlruScanDirectory invoking callback on pg_subtrans/0000
LOG:  checkpoint complete: wrote 1659 buffers (54.0%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=15.617
s, total=15.898 s; sync files=10, longest=12.306 s, average=1.561 s
LOG:  database system is shut down

Then I started the server again and it ran properly.


Well, all the tests was running with the default postgresql.conf in my
laptop but I'll setup a more "real world" environment to test for
performance regression. Until now I couldn't notice any significant
difference in TPS before and after patch in a small environment. I'll
post something soon.


Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote:

> Ah ok! I started reviewing the v4 patch version, this is my comments:

...

> Well, all the tests was running with the default postgresql.conf in my
> laptop but I'll setup a more "real world" environment to test for
> performance regression. Until now I couldn't notice any significant
> difference in TPS before and after patch in a small environment. I'll
> post something soon.

Great testing, thanks. Likely will have no effect in non-I/O swamped
environment, but no regression expected either.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Wed, Oct 5, 2011 at 8:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote:
>
>> Ah ok! I started reviewing the v4 patch version, this is my comments:
>
> ...
>
>> Well, all the tests was running with the default postgresql.conf in my
>> laptop but I'll setup a more "real world" environment to test for
>> performance regression. Until now I couldn't notice any significant
>> difference in TPS before and after patch in a small environment. I'll
>> post something soon.
>
> Great testing, thanks. Likely will have no effect in non-I/O swamped
> environment, but no regression expected either.


Any reason or objection to committing this patch?


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Robert Haas
Дата:
On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Any reason or objection to committing this patch?

Not on my end, though I haven't reviewed it in detail.  One minor note
- I was mildly surprised to see that you moved this to the
checkpointer rather than leaving it in the bgwriter:

+    /* Do this once before starting the loop, then just at SIGHUP time. */
+    SyncRepUpdateSyncStandbysDefined();

My preference would probably have been to leave that in the background
writer, on the theory that the checkpointer's work is likely to be
more bursty and therefore it might be less responsive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Tue, Oct 18, 2011 at 5:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Any reason or objection to committing this patch?
>
> Not on my end, though I haven't reviewed it in detail.  One minor note
> - I was mildly surprised to see that you moved this to the
> checkpointer rather than leaving it in the bgwriter:
>
> +       /* Do this once before starting the loop, then just at SIGHUP time. */
> +       SyncRepUpdateSyncStandbysDefined();
>
> My preference would probably have been to leave that in the background
> writer, on the theory that the checkpointer's work is likely to be
> more bursty and therefore it might be less responsive.

That needs to be in the checkpointer because that is the process that
shuts down last.

The bgwriter is now more like the walwriter. It shuts down early in
the shutdown process, while the checkpointer is last out.

So it wasn't preference, it was a requirement of the new role definitions.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Robert Haas
Дата:
On Tue, Oct 18, 2011 at 12:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Oct 18, 2011 at 5:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Any reason or objection to committing this patch?
>>
>> Not on my end, though I haven't reviewed it in detail.  One minor note
>> - I was mildly surprised to see that you moved this to the
>> checkpointer rather than leaving it in the bgwriter:
>>
>> +       /* Do this once before starting the loop, then just at SIGHUP time. */
>> +       SyncRepUpdateSyncStandbysDefined();
>>
>> My preference would probably have been to leave that in the background
>> writer, on the theory that the checkpointer's work is likely to be
>> more bursty and therefore it might be less responsive.
>
> That needs to be in the checkpointer because that is the process that
> shuts down last.
>
> The bgwriter is now more like the walwriter. It shuts down early in
> the shutdown process, while the checkpointer is last out.
>
> So it wasn't preference, it was a requirement of the new role definitions.

Oh, I see.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Separating bgwriter and checkpointer

От
Fujii Masao
Дата:
On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Any reason or objection to committing this patch?

The checkpointer doesn't call pgstat_send_bgwriter(), but it should.
Otherwise, some entries in pg_stat_bgwriter will never be updated.

If we adopt the patch, checkpoint is performed by checkpointer. So
it looks odd that information related to checkpoint exist in
pg_stat_bgwriter. We should move them to new catalog even if
it breaks the compatibility?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Separating bgwriter and checkpointer

От
"Dickson S. Guedes"
Дата:
2011/10/18 Simon Riggs <simon@2ndquadrant.com>:
> On Wed, Oct 5, 2011 at 8:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote:
>>
>>> Ah ok! I started reviewing the v4 patch version, this is my comments:
>>
>> ...
>>
>>> Well, all the tests was running with the default postgresql.conf in my
>>> laptop but I'll setup a more "real world" environment to test for
>>> performance regression. Until now I couldn't notice any significant
>>> difference in TPS before and after patch in a small environment. I'll
>>> post something soon.
>>
>> Great testing, thanks. Likely will have no effect in non-I/O swamped
>> environment, but no regression expected either.
>
>
> Any reason or objection to committing this patch?

I didn't see any performance regression (as expected) in the
environments that I tested. About the code, I prefer someone with more
experience to review it.

Thanks.
-- 
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: Separating bgwriter and checkpointer

От
"Dickson S. Guedes"
Дата:
2011/10/19 Fujii Masao <masao.fujii@gmail.com>:
> On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Any reason or objection to committing this patch?
>
> The checkpointer doesn't call pgstat_send_bgwriter(), but it should.
> Otherwise, some entries in pg_stat_bgwriter will never be updated.

Yes, checkpoints_req, checkpoints_timed and buffer_checkpoint are not
being updated with this patch.

> If we adopt the patch, checkpoint is performed by checkpointer. So
> it looks odd that information related to checkpoint exist in
> pg_stat_bgwriter. We should move them to new catalog even if
> it breaks the compatibility?

Splitting pg_stat_bgwriter into pg_stat_bgwriter and
pg_stat_checkpointer will break something internal?

With this modification we'll see applications like monitoring tools
breaking, but they could use a view to put data back together in a
compatible way, IMHO.

-- 
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: Separating bgwriter and checkpointer

От
Robert Haas
Дата:
On Wed, Oct 19, 2011 at 8:43 AM, Dickson S. Guedes <listas@guedesoft.net> wrote:
> 2011/10/19 Fujii Masao <masao.fujii@gmail.com>:
>> On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Any reason or objection to committing this patch?
>>
>> The checkpointer doesn't call pgstat_send_bgwriter(), but it should.
>> Otherwise, some entries in pg_stat_bgwriter will never be updated.
>
> Yes, checkpoints_req, checkpoints_timed and buffer_checkpoint are not
> being updated with this patch.
>
>> If we adopt the patch, checkpoint is performed by checkpointer. So
>> it looks odd that information related to checkpoint exist in
>> pg_stat_bgwriter. We should move them to new catalog even if
>> it breaks the compatibility?
>
> Splitting pg_stat_bgwriter into pg_stat_bgwriter and
> pg_stat_checkpointer will break something internal?
>
> With this modification we'll see applications like monitoring tools
> breaking, but they could use a view to put data back together in a
> compatible way, IMHO.

I don't really see any reason to break the monitoring view just
because we did some internal refactoring.  I'd rather have backward
compatibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Separating bgwriter and checkpointer

От
Fujii Masao
Дата:
On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't really see any reason to break the monitoring view just
> because we did some internal refactoring.  I'd rather have backward
> compatibility.

Fair enough.

The patch doesn't change any document, but at least the description
of pg_stat_bgwriter seems to need to be changed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't really see any reason to break the monitoring view just
>> because we did some internal refactoring.  I'd rather have backward
>> compatibility.
>
> Fair enough.
>
> The patch doesn't change any document, but at least the description
> of pg_stat_bgwriter seems to need to be changed.

Thanks for the review.

Will follow up on suggestions.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Heikki Linnakangas
Дата:
On 19.10.2011 17:58, Simon Riggs wrote:
> On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>>> I don't really see any reason to break the monitoring view just
>>> because we did some internal refactoring.  I'd rather have backward
>>> compatibility.
>>
>> Fair enough.
>>
>> The patch doesn't change any document, but at least the description
>> of pg_stat_bgwriter seems to need to be changed.
>
> Thanks for the review.
>
> Will follow up on suggestions.

The patch looks sane, it's mostly just moving existing code around, but 
there's one thing that's been bothering me about this whole idea from 
the get-go:

If the bgwriter and checkpointer are two different processes, whenever 
bgwriter writes out a page it needs to send an fsync-request to the 
checkpointer. We avoided that when both functions were performed by the 
same process, but now we have to send and absorb a fsync-request message 
for every single write() that happens in the system, except for those 
done at checkpoints. Isn't that very expensive? Does it make the 
fsync-request queue a bottleneck on some workloads?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Mon, Oct 24, 2011 at 11:40 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> The patch looks sane, it's mostly just moving existing code around, but
> there's one thing that's been bothering me about this whole idea from the
> get-go:
>
> If the bgwriter and checkpointer are two different processes, whenever
> bgwriter writes out a page it needs to send an fsync-request to the
> checkpointer. We avoided that when both functions were performed by the same
> process, but now we have to send and absorb a fsync-request message for
> every single write() that happens in the system, except for those done at
> checkpoints. Isn't that very expensive? Does it make the fsync-request queue
> a bottleneck on some workloads?

That is a reasonable question and one I considered.

I did some benchmarking earlier to see the overhead of that.
Basically, its very small, much, much smaller than I thought.

The benefit of allowing the bgwriter to continue working during long
fsyncs easily outweighs the loss of doing more fsync-requests. Both of
those overheads/problems occur at the same time so there is the
overhead is always covered.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Separating bgwriter and checkpointer

От
Simon Riggs
Дата:
On Wed, Oct 19, 2011 at 3:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I don't really see any reason to break the monitoring view just
>>> because we did some internal refactoring.  I'd rather have backward
>>> compatibility.
>>
>> Fair enough.
>>
>> The patch doesn't change any document, but at least the description
>> of pg_stat_bgwriter seems to need to be changed.
>
> Thanks for the review.
>
> Will follow up on suggestions.

I'll add this in as a separate item after commit.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services