Обсуждение: Separating bgwriter and checkpointer
As discussed previously... Currently the bgwriter process performs both background writing, checkpointing and some other duties. This means that we can't perform the final checkpoint fsync without stopping background writing, so there is a negative performance effect from doing both things in one process. Additionally, our aim in 9.2 is to replace polling loops with latches for power reduction. The complexity of the bgwriter loops is high and it seems unlikely to come up with a clean approach using latches. This patch splits bgwriter into 2 processes: checkpointer and bgwriter, seeking to avoid contentious changes. Additional changes are expected in this release to build upon these changes for both new processes, though this patch stands on its own as both a performance vehicle and in some ways a refcatoring to simplify the code. Checkpointer does the important things, "new bgwriter" just does background writing and so is much less important than before. Current patch has a bug at shutdown I've not located yet, but seems likely is a simple error. That is mainly because for personal reasons I've not been able to work on the patch recently. I expect to be able to fix that later in the CF. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > This patch splits bgwriter into 2 processes: checkpointer and > bgwriter, seeking to avoid contentious changes. Additional changes are > expected in this release to build upon these changes for both new > processes, though this patch stands on its own as both a performance > vehicle and in some ways a refcatoring to simplify the code. I like this idea to simplify the code. How much performance gain can we expect by this patch? > Current patch has a bug at shutdown I've not located yet, but seems > likely is a simple error. That is mainly because for personal reasons > I've not been able to work on the patch recently. I expect to be able > to fix that later in the CF. You seem to have forgotten to include checkpointor.c and .h in the patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> This patch splits bgwriter into 2 processes: checkpointer and >> bgwriter, seeking to avoid contentious changes. Additional changes are >> expected in this release to build upon these changes for both new >> processes, though this patch stands on its own as both a performance >> vehicle and in some ways a refcatoring to simplify the code. > > I like this idea to simplify the code. How much performance gain can we > expect by this patch? On heavily I/O bound systems, this is likely to make a noticeable difference, since bgwriter reduces I/O in user processes. The overhead of sending signals between processes is much less than I had previously thought, so I expect no problems there, even on highly loaded systems. >> Current patch has a bug at shutdown I've not located yet, but seems >> likely is a simple error. That is mainly because for personal reasons >> I've not been able to work on the patch recently. I expect to be able >> to fix that later in the CF. > > You seem to have forgotten to include checkpointor.c and .h in the patch. I confirm this error. I'll repost full patch later in the week when I have more time. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20.09.2011 10:48, Simon Riggs wrote: > On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao<masao.fujii@gmail.com> wrote: >> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs<simon@2ndquadrant.com> wrote: >>> This patch splits bgwriter into 2 processes: checkpointer and >>> bgwriter, seeking to avoid contentious changes. Additional changes are >>> expected in this release to build upon these changes for both new >>> processes, though this patch stands on its own as both a performance >>> vehicle and in some ways a refcatoring to simplify the code. >> >> I like this idea to simplify the code. How much performance gain can we >> expect by this patch? > > On heavily I/O bound systems, this is likely to make a noticeable > difference, since bgwriter reduces I/O in user processes. Hmm. If the system is I/O bound, it doesn't matter which process performs the I/O. It's still the same amount of I/O in total, and in an I/O bound system, that's what determines the overall throughput. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 20, 2011 at 9:06 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 20.09.2011 10:48, Simon Riggs wrote: >> >> On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao<masao.fujii@gmail.com> >> wrote: >>> >>> On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs<simon@2ndquadrant.com> >>> wrote: >>>> >>>> This patch splits bgwriter into 2 processes: checkpointer and >>>> bgwriter, seeking to avoid contentious changes. Additional changes are >>>> expected in this release to build upon these changes for both new >>>> processes, though this patch stands on its own as both a performance >>>> vehicle and in some ways a refcatoring to simplify the code. >>> >>> I like this idea to simplify the code. How much performance gain can we >>> expect by this patch? >> >> On heavily I/O bound systems, this is likely to make a noticeable >> difference, since bgwriter reduces I/O in user processes. > > Hmm. If the system is I/O bound, it doesn't matter which process performs > the I/O. It's still the same amount of I/O in total, and in an I/O bound > system, that's what determines the overall throughput. That's true, but not relevant. The bgwriter avoids I/O, if it is operating correctly. This patch ensures it continues to operate even during heavy checkpoints. So it helps avoid extra I/O during a period of very high I/O activity. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20.09.2011 11:18, Simon Riggs wrote: > The bgwriter avoids I/O, if it is operating correctly. This patch > ensures it continues to operate even during heavy checkpoints. So it > helps avoid extra I/O during a period of very high I/O activity. I don't see what difference it makes which process does the I/O. If a write() by checkpointer process blocks, any write()s by the separate bgwriter process at that time will block too. If the I/O is not saturated, and the checkpoint write()s don't block, then even without this patch, the bgwriter process can handle its usual bgwriter duties during checkpoint just fine. (And if the I/O is not saturated, it's not an I/O bound system anyway.) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 20, 2011 at 10:03 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 20.09.2011 11:18, Simon Riggs wrote: >> >> The bgwriter avoids I/O, if it is operating correctly. This patch >> ensures it continues to operate even during heavy checkpoints. So it >> helps avoid extra I/O during a period of very high I/O activity. > > I don't see what difference it makes which process does the I/O. If a > write() by checkpointer process blocks, any write()s by the separate > bgwriter process at that time will block too. If the I/O is not saturated, > and the checkpoint write()s don't block, then even without this patch, the > bgwriter process can handle its usual bgwriter duties during checkpoint just > fine. (And if the I/O is not saturated, it's not an I/O bound system > anyway.) Whatever value you assign to the bgwriter, then this patch makes sure that happens during heavy fsyncs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I don't see what difference it makes which process does the I/O. If a >> write() by checkpointer process blocks, any write()s by the separate >> bgwriter process at that time will block too. If the I/O is not saturated, >> and the checkpoint write()s don't block, then even without this patch, the >> bgwriter process can handle its usual bgwriter duties during checkpoint just >> fine. (And if the I/O is not saturated, it's not an I/O bound system >> anyway.) > > Whatever value you assign to the bgwriter, then this patch makes sure > that happens during heavy fsyncs. I think his point is that it doesn't because if the heavy fsyncs cause the system to be i/o bound it then bgwriter will just block issuing the writes instead of the fsyncs. I'm not actually convinced. Writes will only block if the kernel decides to block. We don't really know how the kernel makes this decision but it's entirely possible that having pending physical i/o issued due to an fsync doesn't influence the decision if there is still a reasonable number of dirty pages in the buffer cache. In a sense, "I/O bound" means different things for write and fsync. Or to put it another way fsync is latency sensitive but write is only bandwidth sensitive. All that said my question is which way is the code more legible and easier to follow?
On 20.09.2011 16:29, Greg Stark wrote: > On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs<simon@2ndquadrant.com> wrote: >>> I don't see what difference it makes which process does the I/O. If a >>> write() by checkpointer process blocks, any write()s by the separate >>> bgwriter process at that time will block too. If the I/O is not saturated, >>> and the checkpoint write()s don't block, then even without this patch, the >>> bgwriter process can handle its usual bgwriter duties during checkpoint just >>> fine. (And if the I/O is not saturated, it's not an I/O bound system >>> anyway.) >> >> Whatever value you assign to the bgwriter, then this patch makes sure >> that happens during heavy fsyncs. > > I think his point is that it doesn't because if the heavy fsyncs cause > the system to be i/o bound it then bgwriter will just block issuing > the writes instead of the fsyncs. > > I'm not actually convinced. Writes will only block if the kernel > decides to block. We don't really know how the kernel makes this > decision but it's entirely possible that having pending physical i/o > issued due to an fsync doesn't influence the decision if there is > still a reasonable number of dirty pages in the buffer cache. In a > sense, "I/O bound" means different things for write and fsync. Or to > put it another way fsync is latency sensitive but write is only > bandwidth sensitive. Yeah, I was thinking of write()s, not fsyncs. I agree this might have some effect during fsync phase. > All that said my question is which way is the code more legible and > easier to follow? Hear hear. If we're going to give the bgwriter more responsibilities, this might make sense even if it has no effect on performance. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 20, 2011 at 15:35, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 20.09.2011 16:29, Greg Stark wrote: >> >> On Tue, Sep 20, 2011 at 11:03 AM, Simon Riggs<simon@2ndquadrant.com> >> wrote: >>>> >>>> I don't see what difference it makes which process does the I/O. If a >>>> write() by checkpointer process blocks, any write()s by the separate >>>> bgwriter process at that time will block too. If the I/O is not >>>> saturated, >>>> and the checkpoint write()s don't block, then even without this patch, >>>> the >>>> bgwriter process can handle its usual bgwriter duties during checkpoint >>>> just >>>> fine. (And if the I/O is not saturated, it's not an I/O bound system >>>> anyway.) >>> >>> Whatever value you assign to the bgwriter, then this patch makes sure >>> that happens during heavy fsyncs. >> >> I think his point is that it doesn't because if the heavy fsyncs cause >> the system to be i/o bound it then bgwriter will just block issuing >> the writes instead of the fsyncs. >> >> I'm not actually convinced. Writes will only block if the kernel >> decides to block. We don't really know how the kernel makes this >> decision but it's entirely possible that having pending physical i/o >> issued due to an fsync doesn't influence the decision if there is >> still a reasonable number of dirty pages in the buffer cache. In a >> sense, "I/O bound" means different things for write and fsync. Or to >> put it another way fsync is latency sensitive but write is only >> bandwidth sensitive. > > Yeah, I was thinking of write()s, not fsyncs. I agree this might have some > effect during fsync phase. > >> All that said my question is which way is the code more legible and >> easier to follow? > > Hear hear. If we're going to give the bgwriter more responsibilities, this > might make sense even if it has no effect on performance. Isn't there also the advantage of that work put in two different processes can use two different CPU cores? Or is that likely to never ever come in play here? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 20.09.2011 16:49, Magnus Hagander wrote: > Isn't there also the advantage of that work put in two different > processes can use two different CPU cores? Or is that likely to never > ever come in play here? You would need one helluva I/O system to saturate even a single CPU, just by doing write+fsync. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
2011/9/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > On 20.09.2011 16:49, Magnus Hagander wrote: >> >> Isn't there also the advantage of that work put in two different >> processes can use two different CPU cores? Or is that likely to never >> ever come in play here? > > You would need one helluva I/O system to saturate even a single CPU, just by > doing write+fsync. The point of Magnus is valid. There are possible throttling done by linux per node, per process/task. Since ..2.6.37 (32 ?) I believe .. there are more temptation to have have per cgroup io/sec limits, and there exists some promising work done to have a better IO bandwith throttling per process. IMO, splitting the type of IO workload per process allows the administrators to have more control on the IO limits they want to have (and it may help the kernels() to have a better strategy ?) > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On 20.09.2011 17:31, Cédric Villemain wrote: > 2011/9/20 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>: >> On 20.09.2011 16:49, Magnus Hagander wrote: >>> >>> Isn't there also the advantage of that work put in two different >>> processes can use two different CPU cores? Or is that likely to never >>> ever come in play here? >> >> You would need one helluva I/O system to saturate even a single CPU, just by >> doing write+fsync. > > The point of Magnus is valid. There are possible throttling done by > linux per node, per process/task. > Since ..2.6.37 (32 ?) I believe .. there are more temptation to have > have per cgroup io/sec limits, and there exists some promising work > done to have a better IO bandwith throttling per process. > > IMO, splitting the type of IO workload per process allows the > administrators to have more control on the IO limits they want to have > (and it may help the kernels() to have a better strategy ?) That is a separate issue from being able to use different CPU cores. But cool! I didn't know Linux can do that nowadays. That could be highly useful, if you can put e.g autovacuum on a different cgroup from regular backends. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 20, 2011 at 9:35 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> All that said my question is which way is the code more legible and >> easier to follow? > > Hear hear. If we're going to give the bgwriter more responsibilities, this > might make sense even if it has no effect on performance. I agree. I don't think this change needs to be justified on performance grounds; there are enough collateral benefits to make it worthwhile. If the checkpoint process handles all the stuff with highly variable latency (i.e. fsyncs), then the background writer work will happen more regularly and predictably. The code will also be simpler, which I think will open up opportunities for additional optimizations such as (perhaps) making the background writer only wake up when there are dirty buffers to write, which ties in to longstanding concerns about power consumption. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 16, 2011 at 01:53, Simon Riggs <simon@2ndquadrant.com> wrote: > This patch splits bgwriter into 2 processes: checkpointer and > bgwriter, seeking to avoid contentious changes. Additional changes are > expected in this release to build upon these changes for both new > processes, though this patch stands on its own as both a performance > vehicle and in some ways a refcatoring to simplify the code. While you're already splitting up bgwriter, could there be any benefit to spawning a separate bgwriter process for each tablespace? If your database has one tablespace on a fast I/O system and another on a slow one, the slow tablespace would also bog down background writing for the fast tablespace. But I don't know whether that's really a problem or not. Regards, Marti
On Tue, Sep 20, 2011 at 11:01 AM, Marti Raudsepp <marti@juffo.org> wrote: > On Fri, Sep 16, 2011 at 01:53, Simon Riggs <simon@2ndquadrant.com> wrote: >> This patch splits bgwriter into 2 processes: checkpointer and >> bgwriter, seeking to avoid contentious changes. Additional changes are >> expected in this release to build upon these changes for both new >> processes, though this patch stands on its own as both a performance >> vehicle and in some ways a refcatoring to simplify the code. > > While you're already splitting up bgwriter, could there be any benefit > to spawning a separate bgwriter process for each tablespace? > > If your database has one tablespace on a fast I/O system and another > on a slow one, the slow tablespace would also bog down background > writing for the fast tablespace. But I don't know whether that's > really a problem or not. I doubt it. Most of the time the writes are going to be absorbed by the OS write cache anyway. I think there's probably more performance to be squeezed out of the background writer, but maybe not that exact thing, and in any case it seems like material for a separate patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/20/2011 09:35 AM, Heikki Linnakangas wrote: > Yeah, I was thinking of write()s, not fsyncs. I agree this might have > some effect during fsync phase. Right; that's where the most serious problems seem to pop up at anyway now. All the testing I did earlier this year suggested Linux at least is happy to do a granular fsync, and it can also use things like barriers when appropriate to schedule I/O. The hope here is that the background writer work to clean ahead of the strategy point is helpful to backends, and that should keep going even during the sync phase--which currently doesn't pause for anything else once it's started. The cleaner writes should all queue up into RAM in a lazy way rather than block the true I/O, which is being driven by sync calls. There is some risk here that the cleaner writes happen faster than the true rate at which backends really need buffers, since it has a predictive component it can be wrong about. Those could in theory result in the write cache filling faster than it would in the current environment, such that writes truly block that would have been cached in the current code. If you're that close to the edge though, backends should really benefit from the cleaner--that same write done by a client would turn into a serious stall. From that perspective, when things have completely filled the write cache, any writes the cleaner can get out of the way in advance of when a backend needs it should be the biggest win most of the time. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Fri, Sep 16, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Sep 16, 2011 at 7:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> This patch splits bgwriter into 2 processes: checkpointer and >> bgwriter, seeking to avoid contentious changes. Additional changes are >> expected in this release to build upon these changes for both new >> processes, though this patch stands on its own as both a performance >> vehicle and in some ways a refcatoring to simplify the code. > > I like this idea to simplify the code. How much performance gain can we > expect by this patch? > >> Current patch has a bug at shutdown I've not located yet, but seems >> likely is a simple error. That is mainly because for personal reasons >> I've not been able to work on the patch recently. I expect to be able >> to fix that later in the CF. > > You seem to have forgotten to include checkpointor.c and .h in the patch. Original patch included here. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Thu, Sep 15, 2011 at 11:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Current patch has a bug at shutdown I've not located yet, but seems > likely is a simple error. That is mainly because for personal reasons > I've not been able to work on the patch recently. I expect to be able > to fix that later in the CF. Full patch, with bug fixed. (v2) I'm now free to take review comments and make changes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
2011/10/2 Simon Riggs <simon@2ndquadrant.com>: > On Thu, Sep 15, 2011 at 11:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> Current patch has a bug at shutdown I've not located yet, but seems >> likely is a simple error. That is mainly because for personal reasons >> I've not been able to work on the patch recently. I expect to be able >> to fix that later in the CF. > > Full patch, with bug fixed. (v2) > > I'm now free to take review comments and make changes. Hi Simon, I'm trying your patch, it was applied cleanly to master and compiled ok. But since I started postgres I'm seeing a 99% of CPU usage: guedes@betelgeuse:/srv/postgres/bgwriter_split$ ps -ef | grep postgres guedes 14878 1 0 19:37 pts/0 00:00:00 /srv/postgres/bgwriter_split/bin/postgres -D data guedes 14880 14878 0 19:37 ? 00:00:00 postgres: writer process guedes 14881 14878 99 19:37 ? 00:03:07 postgres: checkpointer process guedes 14882 14878 0 19:37 ? 00:00:00 postgres: wal writer process guedes 14883 14878 0 19:37 ? 00:00:00 postgres: autovacuum launcher process guedes 14884 14878 0 19:37 ? 00:00:00 postgres: stats collector process Best regards. -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote: > I'm trying your patch, it was applied cleanly to master and compiled > ok. But since I started postgres I'm seeing a 99% of CPU usage: Oh, thanks. I see what happened. I was toying with the idea of going straight to a WaitLatch implementation for the loop but decided to leave it out for a later patch, and then skipped the sleep as well. New version attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
2011/10/3 Simon Riggs <simon@2ndquadrant.com>: > On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote: >> I'm trying your patch, it was applied cleanly to master and compiled >> ok. But since I started postgres I'm seeing a 99% of CPU usage: > > Oh, thanks. I see what happened. I was toying with the idea of going > straight to a WaitLatch implementation for the loop but decided to > leave it out for a later patch, and then skipped the sleep as well. > > New version attached. Working now but even passing all tests for make check, the regress_database's postmaster doesn't shutdown properly. $ make check ... ... ============== creating temporary installation ============== ============== initializing database system ============== ============== starting postmaster ============== running on port 57432 with PID 20094 ============== creating database "regression" ============== ... ============== shutting down postmaster ============== pg_ctl: server does not shut down pg_regress: could not stop postmaster: exit code was 256 $ uname -a Linux betelgeuse 2.6.38-11-generic-pae #50-Ubuntu SMP Mon Sep 12 22:21:04 UTC 2011 i686 i686 i386 GNU/Linux $ grep "$ ./configure" config.log $ ./configure --enable-debug --enable-cassert --prefix=/srv/postgres/bgwriter_split Best regards, -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Tue, Oct 4, 2011 at 2:51 AM, Dickson S. Guedes <listas@guedesoft.net> wrote: > 2011/10/3 Simon Riggs <simon@2ndquadrant.com>: >> On Sun, Oct 2, 2011 at 11:45 PM, Dickson S. Guedes <listas@guedesoft.net> wrote: >>> I'm trying your patch, it was applied cleanly to master and compiled >>> ok. But since I started postgres I'm seeing a 99% of CPU usage: >> >> Oh, thanks. I see what happened. I was toying with the idea of going >> straight to a WaitLatch implementation for the loop but decided to >> leave it out for a later patch, and then skipped the sleep as well. >> >> New version attached. > > Working now but even passing all tests for make check, the > regress_database's postmaster doesn't shutdown properly. > > $ make check > ... > ... > ============== creating temporary installation ============== > ============== initializing database system ============== > ============== starting postmaster ============== > running on port 57432 with PID 20094 > ============== creating database "regression" ============== > ... > ============== shutting down postmaster ============== > pg_ctl: server does not shut down > pg_regress: could not stop postmaster: exit code was 256 > > $ uname -a > Linux betelgeuse 2.6.38-11-generic-pae #50-Ubuntu SMP Mon Sep 12 > 22:21:04 UTC 2011 i686 i686 i386 GNU/Linux > > $ grep "$ ./configure" config.log > $ ./configure --enable-debug --enable-cassert > --prefix=/srv/postgres/bgwriter_split Yes, I see this also. At the same time, pg_ctl start and stop seem to work fine in every mode, which is what I tested. Which seems a little weird. I seem to be having problems with HEAD as well. Investigating further. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 4, 2011 at 10:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> ============== shutting down postmaster ============== >> pg_ctl: server does not shut down >> pg_regress: could not stop postmaster: exit code was 256 >> > > Yes, I see this also. At the same time, pg_ctl start and stop seem to > work fine in every mode, which is what I tested. Which seems a little > weird. > > I seem to be having problems with HEAD as well. > > Investigating further. Doh. The problem is the *same* one I fixed in v2, yet now I see I managed to somehow exclude that fix from the earlier patch. Slap. Anyway, fixed again now. Problem observed in head was because of this bug causing later make checks to fail on port 57432, so it looked like a problem in head at first. Nothing actual bug there at all. Thanks for your patience. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
2011/10/4 Simon Riggs <simon@2ndquadrant.com>: > The problem is the *same* one I fixed in v2, yet now I see I managed > to somehow exclude that fix from the earlier patch. Slap. Anyway, > fixed again now. Ah ok! I started reviewing the v4 patch version, this is my comments: Submission review =============== 1. The patch applies cleanly to current master (fa56a0c3e) but isn't in context diff format; Feature test ========== 1. Since I patched and make installed it, I can see the expected processes: writer and checkpointer; 2. I did the following tests with the following results: 2.1 Running a long time pgbench didn't emit any assertion failure or crash and the checkpoint works as before patch: LOG: checkpoint starting: xlog LOG: checkpoint complete: wrote 300 buffers (9.8%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=26.103 s, sync=6.492 s, total=34.000 s; sync files=13, longest=4.684 s, average=0.499 s LOG: checkpoint starting: time LOG: checkpoint complete:wrote 257 buffers (8.4%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=21.819 s, sync=9.610 s, total=32.076 s; sync files=7, longest=6.452 s, average=1.372 s 2.2 Forcing a checkpoint when filesystem has enough free space works fine while pgbench is running: LOG: checkpoint starting: immediate force wait LOG: checkpoint complete: wrote 1605 buffers (52.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.344 s, sync=22.750 s, total=23.700 s; sync files=10, longest=15.586 s, average=2.275 s 2.3 Forcing a checkpoint when filesystem are full, works as expected: LOG: checkpoint starting: immediate force wait time ERROR: could not write to file "pg_xlog/xlogtemp.4380": Não há espaço disponível no dispositivo ERROR: checkpoint request failed HINT: Consult recent messages in the server log for details.STATEMENT: CHECKPOINT ; ... ERROR: could not extend file "base/16384/16405": wrote only 4096 of 8192 bytes at block 10 HINT: Check free disk space. STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (69, 3, 609672, -3063, CURRENT_TIMESTAMP); PANIC: could not write to file "pg_xlog/xlogtemp.4528": Não há espaço disponível no dispositivo STATEMENT: END; LOG: server process (PID 4528) was terminated by signal 6: Aborted Then I freed some space and started it again and the server ran properly: LOG: database system was shut down at 2011-10-05 00:46:33 BRT LOG: database system is ready to accept connections LOG: autovacuum launcher started ... LOG: checkpoint starting: immediate force wait LOG: checkpoint complete: wrote0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.181 s; sync files=0, longest=0.000 s, average=0.000 s 2.2 Running a pgbench and interrupting postmaster a few seconds later, seems to work as expected, returning the output: ... cut ... LOG: statement: SELECT abalance FROM pgbench_accounts WHERE aid = 148253; ^CLOG: statement: UPDATE pgbench_tellers SET tbalance = tbalance + 934 WHERE tid = 85; DEBUG: postmaster received signal 2 LOG: received fast shutdown request LOG: aborting any active transactions FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command ... cut ... LOG: disconnection: session time: 0:00:14.917 user=guedes database=bench host=[local] LOG: disconnection: session time: 0:00:14.773 user=guedes database=bench host=[local] DEBUG: server process (PID 1910) exited with exit code 1 DEBUG: server process (PID 1941) exited with exit code 1 LOG: shutting down LOG: checkpoint starting: shutdown immediate DEBUG: SlruScanDirectory invoking callback on pg_multixact/offsets/0000 DEBUG: SlruScanDirectory invoking callback on pg_multixact/members/0000 DEBUG: checkpoint sync: number=1 file=base/16384/16398 time=1075.227 msec DEBUG: checkpoint sync: number=2 file=base/16384/16406 time=16.832 msec DEBUG: checkpoint sync: number=3 file=base/16384/16397 time=12306.204 msec DEBUG: checkpoint sync: number=4 file=base/16384/16397.1 time=2122.141 msec DEBUG: checkpoint sync: number=5 file=base/16384/16406_fsm time=32.278 msec DEBUG: checkpoint sync: number=6 file=base/16384/16385_fsm time=11.248 msec DEBUG: checkpoint sync: number=7 file=base/16384/16388 time=11.083 msec DEBUG: checkpoint sync: number=8 file=base/16384/11712 time=11.314 msec DEBUG: checkpoint sync: number=9 file=base/16384/16397_vm time=11.103 msec DEBUG: checkpoint sync: number=10 file=base/16384/16385 time=19.308 msec DEBUG: attempting to remove WAL segments older than log file 000000010000000000000000 DEBUG: SlruScanDirectory invoking callback on pg_subtrans/0000 LOG: checkpoint complete: wrote 1659 buffers (54.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=15.617 s, total=15.898 s; sync files=10, longest=12.306 s, average=1.561 s LOG: database system is shut down Then I started the server again and it ran properly. Well, all the tests was running with the default postgresql.conf in my laptop but I'll setup a more "real world" environment to test for performance regression. Until now I couldn't notice any significant difference in TPS before and after patch in a small environment. I'll post something soon. Best regards, -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote: > Ah ok! I started reviewing the v4 patch version, this is my comments: ... > Well, all the tests was running with the default postgresql.conf in my > laptop but I'll setup a more "real world" environment to test for > performance regression. Until now I couldn't notice any significant > difference in TPS before and after patch in a small environment. I'll > post something soon. Great testing, thanks. Likely will have no effect in non-I/O swamped environment, but no regression expected either. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 5, 2011 at 8:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote: > >> Ah ok! I started reviewing the v4 patch version, this is my comments: > > ... > >> Well, all the tests was running with the default postgresql.conf in my >> laptop but I'll setup a more "real world" environment to test for >> performance regression. Until now I couldn't notice any significant >> difference in TPS before and after patch in a small environment. I'll >> post something soon. > > Great testing, thanks. Likely will have no effect in non-I/O swamped > environment, but no regression expected either. Any reason or objection to committing this patch? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Any reason or objection to committing this patch? Not on my end, though I haven't reviewed it in detail. One minor note - I was mildly surprised to see that you moved this to the checkpointer rather than leaving it in the bgwriter: + /* Do this once before starting the loop, then just at SIGHUP time. */ + SyncRepUpdateSyncStandbysDefined(); My preference would probably have been to leave that in the background writer, on the theory that the checkpointer's work is likely to be more bursty and therefore it might be less responsive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 18, 2011 at 5:39 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Any reason or objection to committing this patch? > > Not on my end, though I haven't reviewed it in detail. One minor note > - I was mildly surprised to see that you moved this to the > checkpointer rather than leaving it in the bgwriter: > > + /* Do this once before starting the loop, then just at SIGHUP time. */ > + SyncRepUpdateSyncStandbysDefined(); > > My preference would probably have been to leave that in the background > writer, on the theory that the checkpointer's work is likely to be > more bursty and therefore it might be less responsive. That needs to be in the checkpointer because that is the process that shuts down last. The bgwriter is now more like the walwriter. It shuts down early in the shutdown process, while the checkpointer is last out. So it wasn't preference, it was a requirement of the new role definitions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 18, 2011 at 12:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Oct 18, 2011 at 5:39 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Oct 18, 2011 at 9:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Any reason or objection to committing this patch? >> >> Not on my end, though I haven't reviewed it in detail. One minor note >> - I was mildly surprised to see that you moved this to the >> checkpointer rather than leaving it in the bgwriter: >> >> + /* Do this once before starting the loop, then just at SIGHUP time. */ >> + SyncRepUpdateSyncStandbysDefined(); >> >> My preference would probably have been to leave that in the background >> writer, on the theory that the checkpointer's work is likely to be >> more bursty and therefore it might be less responsive. > > That needs to be in the checkpointer because that is the process that > shuts down last. > > The bgwriter is now more like the walwriter. It shuts down early in > the shutdown process, while the checkpointer is last out. > > So it wasn't preference, it was a requirement of the new role definitions. Oh, I see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Any reason or objection to committing this patch? The checkpointer doesn't call pgstat_send_bgwriter(), but it should. Otherwise, some entries in pg_stat_bgwriter will never be updated. If we adopt the patch, checkpoint is performed by checkpointer. So it looks odd that information related to checkpoint exist in pg_stat_bgwriter. We should move them to new catalog even if it breaks the compatibility? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
2011/10/18 Simon Riggs <simon@2ndquadrant.com>: > On Wed, Oct 5, 2011 at 8:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Wed, Oct 5, 2011 at 5:10 AM, Dickson S. Guedes <listas@guedesoft.net> wrote: >> >>> Ah ok! I started reviewing the v4 patch version, this is my comments: >> >> ... >> >>> Well, all the tests was running with the default postgresql.conf in my >>> laptop but I'll setup a more "real world" environment to test for >>> performance regression. Until now I couldn't notice any significant >>> difference in TPS before and after patch in a small environment. I'll >>> post something soon. >> >> Great testing, thanks. Likely will have no effect in non-I/O swamped >> environment, but no regression expected either. > > > Any reason or objection to committing this patch? I didn't see any performance regression (as expected) in the environments that I tested. About the code, I prefer someone with more experience to review it. Thanks. -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
2011/10/19 Fujii Masao <masao.fujii@gmail.com>: > On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Any reason or objection to committing this patch? > > The checkpointer doesn't call pgstat_send_bgwriter(), but it should. > Otherwise, some entries in pg_stat_bgwriter will never be updated. Yes, checkpoints_req, checkpoints_timed and buffer_checkpoint are not being updated with this patch. > If we adopt the patch, checkpoint is performed by checkpointer. So > it looks odd that information related to checkpoint exist in > pg_stat_bgwriter. We should move them to new catalog even if > it breaks the compatibility? Splitting pg_stat_bgwriter into pg_stat_bgwriter and pg_stat_checkpointer will break something internal? With this modification we'll see applications like monitoring tools breaking, but they could use a view to put data back together in a compatible way, IMHO. -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Wed, Oct 19, 2011 at 8:43 AM, Dickson S. Guedes <listas@guedesoft.net> wrote: > 2011/10/19 Fujii Masao <masao.fujii@gmail.com>: >> On Tue, Oct 18, 2011 at 10:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Any reason or objection to committing this patch? >> >> The checkpointer doesn't call pgstat_send_bgwriter(), but it should. >> Otherwise, some entries in pg_stat_bgwriter will never be updated. > > Yes, checkpoints_req, checkpoints_timed and buffer_checkpoint are not > being updated with this patch. > >> If we adopt the patch, checkpoint is performed by checkpointer. So >> it looks odd that information related to checkpoint exist in >> pg_stat_bgwriter. We should move them to new catalog even if >> it breaks the compatibility? > > Splitting pg_stat_bgwriter into pg_stat_bgwriter and > pg_stat_checkpointer will break something internal? > > With this modification we'll see applications like monitoring tools > breaking, but they could use a view to put data back together in a > compatible way, IMHO. I don't really see any reason to break the monitoring view just because we did some internal refactoring. I'd rather have backward compatibility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't really see any reason to break the monitoring view just > because we did some internal refactoring. I'd rather have backward > compatibility. Fair enough. The patch doesn't change any document, but at least the description of pg_stat_bgwriter seems to need to be changed. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't really see any reason to break the monitoring view just >> because we did some internal refactoring. I'd rather have backward >> compatibility. > > Fair enough. > > The patch doesn't change any document, but at least the description > of pg_stat_bgwriter seems to need to be changed. Thanks for the review. Will follow up on suggestions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 19.10.2011 17:58, Simon Riggs wrote: > On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >> On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas<robertmhaas@gmail.com> wrote: >>> I don't really see any reason to break the monitoring view just >>> because we did some internal refactoring. I'd rather have backward >>> compatibility. >> >> Fair enough. >> >> The patch doesn't change any document, but at least the description >> of pg_stat_bgwriter seems to need to be changed. > > Thanks for the review. > > Will follow up on suggestions. The patch looks sane, it's mostly just moving existing code around, but there's one thing that's been bothering me about this whole idea from the get-go: If the bgwriter and checkpointer are two different processes, whenever bgwriter writes out a page it needs to send an fsync-request to the checkpointer. We avoided that when both functions were performed by the same process, but now we have to send and absorb a fsync-request message for every single write() that happens in the system, except for those done at checkpoints. Isn't that very expensive? Does it make the fsync-request queue a bottleneck on some workloads? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Oct 24, 2011 at 11:40 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > The patch looks sane, it's mostly just moving existing code around, but > there's one thing that's been bothering me about this whole idea from the > get-go: > > If the bgwriter and checkpointer are two different processes, whenever > bgwriter writes out a page it needs to send an fsync-request to the > checkpointer. We avoided that when both functions were performed by the same > process, but now we have to send and absorb a fsync-request message for > every single write() that happens in the system, except for those done at > checkpoints. Isn't that very expensive? Does it make the fsync-request queue > a bottleneck on some workloads? That is a reasonable question and one I considered. I did some benchmarking earlier to see the overhead of that. Basically, its very small, much, much smaller than I thought. The benefit of allowing the bgwriter to continue working during long fsyncs easily outweighs the loss of doing more fsync-requests. Both of those overheads/problems occur at the same time so there is the overhead is always covered. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 19, 2011 at 3:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Oct 19, 2011 at 3:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Wed, Oct 19, 2011 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I don't really see any reason to break the monitoring view just >>> because we did some internal refactoring. I'd rather have backward >>> compatibility. >> >> Fair enough. >> >> The patch doesn't change any document, but at least the description >> of pg_stat_bgwriter seems to need to be changed. > > Thanks for the review. > > Will follow up on suggestions. I'll add this in as a separate item after commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services