Discussion: Expose checkpoint start/finish times into SQL.
First whack at exposing the start and finish checkpoint times into SQL. -- Theo Schlossnagle Esoteric Curio -- http://lethargy.org/ OmniTI Computer Consulting, Inc. -- http://omniti.com/
Theo Schlossnagle wrote: > First whack at exposing the start and finish checkpoint times into SQL. I suggest using GetCurrentTimestamp() directly instead of time_t and converting. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Theo Schlossnagle wrote: > First whack at exposing the start and finish checkpoint times into SQL. Why is that useful? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 03 Apr 2008 23:21:49 +0100 Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Theo Schlossnagle wrote: > > First whack at exposing the start and finish checkpoint times into > > SQL. > > Why is that useful? For knowing how long checkpoints are taking. If they are taking too long you may need to adjust your bgwriter settings, and it is a serious drag to parse postgresql logs for this info. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Heikki Linnakangas <heikki@enterprisedb.com> writes: > Theo Schlossnagle wrote: >> First whack at exposing the start and finish checkpoint times into SQL. > Why is that useful? Does this implementation even work? It looks to me like the globalStats.last_checkpoint_start/done fields will go back to zero the very next time the bgwriter sends a stats message. I'm not sure what a sane behavior would be, but it seems unlikely that that's it. regards, tom lane
Joshua D. Drake wrote: >> Theo Schlossnagle wrote: >> >>> First whack at exposing the start and finish checkpoint times into >>> SQL. >>> >> Why is that useful? >> > > For knowing how long checkpoints are taking. If they are taking too > long you may need to adjust your bgwriter settings, and it is a > serious drag to parse postgresql logs for this info. > > > Even if this were true, surely the answer is to improve the logging. Has this feature been discussed on -hackers? I don't recall it (and my memory has plenty of holes in it), but I'm sure that after attending my talk last Sunday Theo hasn't sent in a patch for an undiscussed feature ;-) cheers andrew
"Joshua D. Drake" <jd@commandprompt.com> writes: > Heikki Linnakangas <heikki@enterprisedb.com> wrote: >> Why is that useful? > For knowing how long checkpoints are taking. If they are taking too > long you may need to adjust your bgwriter settings, and it is a > serious drag to parse postgresql logs for this info. 1. To do anything useful along those lines, you would need to look at a lot of checkpoints over time, which is what log_checkpoints is good for. This patch only tells you about the latest, which isn't very useful for making any good decisions about parameters. 2. If I read the patch correctly, half of the time what you'd be seeing is the start time of the currently-active checkpoint and the completion time of the prior checkpoint. I don't know what those numbers are good for at all. 3. As of PG 8.3, the bgwriter tries very hard to make the elapsed time of a checkpoint be just about checkpoint_timeout * checkpoint_completion_target, regardless of load factors. So unless your settings are completely broken, measuring the actual time isn't going to tell you much. In short: Heikki's question is on point. regards, tom lane
On Thursday 03 April 2008 19:08, Andrew Dunstan wrote: > Joshua D. Drake wrote: > >> Theo Schlossnagle wrote: > >>> First whack at exposing the start and finish checkpoint times into > >>> SQL. > >> > >> Why is that useful? > > > > For knowing how long checkpoints are taking. If they are taking too > > long you may need to adjust your bgwriter settings, and it is a > > serious drag to parse postgresql logs for this info. > > Even if this were true, surely the answer is to improve the logging. > Exposing everything into the log files isn't always sufficient (says the guy who maintains a remote admin tool) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: > On Thursday 03 April 2008 19:08, Andrew Dunstan wrote: > >> Joshua D. Drake wrote: >> >>>> Theo Schlossnagle wrote: >>>> >>>>> First whack at exposing the start and finish checkpoint times into >>>>> SQL. >>>>> >>>> Why is that useful? >>>> >>> For knowing how long checkpoints are taking. If they are taking too >>> long you may need to adjust your bgwriter settings, and it is a >>> serious drag to parse postgresql logs for this info. >>> >> Even if this were true, surely the answer is to improve the logging. >> >> > > Exposing everything into the log files isn't always sufficient (says the guy > who maintains a remote admin tool) > It should be now that you can have machine readable logs (says the guy who literally spent weeks making that happen) ;-) cheers andrew
On Thu, 03 Apr 2008 20:29:18 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > > Heikki Linnakangas <heikki@enterprisedb.com> wrote: > >> Why is that useful? > > > For knowing how long checkpoints are taking. If they are taking too > > long you may need to adjust your bgwriter settings, and it is a > > serious drag to parse postgresql logs for this info. > > 1. To do anything useful along those lines, you would need to look at > a lot of checkpoints over time, which is what log_checkpoints is good > for. This patch only tells you about the latest, which isn't very > useful for making any good decisions about parameters. I would agree with this. We would need a history of checkpoints that didn't reset until we told it to. > > 2. If I read the patch correctly, half of the time what you'd be > seeing is the start time of the currently-active checkpoint and the > completion time of the prior checkpoint. I don't know what those > numbers are good for at all. IMO we should see start checkpoint, end checkpoint. As part of a single record. Other possibly interesting info would be how many logs were processed. > > 3. As of PG 8.3, the bgwriter tries very hard to make the elapsed time > of a checkpoint be just about checkpoint_timeout * > checkpoint_completion_target, regardless of load factors. So unless > your settings are completely broken, measuring the actual time isn't > going to tell you much. > You assume that people don't have broken settings. :) > In short: Heikki's question is on point. I didn't say it wasn't. I was just stating why a patch that does what was described is useful. I believe with changes it would still be useful. Having to go to log for a lot of this stuff is painful. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Thu, 03 Apr 2008 20:45:37 -0400 Andrew Dunstan <andrew@dunslane.net> wrote: > > Exposing everything into the log files isn't always sufficient > > (says the guy who maintains a remote admin tool) > > > > It should be now that you can have machine readable logs (says the > guy who literally spent weeks making that happen) ;-) And how does the person get access to those? And what script do I need to write to make it happen? Don't get me wrong, the feature you worked entirely too hard on to get working is valuable but... being able to say, "SELECT * FROM give_me_my_db_info;" is much more useful in this context. In short, I should never have to go to log for this class of information. It should be available in the database. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
"Joshua D. Drake" <jd@commandprompt.com> writes: > I would agree with this. We would need a history of checkpoints that > didn't reset until we told it to. Indeed, but the submitted patch has nought whatsoever to do with that. It exposes some instantaneous state. You could perhaps *build* a log facility on top of that, at the SQL level; but I don't see the point, and I definitely disagree that it would be "easier than trolling the logs". regards, tom lane
On Thu, 03 Apr 2008 21:26:46 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > > I would agree with this. We would need a history of checkpoints that > > didn't reset until we told it to. > > Indeed, but the submitted patch has nought whatsoever to do with that. > It exposes some instantaneous state. > > You could perhaps *build* a log facility on top of that, at the SQL > level; but I don't see the point, and I definitely disagree that it > would be "easier than trolling the logs". Having the ability to do this: SELECT * FROM pg_stat_bgwriter WHERE last_checkpoint BETWEEN (current_timestamp - '1 day'::interval) AND current_timestamp; would be very useful. Which I can do with logs currently. You are correct. However from a usability, remote reporting and manageability perspective it certainly is not the same as what I describe above. Note I am perfectly willing to table this until we have a full todo and specification for a feature. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Apr 3, 2008, at 7:08 PM, Andrew Dunstan wrote: > > > Joshua D. Drake wrote: >>> Theo Schlossnagle wrote: >>> >>>> First whack at exposing the start and finish checkpoint times into >>>> SQL. >>>> >>> Why is that useful? >>> >> >> For knowing how long checkpoints are taking. If they are taking too >> long you may need to adjust your bgwriter settings, and it is a >> serious drag to parse postgresql logs for this info. >> >> >> > > Even if this were true, surely the answer is to improve the logging. > > Has this feature been discussed on -hackers? I don't recall it (and > my memory has plenty of holes in it), but I'm sure that after > attending my talk last Sunday Theo hasn't sent in a patch for an > undiscussed feature ;-) Andrew: I don't think this feature has been discussed on hackers. The patch took about 15 minutes to author, so it seemed like the most concise way to start a conversation. Seems silly to start the conversation on hackers with a patch. :-) Alvaro: Thanks, I'll flip that to GetCurrentTimestamp(). Heikki: It is useful for knowing when the last checkpoint occurred. Like Robert, we have situations where reading the log file is a PITA -- so this provides that information. I originally planned on only adding the start time, but figured adding the end would make sense too. Tom: It worked for me in my testing, though I did not test it extensively. I didn't see anywhere the stats are zero'd out, so I believe the timestamp is zero at start and then only ever set to the starttime during a checkpoint invocation. I admittedly don't have a thorough understanding of that code -- but that segment (before my patch) looked pretty concise. -- Theo Schlossnagle Esoteric Curio -- http://lethargy.org/ OmniTI Computer Consulting, Inc. -- http://omniti.com/
Joshua D. Drake wrote: >>> Exposing everything into the log files isn't always sufficient >>> (says the guy who maintains a remote admin tool) >>> >>> >> It should be now that you can have machine readable logs (says the >> guy who literally spent weeks making that happen) ;-) >> > > And how does the person get access to those? And what script do I need > to write to make it happen? Don't get me wrong, the feature you worked > entirely too hard on to get working is valuable but... being able to > say, "SELECT * FROM give_me_my_db_info;" is much more useful in this > context. > How to load the CSV logs is very clearly documented. It's really *very* easy, so easy it's mostly C&P. See http://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG If you are trying to tell me that that's too hard for a DBA, then I have to say you need better DBAs. > In short, I should never have to go to log for this class of > information. It should be available in the database. > > What you haven't explained is why this information needs to be kept in the db on a historical basis, as opposed to all the other possible diagnostics where history might be useful (and, as Tom has pointed out, this patch doesn't keep it historically anyway). I think there is quite possibly a good case for keeping some diagnostics in a table or tables, on a rolling basis, maybe. But then that's a facility that needs to be properly designed. cheers andrew
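[A minimal sketch of the approach Andrew points to, assuming a postgres_log table created with the column list from the linked documentation; the file path and the filter on the message text are illustrative, not from the thread:]

COPY postgres_log FROM '/var/lib/pgsql/data/pg_log/postgresql-2008-04-04_000000.csv' WITH CSV;

-- then, for example, pull out the lines written by log_checkpoints:
SELECT log_time, message
FROM postgres_log
WHERE message LIKE 'checkpoint%'
ORDER BY log_time DESC
LIMIT 10;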
On Thu, 03 Apr 2008 21:44:00 -0400 Andrew Dunstan <andrew@dunslane.net> wrote: > I think there is quite possibly a good case for keeping some > diagnostics in a table or tables, on a rolling basis, maybe. But then > that's a facility that needs to be properly designed. Please see my later email on the thread. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Theo Schlossnagle <jesus@omniti.com> writes: > Heikki: It it useful for knowing when the last checkpoint occurred. I guess I'm wondering why that's important. In the current bgwriter design, the system spends half its time checkpointing (or in general checkpoint_completion_target % of the time). So this seems fairly close to wanting to know when the bgwriter last wrote a dirty buffer --- yeah, I can imagine scenarios for wanting to know that, but they probably require a whole pile of other knowledge as well. JD seems to have gone off into the weeds imagining that this patch would provide tracking of the last N checkpoints; which might start to approach the level of an interesting feature, except that it's still not clear *why* those numbers are interesting, given the bgwriter's propensity to try to hold the checkpoint duration constant. regards, tom lane
Theo Schlossnagle wrote: > >> >> >> Has this feature been discussed on -hackers? I don't recall it (and >> my memory has plenty of holes in it), but I'm sure that after >> attending my talk last Sunday Theo hasn't sent in a patch for an >> undiscussed feature ;-) > > > Andrew: I don't think this feature has been discussed on hackers. The > patch took about 15 minutes to author, so it sounds like the most > concise way to start a conversation. Seems silly to start the > conversation on hackers with a patch. :-) > > Well, not really. I believe -hackers has a much larger readership than -patches, so even for small features we generally want them discussed there. cheers andrew
On Thursday 03 April 2008 21:14, Joshua D. Drake wrote: > On Thu, 03 Apr 2008 20:29:18 -0400 > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Joshua D. Drake" <jd@commandprompt.com> writes: > > > Heikki Linnakangas <heikki@enterprisedb.com> wrote: > > >> Why is that useful? > > > > > > For knowing how long checkpoints are taking. If they are taking too > > > long you may need to adjust your bgwriter settings, and it is a > > > serious drag to parse postgresql logs for this info. > > > > 1. To do anything useful along those lines, you would need to look at > > a lot of checkpoints over time, which is what log_checkpoints is good > > for. This patch only tells you about the latest, which isn't very > > useful for making any good decisions about parameters. > > I would agree with this. We would need a history of checkpoints that > didn't reset until we told it to. > You can plug a single item graphed over time into things like rrdtool to get good trending information. And it's often easier to do this using sql interfaces to get the data than pulling it out of log files (almost like the db was designed for that :-) > > 2. If I read the patch correctly, half of the time what you'd be > > seeing is the start time of the currently-active checkpoint and the > > completion time of the prior checkpoint. I don't know what those > > numbers are good for at all. > Knowing when the last checkpoint occurred is certainly useful for monitoring purposes (wrt PITR, and as a general item for causing alerts if checkpoints stop occurring frequently enough, or if they start taking too long.) > > > 3. As of PG 8.3, the bgwriter tries very hard to make the elapsed time > > of a checkpoint be just about checkpoint_timeout * > > checkpoint_completion_target, regardless of load factors. So unless > > your settings are completely broken, measuring the actual time isn't > > going to tell you much. > How does one measure when the bgwriter is failing at this effort? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: > 3. As of PG 8.3, the bgwriter tries very hard to make the elapsed time > of a checkpoint be just about checkpoint_timeout * > checkpoint_completion_target, regardless of load factors. So unless > your settings are completely broken, measuring the actual time isn't > going to tell you much. > How does one measure when the bgwriter is failing at this effort? Well, not with *this* patch. At least not without adding a lot of infrastructure on top of it, and I'm failing to see why you'd build such infrastructure in order to track just two numbers that are of uncertain value. JD seems to be on record that the existing logging mechanism sucks and he needs something else. That's fine, but I think it means that we need to improve logging in general, not invent a single-purpose mechanism for logging checkpoint times. Theo claimed he had a reason for wanting to know the latest checkpoint time, *without* any intention of time-extended tracking of that; but he didn't say what it was. If there is a credible reason for that then it might justify a patch of this nature, but I don't see that the reasons that have been stated so far in the thread hold any water. regards, tom lane
On Thu, 03 Apr 2008 22:33:15 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > JD seems to be on record that the existing logging mechanism sucks > and he needs something else. That's fine, but I think it means that > we need to improve logging in general, not invent a single-purpose > mechanism for logging checkpoint times. Alright hold on. That is *not* what I said. I said *in this context* an SQL interface would be nicer. I also said that I was willing to table the entire discussion based on your review of the patch in that it didn't offer what I thought it did. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Apr 3, 2008, at 10:33 PM, Tom Lane wrote: > Theo claimed he had a reason for wanting to know the latest checkpoint > time, *without* any intention of time-extended tracking of that; but > he didn't say what it was. If there is a credible reason for that > then it might justify a patch of this nature, but I don't see that > the reasons that have been stated so far in the thread hold any water. We had a recent event where the system stopped checkpointing, which eventually led to Robert Treat's therapy session at PostgreSQL Conference East. It would have been good to be able to remotely (via SQL) monitor the system to ensure that the last checkpoint start time is within a reasonable window from current_timestamp. My patch was clearly single-purpose and never intended to be a SQL accessible log of checkpoint times. So, in reality, the "finish time" of the checkpoint is really not of any importance to me either. -- Theo Schlossnagle Esoteric Curio -- http://lethargy.org/ OmniTI Computer Consulting, Inc. -- http://omniti.com/
Theo Schlossnagle <jesus@omniti.com> writes: > On Apr 3, 2008, at 10:33 PM, Tom Lane wrote: >> Theo claimed he had a reason for wanting to know the latest checkpoint >> time, *without* any intention of time-extended tracking of that; but >> he didn't say what it was. > We had a recent event where the system stopped checkpointing, which > eventually lead to Robert Treat's therapy session at PostgreSQL > Conference East. Therapy is great ;-) ... but "the system stopped checkpointing" does not strike me as a routine occurrence that we should be making provisions for DBAs to watch for. What, pray tell, is the DBA supposed to do when and if he notices that? Not to mention that since checkpointing is ingrained in the bgwriter, you've got worse problems than that if it's not working. I'd much rather be spending our time and effort on understanding what broke for you, and fixing the code so it doesn't happen again. An ounce of prevention is worth a pound of cure, and all that... regards, tom lane
On Thu, 3 Apr 2008, Robert Treat wrote: > You can plug a single item graphed over time into things like rrdtool to > get good trending information. And it's often easier to do this using > sql interfaces to get the data than pulling it out of log files (almost > like the db was designed for that :-) The pg_stat_bgwriter value for buffers_checkpoint was intentionally implemented in 8.3 such that it jumps in one big lump when the checkpoint is done. While it's not the ideal interface for what you're looking for, the reason for that is to make it possible to build a "when was the last checkpoint finished?" interface via some remote monitoring tool just by determining the last time that the value jumped upwards. You can easily see them just by graphing that value; it shouldn't be too hard to teach something with rrdtool guts to find them. Since checkpoints have a fairly predictable duration in 8.3, as long as you catch the start or end of them you can make a reasonable guess where the other side was. The case you're trying to avoid here, the system going a long time without checkpointing, can be detected by looking for a begin or end regularly; you don't need to track both. As long as there's a checkpoint finish "pulse" in buffers_checkpoint showing up regularly you're fine. The only situation I can think of where this might be problematic is where the system has been idle enough to not have any buffers to write at checkpoint time, but I recall a code path there where checkpoints stop altogether unless there's been activity, so even tracking the time may not change that. Ultimately a lot of the other questions you might ask (i.e. "how many buffers have been written per hour by checkpoints?") require processing the numbers in this way anyway, and I thought this implementation was good enough to monitor the situation you're trying to avoid--presuming you're using some sort of moderately powerful remote monitoring tool. Theo's patch would make it easier to answer with a simple command, which has some value; a little SQL in a cron job would be good enough to trigger an alert rather than needing a real monitoring probe. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
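[A rough sketch of that cron-job idea; the sample table, polling scheme, and 30-minute window are made up for illustration -- only pg_stat_bgwriter.buffers_checkpoint comes from the thread:]

CREATE TABLE checkpoint_samples (
    sample_time        timestamptz DEFAULT now(),
    buffers_checkpoint bigint
);

-- run from cron every few minutes:
INSERT INTO checkpoint_samples (buffers_checkpoint)
    SELECT buffers_checkpoint FROM pg_stat_bgwriter;

-- alert if the counter has not jumped within the window a checkpoint is expected in:
SELECT CASE WHEN max(buffers_checkpoint) > min(buffers_checkpoint)
            THEN 'ok'
            ELSE 'no checkpoint finish seen'
       END AS status
FROM checkpoint_samples
WHERE sample_time > now() - interval '30 minutes';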
Greg Smith wrote: > On Thu, 3 Apr 2008, Robert Treat wrote: > > > You can plug a single item graphed over time into things like rrdtool to > > get good trending information. And it's often easier to do this using > > sql interfaces to get the data than pulling it out of log files (almost > > like the db was designed for that :-) > > The pg_stat_bgwriter value for buffers_checkpoint was intentionally > implemented in 8.3 such that it jumps in one big lump when the checkpoint > is done. While it's not the ideal interface for what you're looking for, > the reason for that is to make it possible to build a "when was the last > checkpoint finished?" interface via some remote monitoring tool just by > determining the last time that the value jumped upwards. You can easily > see them just by graphing that value, it shouldn't be too hard to teach > something with rrdtool guts to find them. > > Since checkpoints have a fairly predictable duration in 8.3, as long as > you catch the start or end of them you can make a reasonable guess where > the other side was. The case you're trying to avoid here, the system > going a long time without checkpointing, can be detected by looking for > a begin or end regularly, you don't need to track both. As long as > there's a checkpoint finish "pulse" in buffers_checkpoint showing up > regularly you're fine. The only situation I can think of where this might > be problematic is where the system has been idle enough to not have any > buffers to write at checkpoint time, but I recall a code path there where > checkpoints stop altogether unless there's been activity so even tracking > the time may not change that. If we want to expose more of the internals of Postgres via functions, fine, but a one-off function to cover something that happened to one user just doesn't make sense. We need a plan on exactly what we want to expose in a coherent way, and how. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, 3 Apr 2008, Joshua D. Drake wrote: > For knowing how long checkpoints are taking. If they are taking too > long you may need to adjust your bgwriter settings, and it is a > serious drag to parse postgresql logs for this info. There's some disconnect here between what I think you want here and what Theo's patch does. If I were running an 8.2 system, there's a whole lot of diagnostic information I'd like to have handy in order to optimize checkpoints, as you describe. But this patch doesn't help 8.2 users, because the whole pg_stat_bgwriter structure is 8.3 only--it provides much of what is needed in this area. In 8.3, there is very little background writer tuning to do. What tuning you can do is easy to figure out from what's in pg_stat_bgwriter. The new checkpoint_completion_target feature works better than tuning the 8.2 background writer to reduce checkpoint spikes ever did anyway, and that's the main thing that drives how long checkpoints take to process now. So in the only situation where this patch can be applied, 8.3, I don't think the scenario you describe is likely to pop up. And the combination of two 8.3 features (log_checkpoints and the CSV logs, it's no coincidence I worked on both) should allow a good enough way to hack around this area when it does. Just make the logs rotate fairly regularly, and every time they do, import the last CSV section that's now finished into a table so you can compute statistics about how the checkpoints are doing there. I don't see why that sort of stuff should go into core when it's now easy to do outside of it. I have a whole stack of scripts in that area I plan to release over the summer. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, 3 Apr 2008, Tom Lane wrote: > As of PG 8.3, the bgwriter tries very hard to make the elapsed time of a > checkpoint be just about checkpoint_timeout * > checkpoint_completion_target, regardless of load factors. That only holds in the cases where the timing of checkpoint writes is timeout driven. When they're checkpoint_segments driven, the time is derived from that drumbeat instead. And those cases are the more interesting ones, because those are the ones that will speed up during really busy periods. I think it's completely relevant to track how often checkpoints are happening because they do still vary even in 8.3. I'm just not sure if the current methods available for that really aren't good enough, or if it's just the case that not everyone is aware of all of them. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Friday 04 April 2008 00:09, Greg Smith wrote: > On Thu, 3 Apr 2008, Robert Treat wrote: > > You can plug a single item graphed over time into things like rrdtool to > > get good trending information. And it's often easier to do this using > > sql interfaces to get the data than pulling it out of log files (almost > > like the db was designed for that :-) > > The pg_stat_bgwriter value for buffers_checkpoint was intentionally > implemented in 8.3 such that it jumps in one big lump when the checkpoint > is done. While it's not the ideal interface for what you're looking for, > the reason for that is to made it possible to build a "when was the last > checkpoint finished?" interface via some remote monitoring tool just by > determining the last time that the value jumped upwards. You can easily > see them just by graphing that value, it shouldn't be too hard to teach > something with rrdtool guts to find them. > the advantage of using a timestamp is that you get the incrementing counter for free, which is certainly helpful in third party tools like phppgadmin that don't instrument tracking methods; you can look at the system and its settings and have a pretty good idea if something is awry. > > Ultimately a lot of the other questions you might ask (i.e. "how many > buffers have been written per hour by checkpoints?") require processing > the numbers in this way anyway, and I thought this implementation was good > enough to monitor the situation you're trying to avoid--presuming you're > using some sort of moderately powerful remote monitoring tool. Theo's > patch would make it easier to answer with a simple command which has some > value; a little SQL in a cron job would be good enough to trigger an alert > rather than needing a real monitoring probe. > Yes, the idea of making a basic nagios/munin check that is readily consumable by the general public is certainly a bonus in my eyes. I have to add, given that we already provide the time of last checkpoint information via pg_controldata, I don't understand why people are against making that information accessible to remote clients. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > I have to add, given that we already provide the time of last checkpoint > information via pg_controldata, I don't understand why people are against > making that information accesible to remote clients. So, I can expect to see a patch next week that invents a separate SQL function to expose each field printed by pg_controldata? You are failing to make the distinction between debug/developer support (which is what pg_controldata is) and things that are interesting for DBAs to track. Would you expect someone to install handwritten monitoring infrastructure to awaken them in the middle of the night when pg_control_version changes? I am not saying that there is no use for some sort of facility in this area. But I think it needs some more-general thinking and design than has been evidenced so far in this thread. regards, tom lane
On Thu, 3 Apr 2008, Tom Lane wrote: > "the system stopped checkpointing" does not strike me as a routine > occurrence that we should be making provisions for DBAs to watch for. > What, pray tell, is the DBA supposed to do when and if he notices that? Schedule downtime rather than wait for it to happen unpredictably when the inevitable crash happens. > I'd much rather be spending our time and effort on understanding what > broke for you, and fixing the code so it doesn't happen again. (Here I start laughing all over again as I recall Robert's talk, which we really need to get you in particular the video of) Their situation had possible causes that included a bit flipping in bad memory, which is pretty hard to code around (unless you're a Core Wars veteran). I'm familiar with that part of the checkpoint code path, and everything I was able to think of when hearing the outline of events was already considered and rejected as not being a likely cause. This patch comes out of pragmatic acceptance that, sometimes, stuff will happen you can't easily explain, but that doesn't mean it's not worth keeping an eye on it anyway so it doesn't sneak up on you again. Anyway, I think this whole thing would be better handled by a larger internals view, something this codebase could use a dose of in general. What I really want is an interface like this: psql> select pg_internals('last_checkpoint_time'); and then start sprinkling exports of those probe points in some popular places people would like to look at. I will apologize now for suggesting this, followed by not having enough time to code it in the near future. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes: > On Thu, 3 Apr 2008, Tom Lane wrote: >> I'd much rather be spending our time and effort on understanding what >> broke for you, and fixing the code so it doesn't happen again. > [ shit happens... ] Completely fair, but I still don't see how this particular patch would be a useful addition to the DBA's monitoring-tool arsenal. The scope seems too narrow. (And you still didn't tell me what the actual failure case was.) regards, tom lane
On Fri, 4 Apr 2008, Tom Lane wrote: > (And you still didn't tell me what the actual failure case was.) Database stops checkpointing. WAL files pile up. In the middle of backup, system finally dies, and when it starts recovery there's a bad record in the WAL files--which there are now thousands of to apply, and the bad one is 4 hours of replay in. Believe it or not, it goes downhill from there. It's what kicked off the first step that's the big mystery. The only code path I thought of that can block checkpoints like this is when the archive_command isn't working anymore, and that wasn't being used. Given some of the other corruption found later and the bad memory issues discovered, a bit flipping in the "do I need to checkpoint now?" code or data seems just as likely as any other explanation. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes: > On Fri, 4 Apr 2008, Tom Lane wrote: >> (And you still didn't tell me what the actual failure case was.) > Database stops checkpointing. WAL files pile up. In the middle of > backup, system finally dies, and when it starts recovery there's a bad > record in the WAL files--which there are now thousands of to apply, and > the bad one is 4 hours of replay in. Believe it or not, it goes downhill > from there. > It's what kicked off the first step that's the big mystery. Indeed :-(. But given those observations, I'd still have about zero faith in the usefulness of this patch. If the bgwriter is not able to complete checkpoints, is it able to tell you the truth about what it's doing? The actual advice I'd give to a DBA faced with such a case is to kill -ABRT the bgwriter and send the stack trace to -hackers. That's not in the proposed patch though... regards, tom lane
On Fri, 4 Apr 2008, Tom Lane wrote: > The actual advice I'd give to a DBA faced with such a case is to > kill -ABRT the bgwriter and send the stack trace to -hackers. And that's a perfect example of where they're trying to get to. They didn't notice the problem until after the crash. The server didn't come back up (busy processing WALs) and that downtime was caught by a monitoring system. At that point it was too late to collect debugging information on what was wrong inside the server processes that might have given a clue what happened. If they'd have noticed it while the server was up, perhaps because the "last checkpoint" value hadn't changed in a long time (which seems like it might be available via stats even if, as you say, the background writer is out of its mind at that point), they could have done such a kill and collected some actual useful information here. That's the theory at least. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes: > ... If they'd have noticed it while the server was up, perhaps because the > "last checkpoint" value hadn't changed in a long time (which seems like it > might be available via stats even if, as you say, the background writer is > out of its mind at that point), they could have done such a kill and > collected some actual useful information here. That's the theory at > least. Well, mebbe, but that still seems to require a lot of custom monitoring infrastructure that is not present in this patch, and furthermore that this patch doesn't especially aid the development of. regards, tom lane
Tom Lane wrote: > Greg Smith <gsmith@gregsmith.com> writes: > > ... If they'd have noticed it while the server was up, perhaps because the > > "last checkpoint" value hadn't changed in a long time (which seems like it > > might be available via stats even if, as you say, the background writer is > > out of its mind at that point), they could have done such a kill and > > collected some actual useful information here. That's the theory at > > least. > > Well, mebbe, but that still seems to require a lot of custom monitoring > infrastructure that is not present in this patch, and furthermore that > this patch doesn't especially aid the development of. These kind of things can be monitored externally very easily, say by Nagios, when the values are available via the database. If you have to troll the logs, it's quite a bit harder to do it. I'm not sure about the right values to export -- last checkpoint start time is the most obvious idea, but I would also suggest exporting last checkpoint end, or NULL if the checkpoint is ongoing; and also previous- to-last checkpoint start and end. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
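[For illustration only, a Nagios-style check against such hypothetical columns -- last_checkpoint_start and last_checkpoint_end do not exist in any released view; they are what the patch, or a revision of it, would have to provide -- might look something like:]

SELECT CASE
         WHEN last_checkpoint_end IS NULL
              AND last_checkpoint_start < now() - interval '1 hour'
           THEN 'CRITICAL: checkpoint started over an hour ago and has not finished'
         WHEN last_checkpoint_end < now() - interval '1 hour'
           THEN 'CRITICAL: no checkpoint finished in the last hour'
         ELSE 'OK'
       END AS status
FROM pg_stat_bgwriter;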
On Friday 04 April 2008 01:59, Tom Lane wrote: > Greg Smith <gsmith@gregsmith.com> writes: > > On Thu, 3 Apr 2008, Tom Lane wrote: > >> I'd much rather be spending our time and effort on understanding what > >> broke for you, and fixing the code so it doesn't happen again. > > > > [ shit happens... ] > > Completely fair, but I still don't see how this particular patch would > be a useful addition to the DBA's monitoring-tool arsenal. The scope > seems too narrow. > So, thinking about this for a few minutes, here are some of the things that knowing the last checkpoint time might help a DBA determine. 1) Alert if checkpointing stops occurring within a reasonable time frame (note there are failure cases and normal use cases where this might occur) (also note I'll agree, this isn't common, but the results are pretty disastrous if it does happen) 2) Can be graphed over time (using rrdtool and others) for trending checkpoint activity 3) Ease monitoring of checkpoints across pitr setups 4) Help determine if your checkpoints are being timeout driven or segment driven, or if you need to look at those settings 5) Determine the number of log files that will need to be replayed in the event of a crash 6) Helps give an indication on if you should enter a manual checkpoint before issuing a pg_start_backup call -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Friday 04 April 2008 03:14, Tom Lane wrote: > Greg Smith <gsmith@gregsmith.com> writes: > > ... If they'd have noticed it while the server was up, perhaps because > > the "last checkpoint" value hadn't changed in a long time (which seems > > like it might be available via stats even if, as you say, the background > > writer is out of its mind at that point), they could have done such a > > kill and collected some actual useful information here. That's the > > theory at least. > > Well, mebbe, but that still seems to require a lot of custom monitoring > infrastructure that is not present in this patch, and furthermore that > this patch doesn't especially aid the development of. > This patch makes that a 1 line sql call, which can be plugged into any number of monitoring systems. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Greg Smith <gsmith@gregsmith.com> writes: >> > ... If they'd have noticed it while the server was up, perhaps because the >> > "last checkpoint" value hadn't changed in a long time (which seems like it >> > might be available via stats even if, as you say, the background writer is >> > out of its mind at that point), they could have done such a kill and >> > collected some actual useful information here. That's the theory at >> > least. >> >> Well, mebbe, but that still seems to require a lot of custom monitoring >> infrastructure that is not present in this patch, and furthermore that >> this patch doesn't especially aid the development of. > > These kind of things can be monitored externally very easily, say by > Nagios, when the values are available via the database. If you have to > troll the logs, it's quite a bit harder to do it. I can see Tom's reluctance out of fear that really this is going to be the first of hundreds of dials which have to be monitored so a single function to handle that single dial is kind of short sighted. I would think eventually it should be part of the Postgres SNMP MIB. But I would say from my experience on the not-really-a-sysadmin side I think the time of the last checkpoint is probably the *most* important thing to be monitoring. Effectively it's monitoring how stale your data on disk is potentially becoming by showing how much recovery will be necessary. > I'm not sure about the right values to export -- last checkpoint start > time is the most obvious idea, but I would also suggest exporting last > checkpoint end, or NULL if the checkpoint is ongoing; and also previous- > to-last checkpoint start and end. I'm surprised y'all want the time of the last checkpoint *start* though. It seems to me that only the last checkpoint *end* is actually interesting. A checkpoint which has started but not ended yet is not really a checkpoint at all. It's nothing. In the future there could be multiple checkpoints which have "started" but not finished. Or we could add support for lazy checkpoints in which case there could be an infinite number of "potential" checkpoints which haven't finished. Worse, the time the last checkpoint actually finished isn't actually interesting either. What you want to know is what time the record which the last finished checkpoint *checkpointed up to* was inserted. That is, the time of the record that the last checkpoint record *points to*. That represents the "guarantee" that the database is making to the sysadmin about data integrity. Everything before that time is guaranteed to have reached data files already. Everything after it may or may not be in the data files and has to be checked against the WAL logs. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Alvaro Herrera <alvherre@commandprompt.com> writes: > These kind of things can be monitored externally very easily, say by > Nagios, when the values are available via the database. If you have to > troll the logs, it's quite a bit harder to do it. > I'm not sure about the right values to export -- last checkpoint start > time is the most obvious idea, but I would also suggest exporting last > checkpoint end, or NULL if the checkpoint is ongoing; and also previous- > to-last checkpoint start and end. Any Nagios-style monitoring would have to watch checkpoint end (and we'd better define that field as only updating at *successful* checkpoint end). Consider the case where some dirty buffer has a persistent write failure condition. I'm almost inclined to say that the patch shouldn't expose checkpoint start at all, just to make sure people won't get this wrong. We've pretty thoroughly trashed the notion that looking at the interval is helpful anyway. regards, tom lane
On Fri, 2008-04-04 at 02:21 -0400, Greg Smith wrote: > Database stops checkpointing. WAL files pile up. In the middle of > backup, system finally dies, and when it starts recovery there's a bad > record in the WAL files--which there are now thousands of to apply, and > the bad one is 4 hours of replay in. Believe it or not, it goes downhill > from there. > > It's what kicked off the first step that's the big mystery. The only code > path I thought of that can block checkpoints like this is when the > archive_command isn't working anymore, and that wasn't being used. Given > some of the other corruption found later and the bad memory issues > discovered, a bit flipping in the "do I need to checkpoint now?" code or > data seems just as likely as any other explanation. A few additional comments here: If you set checkpoint_segments very, very high you can avoid a checkpoint via checkpoint_timeout for up to 60 minutes. If you did this for performance reasons presumably you've got lots of WAL files and might end up with 1000s of them in that time period. If you set it too high, you hit the disk limits first and can then crash the server if the pg_xlog directory's physical limits are unluckily low enough. Starvation of the checkpoint start lock has been witnessed previously, so if you're running 8.2 or previous that could be a possible explanation here. What can happen is that a checkpoint is triggered yet the bgwriter needs to wait to get access to the CheckpointStartLock. I witnessed a starvation of 3 minutes once during testing a server running at max velocity with 200 users, in 2006. I assumed that was an outlier, but its possible for that to be longer. I wouldn't believe too much longer, though. That was patched in 8.3 as a result. Anyway, either of those factors, or their combination, plus a small pg_xlog disk would be sufficient to explain the crash and wal file build up. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Robert Treat wrote: > 1) Alert if checkpointing stops occurring within a reasonable time frame (note > there are failure cases and normal use cases where this might occur) (also > note I'll agree, this isn't common, but the results are pretty disastrous if > it does happen) What are the normal use cases where this would occur? I can't think of any. Wrt. failure cases, there are a million things that can go wrong in a system, and only few of them will give the symptom of "checkpoints stopped happening", so I'm not excited about adding a facility to monitor just that. More hooks for monitoring purposes in general would be nice, and I would like to see them exposed as SQL functions, but I'd like to see a much bigger proposal for that. > 2) Can be graphed over time (using rrdtool and others) for trending checkpoint > activity Hmm. You'd need the historical data to do that properly. In particular, if two checkpoints happen between the polling interval, you'd miss that. log_checkpoints=on, in CSV output, seems like a better approach for that. > 3) Ease monitoring of checkpoints across pitr setups How is that different from monitoring in a non-pitr setup? > 4) Help determine if your checkpoints are being timeout driven or segment > driven, or if you need to look at those settings Seems like the log_checkpoints output is more useful for that, as it directly tells you what triggered the checkpoint. > 5) Determine the number of log files that will need to be replayed in the > event of a crash If you have a requirement for that, just set checkpoint_segments accordingly. And again, you can get the information in the logs by log_checkpoints=on already. > 6) Helps give an indication on if you should enter a manual checkpoint before > issuing a pg_start_backup call If you want the backup to begin immediately, just do a manual checkpoint unconditionally. It'll finish quickly if there hasn't been much activity since the last one. We talked about adding a new variant of pg_start_backup() that does that automatically (or rather, just hurries the current checkpoint) in the 8.3 cycle, but didn't do it in the end. Perhaps we should add that, after all? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
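[In SQL terms, the unconditional-checkpoint workaround Heikki describes for point 6 is simply the following; the backup label is illustrative:]

CHECKPOINT;
SELECT pg_start_backup('nightly_base_backup');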
On Sat, 05 Apr 2008 16:37:15 +0100 Heikki Linnakangas <heikki@enterprisedb.com> wrote: May I just say that every person that is currently talking on this thread is offtopic? Move it to -hackers please. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Heikki Linnakangas wrote: > Robert Treat wrote: >> 2) Can be graphed over time (using rrdtool and others) for trending >> checkpoint activity > > Hmm. You'd need the historical data to do that properly. In particular, > if two checkpoints happen between the polling interval, you'd miss that. Yes, Munin and other tools collect the values over time to create the graph. It's not like "top" saves historical system load data either, yet the collector is able to present you with a graph. > log_checkpoints=on, in CSV output, seems like a better approach for that. It's not, because the tool would have a much harder time trolling the logs. CSV does not really make it much easier -- the DBA is still tasked with importing the thing after the current file is complete; and you don't have an up-to-date picture either. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support