Discussion: Exposing the stats snapshot timestamp to SQL
In a previous thread Tom Lane said:
(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well. But that's future-feature territory.)
It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.
The main purpose of this patch is to expose the timestamp of the current stats snapshot so that it can be leveraged by monitoring code. The obvious use case is alerting if the stats collector becomes unresponsive. The more common use case is smoothing out graphs which are built from high-frequency monitoring of the stats collector.
The timestamp is already available as part of PgStat_GlobalStats. This patch is just the boilerplate (+docs & tests) needed to expose that value to SQL. It shouldn't impact anything else in the server.
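A minimal sketch of the "unresponsive collector" alerting case this enables. The SQL string uses the function name as it was eventually committed later in this thread (`pg_stat_get_snapshot_timestamp`; the patch as posted here called it `pg_stat_snapshot_timestamp`), and the staleness logic is illustrated in plain Python with a hypothetical threshold:

```python
from datetime import datetime, timedelta, timezone

# Query a monitoring agent could run (function name as committed later in
# this thread; the threshold of one minute below is an arbitrary example).
STALENESS_QUERY = "SELECT pg_stat_get_snapshot_timestamp();"

def collector_is_stale(snapshot_ts, now=None, threshold=timedelta(minutes=1)):
    """Return True if the stats snapshot is older than `threshold`,
    suggesting the stats collector has stopped publishing."""
    now = now or datetime.now(timezone.utc)
    return (now - snapshot_ts) > threshold

# Example: a snapshot from 5 minutes ago trips the alert; a 10-second-old
# snapshot does not.
now = datetime(2015, 1, 29, 12, 0, tzinfo=timezone.utc)
print(collector_is_stale(now - timedelta(minutes=5), now))   # True
print(collector_is_stale(now - timedelta(seconds=10), now))  # False
```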
I'm not particularly attached to the function name, but I didn't have a better idea.
The patch should apply cleanly to master.
- Matt K
Attachments
On Thu, Jan 29, 2015 at 12:18 AM, Matt Kelly <mkellycs@gmail.com> wrote:
> In a previous thread Tom Lane said:
>
>> (I'm also wondering if it'd make sense to expose the stats timestamp
>> as a callable function, so that the case could be dealt with
>> programmatically as well. But that's future-feature territory.)
>
> (http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
>
> It seemed the appropriate scope for my first submission, and that feature
> has been on my wish list for a while, so I thought I'd grab it.
>
> Main purpose of this patch is to expose the timestamp of the current stats
> snapshot so that it can be leveraged by monitoring code. The obvious case
> is alerting if the stats collector becomes unresponsive. The common use
> case is to smooth out graphs which are built from high frequency monitoring
> of the stats collector.
>
> The timestamp is already available as part of PgStat_GlobalStats. This
> patch is just the boilerplate (+docs & tests) needed to expose that value
> to SQL. It shouldn't impact anything else in the server.
>
> I'm not particularly attached to the function name, but I didn't have a
> better idea.
>
> The patch should apply cleanly to master.

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/28/15 11:18 PM, Matt Kelly wrote:
> In a previous thread Tom Lane said:
>
>> (I'm also wondering if it'd make sense to expose the stats timestamp
>> as a callable function, so that the case could be dealt with
>> programmatically as well. But that's future-feature territory.)
>
> (http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
>
> It seemed the appropriate scope for my first submission, and that feature
> has been on my wish list for a while, so I thought I'd grab it.

I've reviewed the patch (though haven't tested it myself) and it looks
good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Robert, I'll add it to the commitfest.
Jim, I'm not sure I understand what you mean? This new function follows the same conventions as everything else in the file. TimestampTz is just a typedef for int64. Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct.
- Matt K.
On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/28/15 11:18 PM, Matt Kelly wrote:

In a previous thread Tom Lane said:
(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well. But that's future-feature territory.)
(http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.
I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:
+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+ PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }
Is the community OK with referencing stats_timestamp that way?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
This is now: https://commitfest.postgresql.org/4/128/
On Thu, Jan 29, 2015 at 7:01 PM, Matt Kelly <mkellycs@gmail.com> wrote:
Robert, I'll add it to the commitfest.

Jim, I'm not sure I understand what you mean? This new function follows the same conventions as everything else in the file. TimestampTz is just a typedef for int64. Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct.

- Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/28/15 11:18 PM, Matt Kelly wrote:

In a previous thread Tom Lane said:
(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well. But that's future-feature territory.)
(http://www.postgresql.org/message-id/27251.1421684169@sss.pgh.pa.us)
It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.
I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:
+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+ PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }
Is the community OK with referencing stats_timestamp that way?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Matt Kelly <mkellycs@gmail.com> writes:
> Jim, I'm not sure I understand what you mean? This new function follows
> the same conventions as everything else in the file. TimestampTz is just a
> typedef for int64.

... or double. Have you checked that the code behaves properly with
--disable-integer-timestamps?

			regards, tom lane
On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Kelly <mkellycs@gmail.com> writes:
> Jim, I'm not sure I understand what you mean? This new function follows
> the same conventions as everything else in the file. TimestampTz is just a
> typedef for int64.
... or double. Have you checked that the code behaves properly with
--disable-integer-timestamps?
regards, tom lane
Well, yes, int or double; I should have been more clear about that. It's a good point, though, that I should run the server with that option disabled for completeness.
I presume you meant --disable-integer-datetimes. I just ran that test case now, all set.
For my own edification, was there really any risk of something so trivial breaking due to that setting, or was it just for completeness? (which is a perfectly valid reason)
Thanks,
- Matt K.
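The representational difference behind Tom's question can be sketched in a toy example (this is plain Python, not PostgreSQL code; the epoch and units follow how the two build options represent TimestampTz):

```python
# Under --enable-integer-datetimes (the default), TimestampTz is an int64
# count of microseconds since 2000-01-01; with --disable-integer-datetimes
# it is a C double of seconds since the same epoch. Code that assumes one
# representation would silently misread the other, which is why checking
# both builds matters even for a trivial pass-through function.

def int_ts(seconds_since_2000):
    """Integer-datetimes style: microseconds since 2000-01-01."""
    return int(seconds_since_2000 * 1_000_000)

def float_ts(seconds_since_2000):
    """Float-datetimes style: seconds since 2000-01-01."""
    return float(seconds_since_2000)

one_hour = 3600
print(int_ts(one_hour))    # 3600000000
print(float_ts(one_hour))  # 3600.0
```

The patch avoids the hazard entirely by returning the value through PG_RETURN_TIMESTAMPTZ rather than touching the raw representation.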
Matt Kelly <mkellycs@gmail.com> writes:
> On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... or double. Have you checked that the code behaves properly with
>> --disable-integer-timestamps?

> I presume you meant --disable-integer-datetimes. I just ran that test case
> now, all set.

Right, my own sloppiness there.

> For my own edification, was there really any risk of something so trivial
> breaking due to that setting, or was it just for completeness? (which is a
> perfectly valid reason)

I hadn't looked at your code at that point, and now that I have, I agree
it's unlikely to have had a problem with this. Still, it's a good thing
to check, particularly considering the lack of buildfarm coverage of this
option :-(.

			regards, tom lane
Actually, I just did one more review of my own code and somehow missed that I had submitted the version with the wrong oid. The oid used in the first version (10000) was from before I read the docs on properly picking one.
Attached is the fixed version. (hopefully with the right mime-type and wrong extension. Alas, gmail doesn't let you set mime-types; time to find a new email client...)
- Matt K.
Attachments
On Fri, Jan 30, 2015 at 12:07:22AM -0500, Tom Lane wrote:
> Matt Kelly <mkellycs@gmail.com> writes:
>> On Thu, Jan 29, 2015 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... or double. Have you checked that the code behaves properly with
>>> --disable-integer-timestamps?
>
>> I presume you meant --disable-integer-datetimes. I just ran that test case
>> now, all set.
>
> Right, my own sloppiness there.
>
>> For my own edification, was there really any risk of something so trivial
>> breaking due to that setting, or was it just for completeness? (which is a
>> perfectly valid reason)
>
> I hadn't looked at your code at that point, and now that I have, I agree
> it's unlikely to have had a problem with this. Still, it's a good thing
> to check, particularly considering the lack of buildfarm coverage of this
> option :-(.

Given that this has been deprecated for years, maybe it's time we took it
out in 9.5. That the interested parties haven't bothered to put buildfarm
members in that use the option tells me that they're not all that
interested.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi there,

On 30.1.2015 06:27, Matt Kelly wrote:
> Actually, I just did one more code review of myself, and somehow missed
> that I submitted the version with the wrong oid. The oid used in the
> first version is wrong (10000) and was from before I read the docs on
> properly picking one.
>
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension. Alas, gmail doesn't let you set mime-types; time to
> find a new email client...)

I do have a question regarding the patch, although I see the patch is
already marked as 'ready for committer' (sorry, somehow managed to miss
the patch until now).

I see the patch only works with the top-level snapshot timestamp, stored
in globalStats, but since 9.3 (when the stats were split into per-db
files) we track per-database timestamps too.

Shouldn't we make those timestamps accessible too? It's not necessary
for detecting unresponsive statistics collector (if it's stuck it's not
writing anything, so the global timestamp will be old too), but it seems
more appropriate for querying database-level stats to query
database-level timestamp too.

But maybe that's not necessary, because to query database stats you have
to be connected to that particular database and that should write fresh
stats, so the timestamps should not be very different.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I see the patch only works with the top-level snapshot timestamp, stored
> in globalStats, but since 9.3 (when the stats were split into per-db
> files) we track per-database timestamps too.

> Shouldn't we make those timestamps accessible too? It's not necessary
> for detecting unresponsive statistics collector (if it's stuck it's not
> writing anything, so the global timestamp will be old too), but it seems
> more appropriate for querying database-level stats to query
> database-level timestamp too.

I'm inclined to say not; I think that's exposing an implementation detail
that we might regret exposing, later. (IOW, it's not hard to think of
rearrangements that might mean there wasn't a per-database stamp anymore.)

> But maybe that's not necessary, because to query database stats you have
> to be connected to that particular database and that should write fresh
> stats, so the timestamps should not be very different.

Yeah. The only use-case that's been suggested is detecting an
unresponsive stats collector, and the main timestamp should be plenty
for that.

			regards, tom lane
On 20.2.2015 02:58, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I see the patch only works with the top-level snapshot timestamp,
>> stored in globalStats, but since 9.3 (when the stats were split
>> into per-db files) we track per-database timestamps too.
>
>> Shouldn't we make those timestamps accessible too? It's not
>> necessary for detecting unresponsive statistics collector (if it's
>> stuck it's not writing anything, so the global timestamp will be
>> old too), but it seems more appropriate for querying database-level
>> stats to query database-level timestamp too.
>
> I'm inclined to say not; I think that's exposing an implementation
> detail that we might regret exposing, later. (IOW, it's not hard to
> think of rearrangements that might mean there wasn't a per-database
> stamp anymore.)

Fair point, but isn't the global timestamp an implementation detail too?
Although we're less likely to remove the global timestamp, no doubt
about that ...

>> But maybe that's not necessary, because to query database stats you
>> have to be connected to that particular database and that should
>> write fresh stats, so the timestamps should not be very different.
>
> Yeah. The only use-case that's been suggested is detecting an
> unresponsive stats collector, and the main timestamp should be plenty
> for that.

Well, the patch also does this:

*** 28,34 ****
  SELECT pg_sleep_for('2 seconds');
  CREATE TEMP TABLE prevstats AS
  SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
         (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
!        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks
    FROM pg_catalog.pg_stat_user_tables AS t,
         pg_catalog.pg_statio_user_tables AS b
   WHERE t.relname='tenk2' AND b.relname='tenk2';
--- 28,35 ----
  CREATE TEMP TABLE prevstats AS
  SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
         (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
!        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
!        pg_stat_snapshot_timestamp() as snap_ts
    FROM pg_catalog.pg_stat_user_tables AS t,
         pg_catalog.pg_statio_user_tables AS b
   WHERE t.relname='tenk2' AND b.relname='tenk2';
***************

i.e. it adds the timestamp into queries selecting from database-level
catalogs. But OTOH we're usually using now() here, so the global snapshot
is an improvement here, and if the collector is stuck it's not going to
update any of the timestamps.

So I'm OK with this patch.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Matt Kelly <mkellycs@gmail.com> writes:
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension. Alas, gmail doesn't let you set mime-types; time to find
> a new email client...)

Committed with a couple of changes:

* I changed the function name from pg_stat_snapshot_timestamp to
pg_stat_get_snapshot_timestamp, which seemed to me to be the style in
general use in the stats stuff for inquiry functions.

* The function should be marked stable not volatile, since its value
doesn't change any faster than all the other stats inquiry functions.

			regards, tom lane
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Well, the patch also does this:
> *** 28,34 ****
>   SELECT pg_sleep_for('2 seconds');
>   CREATE TEMP TABLE prevstats AS
>   SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
>          (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
> !        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks
>     FROM pg_catalog.pg_stat_user_tables AS t,
>          pg_catalog.pg_statio_user_tables AS b
>    WHERE t.relname='tenk2' AND b.relname='tenk2';
> --- 28,35 ----
>   CREATE TEMP TABLE prevstats AS
>   SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
>          (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
> !        (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
> !        pg_stat_snapshot_timestamp() as snap_ts
>     FROM pg_catalog.pg_stat_user_tables AS t,
>          pg_catalog.pg_statio_user_tables AS b
>    WHERE t.relname='tenk2' AND b.relname='tenk2';
> ***************

That's merely a regression test to verify that the value appears to
advance from time to time ... I don't think it has any larger meaning.

(It will be interesting to see what this new test query reports in the
transient buildfarm failures we still occasionally see in the stats test.)

			regards, tom lane
> Yeah. The only use-case that's been suggested is detecting an
> unresponsive stats collector, and the main timestamp should be plenty for
> that.
Lately, I've spent most of my time investigating how to increase qps. It turned out we were able to triple our throughput by monitoring experiments at highly granular time steps (1 to 2 seconds). Effects that were invisible with 30-second polls of the stats were obvious with 2-second polls.
The problem with doing highly granular snapshots is that the postgres counters are monotonically increasing, but only when stats are published. Currently you have no option except to divide by the delta of now() between the polling intervals. If you poll every 2 seconds, the max error is about 0.5/2, or 25%. That makes reading those numbers a bit noisy. Using (snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in that calculation should help to smooth out that noise and show a clearer picture.
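A hypothetical illustration of the smoothing argument above, with made-up numbers: the counters only advance when the collector publishes a snapshot (roughly every 500 ms by default), so dividing a counter delta by the wall-clock polling interval can misattribute up to half a publish interval, while dividing by the snapshot-timestamp delta recovers the true rate.

```python
# Simulated scenario (all values hypothetical): a counter growing at a true
# rate of 100 events/sec, snapshotted by the stats collector periodically.
true_rate = 100.0  # events per second

# First poll: snapshot published at t=10.0 s, counter = 1000 events.
ts_old, count_old = 10.0, 1000.0

# Second poll 2 s later by wall clock, but the freshest snapshot it sees
# was published at t=11.6 s, so 0.4 s of activity isn't in the counter yet.
ts_new = 11.6
count_new = count_old + true_rate * (ts_new - ts_old)

# Dividing by the now() delta (2 s) underestimates the rate; dividing by
# the snapshot-timestamp delta recovers it exactly.
naive_rate = (count_new - count_old) / 2.0
smoothed_rate = (count_new - count_old) / (ts_new - ts_old)

print(f"naive:    {naive_rate:.1f} events/sec")     # 80.0
print(f"smoothed: {smoothed_rate:.1f} events/sec")  # 100.0
```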
However, I'm happy with the committed version. Thanks Tom.
- Matt K.
On Thu, Feb 19, 2015 at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Kelly <mkellycs@gmail.com> writes:
> Attached is the fixed version. (hopefully with the right mime-type and
> wrong extension. Alas, gmail doesn't let you set mime-types; time to find
> a new email client...)
Committed with a couple of changes:
* I changed the function name from pg_stat_snapshot_timestamp to
pg_stat_get_snapshot_timestamp, which seemed to me to be the style
in general use in the stats stuff for inquiry functions.
* The function should be marked stable not volatile, since its value
doesn't change any faster than all the other stats inquiry functions.
regards, tom lane
Matt Kelly <mkellycs@gmail.com> writes:
>> Yeah. The only use-case that's been suggested is detecting an
>> unresponsive stats collector, and the main timestamp should be plenty for
>> that.

> The problem with doing highly granular snapshots is that the postgres
> counters are monotonically increasing, but only when stats are published.
> Currently you have no option except to divide by the delta of now() between
> the polling intervals. If you poll every 2 seconds the max error is about
> .5/2 or 25%. It makes reading those numbers a bit noisy. Using
> (snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in
> that calculation should help to smooth out that noise and show a clearer
> picture.

Ah, interesting! Thanks for pointing out another use case.

			regards, tom lane