Discussion: autoanalyze criteria
Hi,

If I understand http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold could be estimated like this in
PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
       pg_stat_user_tables.n_dead_tup,
       CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
         + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
            * pg_class.reltuples) AS av_threshold
  FROM pg_stat_user_tables
  JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
 ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold, autovacuum should
kick in. Obviously, that relies on up-to-date statistics. Is that how it is
actually done?

2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated by
ANALYZE, I think it cannot be used as a criterion for the next autoanalyze
run. Is there any way to query whether a table currently qualifies for
autoanalyze?

Thanks and Regards,
Stefan
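[Editorial note: the comparison the query above implements can be sketched in
Python. The defaults shown are the stock 9.1 values of
autovacuum_vacuum_threshold (50) and autovacuum_vacuum_scale_factor (0.2);
the table sizes are made-up sample numbers.]

```python
# Sketch of the autovacuum trigger arithmetic described above.
# base_threshold and scale_factor default to the stock 9.1 settings.

def av_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Vacuum threshold = base threshold + scale factor * reltuples."""
    return base_threshold + scale_factor * reltuples

def needs_autovacuum(n_dead_tup, reltuples):
    """Autovacuum should kick in once dead tuples exceed the threshold."""
    return n_dead_tup > av_threshold(reltuples)

# A 10,000-row table: threshold = 50 + 0.2 * 10000 = 2050
print(av_threshold(10000))            # -> 2050.0
print(needs_autovacuum(1800, 10000))  # -> False
print(needs_autovacuum(2100, 10000))  # -> True
```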
On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta <s.andreatta@synedra.com> wrote:

> Hi,
>
> If I understand
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
> correctly, the autovacuum threshold could be estimated like this in
> PostgreSQL 9.1:
>
> SELECT pg_stat_user_tables.relname,
>        pg_stat_user_tables.n_dead_tup,
>        CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
>          + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
>             * pg_class.reltuples) AS av_threshold
>   FROM pg_stat_user_tables
>   JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
>  ORDER BY 1;
>
> If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
> kick in. Obviously, that does rely on up-to-date statistics. Is that how it
> is actually done?

Pretty much, yes. With the caveat that table storage settings can override
the global settings.

> 2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
> by ANALYZE it cannot be used as a criterion for the next autoanalyze run, I
> think.

n_dead_tup is updated by the stats collector, not by ANALYZE.

Cheers,
Jeff
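[Editorial note: the per-table override Jeff mentions can be sketched as
follows. The option names match PostgreSQL's reloptions; the lookup logic and
the `busy_table` example are a simplified illustration, not the server's
implementation.]

```python
# Sketch of how per-table storage parameters override the global autovacuum
# settings. Option names follow PostgreSQL's reloptions; the resolution
# logic here is a simplified model.

GLOBAL_SETTINGS = {
    "autovacuum_vacuum_threshold": 50,
    "autovacuum_vacuum_scale_factor": 0.2,
}

def effective_setting(name, reloptions):
    """A per-table reloption, when present, takes precedence over the GUC."""
    return reloptions.get(name, GLOBAL_SETTINGS[name])

# Hypothetical table created with e.g.:
#   ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.01);
reloptions = {"autovacuum_vacuum_scale_factor": 0.01}

print(effective_setting("autovacuum_vacuum_scale_factor", reloptions))  # -> 0.01
print(effective_setting("autovacuum_vacuum_threshold", reloptions))     # -> 50
```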
On 02/22/2013 06:27 PM, Jeff Janes wrote:
> On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
> <s.andreatta@synedra.com> wrote:
>
>> If I understand
>> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
>> correctly, the autovacuum threshold could be estimated like this in
>> PostgreSQL 9.1: [...]
>>
>> If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum
>> should kick in. Obviously, that does rely on up-to-date statistics.
>> Is that how it is actually done?
>
> Pretty much, yes. With the caveat that table storage settings can
> override the global settings.
>
>> 2nd question: because pg_stat_user_tables.n_dead_tup is itself
>> estimated by ANALYZE it cannot be used as a criterion for the next
>> autoanalyze run, I think.
>
> n_dead_tup is updated by the stats collector, not by ANALYZE.

Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am still
missing something: the analyze threshold is compared to the "total number of
tuples inserted, updated, or deleted since the last ANALYZE" (according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live_tup - pg_class.reltuples should give something
like the number of rows inserted minus the number of rows deleted since the
last ANALYZE. But according to the documentation we would need the sum of
those values. And we are still missing a number for rows updated since the
last analyze.

pg_stat_user_tables.n_dead_tup, on the other hand, is only set back by a
successful VACUUM. Autoanalyzing a table with more than 10% dead rows would
therefore keep autoanalyze in a loop until the ratio rises beyond 20%
(default configuration) and autovacuum kicks in. So that wouldn't make a lot
of sense.

Regards,
Stefan
On Saturday, February 23, 2013, Stefan Andreatta wrote:

> However, to estimate whether autoanalyze should be triggered, I am still
> missing something: the analyze threshold is compared to the "total number
> of tuples inserted, updated, or deleted since the last ANALYZE"
> (according to
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> [...] pg_stat_user_tables.n_dead_tup, on the other hand, is only set back
> by a successful VACUUM. Autoanalyzing a table with more than 10% dead
> rows would therefore keep autoanalyze in a loop until the ratio rises
> beyond 20% (default configuration) and autovacuum kicks in. So that
> wouldn't make a lot of sense.

Hi Stefan,

Sorry, I got tunnel vision about how the threshold was computed, and forgot
about the thing it was compared to. There is a "secret" data point in the
stats collector called changes_since_analyze. It is not exposed in
pg_stat_user_tables, but I think it should be, as I have often wanted to see
it.

Cheers,
Jeff
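[Editorial note: with the changes_since_analyze counter Jeff describes, the
autoanalyze trigger can be sketched like this. The defaults shown are the
stock values of autovacuum_analyze_threshold (50) and
autovacuum_analyze_scale_factor (0.1); the sample numbers are made up.]

```python
# Sketch of the autoanalyze trigger condition, using the stats collector's
# changes_since_analyze counter (not exposed in pg_stat_user_tables as of
# 9.1). Defaults are the stock autoanalyze settings.

def analyze_threshold(reltuples, base_threshold=50, scale_factor=0.1):
    """Analyze threshold = base threshold + scale factor * reltuples."""
    return base_threshold + scale_factor * reltuples

def needs_autoanalyze(changes_since_analyze, reltuples):
    # Inserts, updates, and deletes since the last ANALYZE all count.
    return changes_since_analyze > analyze_threshold(reltuples)

# A 10,000-row table: threshold = 50 + 0.1 * 10000 = 1050
print(needs_autoanalyze(900, 10000))   # -> False
print(needs_autoanalyze(1800, 10000))  # -> True
```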
On 02/23/2013 05:10 PM, Jeff Janes wrote:
> On Saturday, February 23, 2013, Stefan Andreatta wrote:
>
>> [...] However, to estimate whether autoanalyze should be triggered, I am
>> still missing something: the analyze threshold is compared to the "total
>> number of tuples inserted, updated, or deleted since the last ANALYZE"
>> (according to
>> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> Sorry, I got tunnel vision about how the threshold was computed, and
> forgot about the thing it was compared to. There is a "secret" data point
> in the stats collector called changes_since_analyze. It is not exposed in
> pg_stat_user_tables, but I think it should be, as I have often wanted to
> see it.

Sounds like a very good idea to me - any way I could help to make such a
thing happen?

Stefan
On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:

> And we are still missing a number for rows updated since the last analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find
there is no forest.
On 02/24/2013 12:52 PM, Alban Hertroys wrote:
> On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:
>
>> And we are still missing a number for rows updated since the last
>> analyse.
>
> In MVCC an update is an insert + delete, so you already got those numbers.

Good point. But because they are an insert and a delete, they cancel each
other out and do not show up in pg_stat_user_tables.n_live_tup - and that's
the only value for which we have a reference value from the time of the last
analyze (pg_class.reltuples). On the other hand, I might again be missing
something. I would be most happy if anybody could come up with a query that
estimates the autoanalyze trigger conditions from the values we have
available now.

Stefan
On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com> wrote:

> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>> In MVCC an update is an insert + delete, so you already got those
>> numbers.
>
> Good point. But because they are an insert and a delete, they cancel each
> other out and do not show up in pg_stat_user_tables.n_live_tup - and
> that's the only value for which we have a reference value from the time
> of the last analyze (pg_class.reltuples).

I'm pretty sure that an update results in 1 live + 1 dead tuple, so they
don't cancel each other out - they end up adding to different statistics.
Assuming those statistics are both since last vacuum, added together they
are the total number of changed records since last vacuum.

What gain do you expect from a number of updated tuples?

And it seems to me those numbers are since last vacuum, not since last
analyse - analyse doesn't change the amount of dead tuples (it just updates
them to closer match reality), but vacuum does.

Disclaimer: I'm not intimately familiar with the planner statistics, but
knowing what vacuum and analyse do in an MVCC database, like I described
above, it makes sense to me. I might be wrong though.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find
there is no forest.
On 02/25/2013 09:00 AM, Alban Hertroys wrote:
> On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com> wrote:
>
>> Good point. But because they are an insert and a delete, they cancel
>> each other out and do not show up in pg_stat_user_tables.n_live_tup -
>> and that's the only value for which we have a reference value from the
>> time of the last analyze (pg_class.reltuples).
>
> I'm pretty sure that an update results in 1 live + 1 dead tuple, so they
> don't cancel each other out - they end up adding to different statistics.
> Assuming those statistics are both since last vacuum, added together they
> are the total number of changed records since last vacuum.
>
> What gain do you expect from a number of updated tuples?
>
> And it seems to me those numbers are since last vacuum, not since last
> analyse - analyse doesn't change the amount of dead tuples (it just
> updates them to closer match reality), but vacuum does.
>
> Disclaimer: I'm not intimately familiar with the planner statistics, but
> knowing what vacuum and analyse do in an MVCC database, like I described
> above, it makes sense to me. I might be wrong though.

1 update = 1 insert + 1 delete, and those two cancel each other out with
respect to pg_stat_user_tables.n_live_tup. Naturally, they don't cancel each
other out in pg_stat_user_tables.n_tup_ins or n_tup_del - they don't even
show up in those values, presumably because that's what n_tup_upd is there
for. However, the update does add to n_dead_tup.
VACUUM does not reset *any* of the statistics values that can be accessed
via pg_stat_user_tables, apart from n_dead_tup (hopefully ;-)

Anyway, to estimate the autoanalyze trigger, I would need statistics that
get reset by autoanalyze, not by autovacuum.

I wrote a test script to show the behaviour. Be sure to wait a second each
time before accessing pg_stat_user_tables, as the statistics collector
reports those data with a delay:

CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
SELECT count(*) FROM test_stat;
ANALYZE test_stat;
-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i, random() AS r);
-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

DELETE FROM test_stat WHERE id > 10000;
-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

UPDATE test_stat SET some_number = 1 WHERE id > 9100;
-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

ANALYZE test_stat;
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

VACUUM test_stat;
-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'test_stat';

DROP TABLE test_stat;

Output from a postgres 9.2 database:
--------------------------------------------------

test=# CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
CREATE TABLE
test=# INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
INSERT 0 10000
test=# SELECT count(*) FROM test_stat;
 count
-------
 10000
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |          0 |     10000 |         0 |         0 |             0
(1 row)

test=# INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i, random() AS r);
INSERT 0 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10900 |          0 |     10900 |         0 |         0 |             0
(1 row)

test=# DELETE FROM test_stat WHERE id > 10000;
DELETE 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |        900 |     10900 |         0 |       900 |             0
(1 row)

test=# UPDATE test_stat SET some_number = 1 WHERE id > 9100;
UPDATE 900
test=# -- wait here (0.5 s) for statistics collector to catch up
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |       1800 |     10900 |       900 |       900 |            10
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |       1800 |     10900 |       900 |       900 |            10
(1 row)

test=# VACUUM test_stat;
VACUUM
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
  relname  | reltuples
-----------+-----------
 test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
       n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |          0 |     10900 |       900 |       900 |            10
(1 row)

test=# DROP TABLE test_stat;
DROP TABLE

Regards,
Stefan
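[Editorial note: the counter behaviour the transcript above demonstrates can
be condensed into a small Python model. This is a sketch of the observed
accounting only, not of the stats collector's actual implementation; the
class and method names are invented for illustration.]

```python
# Model of the pg_stat_user_tables counters as observed in the transcript:
# an UPDATE leaves n_live_tup unchanged but adds a dead tuple, and a plain
# VACUUM resets only n_dead_tup.

class TableStats:
    def __init__(self):
        self.n_live_tup = 0
        self.n_dead_tup = 0
        self.n_tup_ins = self.n_tup_upd = self.n_tup_del = 0

    def insert(self, n):
        self.n_live_tup += n
        self.n_tup_ins += n

    def delete(self, n):
        self.n_live_tup -= n
        self.n_dead_tup += n
        self.n_tup_del += n

    def update(self, n):
        # New row version is live, old version is dead: n_live_tup unchanged.
        self.n_dead_tup += n
        self.n_tup_upd += n

    def vacuum(self):
        self.n_dead_tup = 0  # the cumulative counters are left alone

s = TableStats()
s.insert(10000)   # initial load
s.insert(900)     # second batch
s.delete(900)     # delete the second batch
s.update(900)     # update 900 surviving rows
print(s.n_live_tup, s.n_dead_tup)  # -> 10000 1800
s.vacuum()
print(s.n_dead_tup, s.n_tup_ins, s.n_tup_upd, s.n_tup_del)  # -> 0 10900 900 900
```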