Discussion: autoanalyze criteria


autoanalyze criteria

From: Stefan Andreatta
Date:
Hi,

If I understand
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold could be estimated like this in
PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
      pg_stat_user_tables.n_dead_tup,
      CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
          + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
             * pg_class.reltuples) AS av_threshold
  FROM pg_stat_user_tables
      JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
  ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
kick in. Obviously, that does rely on up-to-date statistics. Is that how
it is actually done?

Second question: because pg_stat_user_tables.n_dead_tup is itself
estimated by ANALYZE, it cannot be used as a criterion for the next
autoanalyze run, I think. Is there any way to query whether a table
currently qualifies for autoanalyze?

Thanks and Regards,
Stefan

Re: autoanalyze criteria

From: Jeff Janes
Date:
On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
<s.andreatta@synedra.com> wrote:

> Hi,
>
> If I understand
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
> correctly, the autovacuum threshold could be estimated like this in
> PostgreSQL 9.1:
>
> SELECT pg_stat_user_tables.relname,
>       pg_stat_user_tables.n_dead_tup,
>       CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
>           + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric)
>              * pg_class.reltuples) AS av_threshold
>   FROM pg_stat_user_tables
>       JOIN pg_class ON pg_stat_user_tables.relid = pg_class.oid
>   ORDER BY 1;
>
> If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
> kick in. Obviously, that does rely on up-to-date statistics. Is that how it
> is actually done?
>

Pretty much, yes.  With the caveat that table storage settings can override
the global settings.
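For completeness, the per-table overrides live in pg_class.reloptions. A quick way to see which tables set any of them - this query is an editorial sketch, not part of the original exchange:

```sql
-- Show tables that set per-table autovacuum storage parameters,
-- which override the global settings used in the av_threshold query.
-- reloptions is NULL when a table has no storage-parameter overrides.
SELECT relname, reloptions
  FROM pg_class
 WHERE reloptions::text LIKE '%autovacuum%';
```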


>
> 2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
> by ANALYZE it cannot be used as a criterion for the next autoanalyze run, I
> think.


n_dead_tup is updated by the stats collector, not by ANALYZE.

Cheers,

Jeff

Re: autoanalyze criteria

From: Stefan Andreatta
Date:
On 02/22/2013 06:27 PM, Jeff Janes wrote:
> On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
> <s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:
>
>     Hi,
>
>     If I understand
>     http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
>     correctly, the autovacuum threshold could be estimated like
>     this in PostgreSQL 9.1:
>
>     SELECT pg_stat_user_tables.relname,
>          pg_stat_user_tables.n_dead_tup,
>          CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
>              + (CAST(current_setting('autovacuum_vacuum_scale_factor')
>     AS numeric)
>                 * pg_class.reltuples) AS av_threshold
>      FROM pg_stat_user_tables
>          JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
>      ORDER BY 1;
>
>     If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum
>     should kick in. Obviously, that does rely on up-to-date
>     statistics. Is that how it is actually done?
>
>
> Pretty much, yes.  With the caveat that table storage settings can
> override the global settings.
>
>
>     2nd question: because pg_stat_user_tables.n_dead_tup is itself
>     estimated by ANALYZE it cannot be used as a criterion for the next
>     autoanalyze run, I think.
>
>
> n_dead_tup is updated by the stats collector, not by ANALYZE.
>
Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am still
missing something: the analyze threshold is compared to the "total
number of tuples inserted, updated, or deleted since the last ANALYZE."
(according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live_tup - pg_class.reltuples should give
something like the sum of rows inserted minus rows deleted since the
last ANALYZE. But according to the documentation we would need the sum
of those values. And we are still missing a number for rows updated
since the last analyze. pg_stat_user_tables.n_dead_tup, on the other
hand, is only reset by a successful VACUUM. Autoanalyzing a table with
more than 10% dead rows would therefore keep autoanalyze in a loop until
the ratio rises beyond 20% (default configuration) and autovacuum kicks
in. So that wouldn't make a lot of sense.

Regards,
Stefan

Re: autoanalyze criteria

From: Jeff Janes
Date:
On Saturday, February 23, 2013, Stefan Andreatta wrote:

>
>
> Thanks Jeff, that helped a lot (as did a careful rereading of
> http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
> http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)
>
> However, to estimate whether autoanalyze should be triggered, I am still
> missing something: the analyze threshold is compared to the "total number
> of tuples inserted, updated, or deleted since the last ANALYZE."
> (according to
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> pg_stat_user_tables.n_live_tup - pg_class.reltuples should give something
> like the sum of rows inserted minus rows deleted since the last ANALYZE.
> But according to the documentation we would need the sum of those values.
> And we are still missing a number for rows updated since the last analyze.
> pg_stat_user_tables.n_dead_tup, on the other hand, is only reset by a
> successful VACUUM. Autoanalyzing a table with more than 10% dead rows would
> therefore keep autoanalyze in a loop until the ratio rises beyond 20%
> (default configuration) and autovacuum kicks in. So that wouldn't make a
> lot of sense.
>
>
Hi Stefan,

Sorry, I got tunnel vision about how the threshold was computed, and
forgot about the thing it was compared to.  There is a "secret" data point
in the stats collector called changes_since_analyze.  This is not exposed
in pg_stat_user_tables.  But I think it should be, as I have often wanted
to see it.
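As an editorial side note for later readers: the counter Jeff describes was eventually exposed as n_mod_since_analyze in pg_stat_all_tables (PostgreSQL 9.4 and later), which makes the analyze condition directly queryable along the lines of the earlier av_threshold query:

```sql
-- PostgreSQL 9.4+ only; not available in the 9.1 release discussed here.
-- Compare tuples modified since the last analyze against the
-- autoanalyze threshold (global settings only; per-table storage
-- parameters can override these).
SELECT s.relname,
       s.n_mod_since_analyze,
       current_setting('autovacuum_analyze_threshold')::bigint
         + current_setting('autovacuum_analyze_scale_factor')::numeric
           * c.reltuples AS analyze_threshold
  FROM pg_stat_user_tables s
  JOIN pg_class c ON s.relid = c.oid
 ORDER BY 1;
```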


Cheers,

Jeff

Re: autoanalyze criteria

From: Stefan Andreatta
Date:
On 02/23/2013 05:10 PM, Jeff Janes wrote:
> On Saturday, February 23, 2013, Stefan Andreatta wrote:
>
>
>     Thanks Jeff, that helped a lot (as did a careful rereading of
>     http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
>     and
>     http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)
>
>     However, to estimate whether autoanalyze should be triggered, I am
>     still missing something: the analyze threshold is compared to the
>     "total number of tuples inserted, updated, or deleted since the
>     last ANALYZE." (according to
>     http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
>     pg_stat_user_tables.n_live_tup - pg_class.reltuples should give
>     something like the sum of rows inserted minus rows deleted since
>     the last ANALYZE. But according to the documentation we would need
>     the sum of those values. And we are still missing a number for
>     rows updated since the last analyze. pg_stat_user_tables.n_dead_tup,
>     on the other hand, is only reset by a successful VACUUM.
>     Autoanalyzing a table with more than 10% dead rows would therefore
>     keep autoanalyze in a loop until the ratio rises beyond 20%
>     (default configuration) and autovacuum kicks in. So that wouldn't
>     make a lot of sense.
>
>
> Hi Stefan,
>
> Sorry, I got tunnel vision about how the threshold was computed,
> and forgot about the thing it was compared to.  There is a "secret"
> data point in the stats collector called changes_since_analyze.  This
> is not exposed in pg_stat_user_tables.  But I think it should be, as
> I have often wanted to see it.
>
>
> Cheers,
>
> Jeff

Sounds like a very good idea to me - is there any way I could help make
such a thing happen?

Stefan

Re: autoanalyze criteria

From: Alban Hertroys
Date:
On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:

> And we are still missing a number for rows updated since the last analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: autoanalyze criteria

From: Stefan Andreatta
Date:
On 02/24/2013 12:52 PM, Alban Hertroys wrote:
> On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com
> <mailto:s.andreatta@synedra.com>> wrote:
>
>> And we are still missing a number for rows updated since the last
>> analyse.
>
> In MVCC an update is an insert + delete, so you already got those numbers.
>
Good point. But because they are an insert and a delete, they cancel
each other out and do not show up in pg_stat_user_tables.n_live_tup -
and that's the only value for which we have a reference value from the
time of the last analyze (pg_class.reltuples).

Then again, I might be missing something. I would be most happy if
anybody could come up with a query to estimate the autoanalyze trigger
conditions from the values we have available now.
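One rough workaround with the values available on 9.1 would be to snapshot the cumulative counters right after a manual ANALYZE and diff against them later. A sketch of the idea - the analyze_baseline table is hypothetical, not something from this thread:

```sql
-- Take a baseline immediately after running ANALYZE on the tables
-- of interest.
CREATE TABLE analyze_baseline AS
  SELECT relid, n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables;

-- Later: approximate "changes since analyze" as the delta of the
-- cumulative insert/update/delete counters, and compare it to the
-- autoanalyze threshold (global settings only).
SELECT s.relname,
       (s.n_tup_ins - b.n_tup_ins)
         + (s.n_tup_upd - b.n_tup_upd)
         + (s.n_tup_del - b.n_tup_del) AS changes_since_baseline,
       current_setting('autovacuum_analyze_threshold')::bigint
         + current_setting('autovacuum_analyze_scale_factor')::numeric
           * c.reltuples AS analyze_threshold
  FROM pg_stat_user_tables s
  JOIN analyze_baseline b USING (relid)
  JOIN pg_class c ON c.oid = s.relid
 ORDER BY 1;
```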

Stefan

Re: autoanalyze criteria

From: Alban Hertroys
Date:
On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com> wrote:

> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>> On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:
>>
>>> And we are still missing a number for rows updated since the last analyse.
>>
>> In MVCC an update is an insert + delete, so you already got those numbers.
>>
> Good point. But because they are an insert and a delete, they cancel
> each other out and do not show up in pg_stat_user_tables.n_live_tup -
> and that's the only value for which we have a reference value from the
> time of the last analyze (pg_class.reltuples).

I'm pretty sure that an update results in 1 live + 1 dead tuple, so they
don't cancel each other out - they end up adding to different
statistics. Assuming those statistics are both since last vacuum, added
together they are the total number of changed records since last vacuum.
What gain do you expect from a number of updated tuples?

And it seems to me those numbers are since last vacuum, not since last
analyse - analyse doesn't change the amount of dead tuples (it just
updates them to closer match reality), but vacuum does.

Disclaimer: I'm not intimately familiar with the planner statistics, but
knowing what vacuum and analyse do in an MVCC database, like I described
above it makes sense to me. I might be wrong though.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: autoanalyze criteria

From: Stefan Andreatta
Date:
On 02/25/2013 09:00 AM, Alban Hertroys wrote:
> On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com
> <mailto:s.andreatta@synedra.com>> wrote:
>
>> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>>> On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com
>>> <mailto:s.andreatta@synedra.com>> wrote:
>>>
>>>> And we are still missing a number for rows updated since the last
>>>> analyse.
>>>
>>> In MVCC an update is an insert + delete, so you already got those
>>> numbers.
>>>
>> Good point. But because they are an insert and a delete, they cancel
>> each other out and do not show up in pg_stat_user_tables.n_live_tup -
>> and that's the only value for which we have a reference value from
>> the time of the last analyze (pg_class.reltuples).
>
> I'm pretty sure that an update results in 1 live + 1 dead tuple, so
> they don't cancel each other out - they end up adding to different
> statistics. Assuming those statistics are both since last vacuum,
> added together they are the total number of changed records since last
> vacuum.
> What gain do you expect from a number of updated tuples?
>
> And it seems to me those numbers are since last vacuum, not since last
> analyse - analyse doesn't change the amount of dead tuples (it just
> updates them to closer match reality), but vacuum does.
>
> Disclaimer: I'm not intimately familiar with the planner statistics,
> but knowing what vacuum and analyse do in an MVCC database, like I
> described above it makes sense to me. I might be wrong though.
1 update = 1 insert + 1 delete, and those cancel each other out with
respect to pg_stat_user_tables.n_live_tup. Naturally, they don't cancel
each other out in pg_stat_user_tables.n_tup_ins or n_tup_del - they
don't even show up in those values, presumably because that's what
n_tup_upd is there for. However, the update does add to n_dead_tup.

VACUUM does not reset *any* of the statistics values that can be
accessed via pg_stat_user_tables, apart from n_dead_tup (hopefully ;-)
Anyway, to estimate the autoanalyze trigger, I would need statistics
that get reset by autoanalyze, not autovacuum.


I wrote a test script to show the behaviour. Be sure to wait a second
each time before accessing pg_stat_user_tables, as there is a delay
before the statistics collector makes those data visible:

CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
SELECT count(*) FROM test_stat;
ANALYZE test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DELETE FROM test_stat WHERE id > 10000;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

UPDATE test_stat SET some_number = 1 WHERE id > 9100;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

ANALYZE test_stat;

SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

VACUUM test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';


DROP TABLE test_stat;



Output from a postgres 9.2 database:
--------------------------------------------------

test=# CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
CREATE TABLE
test=# INSERT INTO test_stat (SELECT generate_series(1,10000) AS i,
random() AS r);
INSERT 0 10000
test=# SELECT count(*) FROM test_stat;
  count
-------
  10000
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |          0 |     10000 |         0 |         0 |             0
(1 row)

test=# INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);
INSERT 0 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10900 |          0 |     10900 |         0 |         0 |             0
(1 row)

test=# DELETE FROM test_stat WHERE id > 10000;
DELETE 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |        900 |     10900 |         0 |       900 |             0
(1 row)

test=# UPDATE test_stat set some_number = 1 where id > 9100;
UPDATE 900
test=# -- wait here (0.5 s) for statistics collector to catch up
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |       1800 |     10900 |       900 |       900 |            10
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=#
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |       1800 |     10900 |       900 |       900 |            10
(1 row)

test=# VACUUM test_stat;
VACUUM
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
   relname  | reltuples
-----------+-----------
  test_stat |     10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
  relname  | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
 test_stat |      10000 |          0 |     10900 |       900 |       900 |            10
(1 row)

test=# DROP TABLE test_stat;
DROP TABLE


Regards,
Stefan