Обсуждение: Avoid Wraparound Failures

Поиск
Список
Период
Сортировка

Avoid Wraparound Failures

От
Loles
Дата:
Hi!

Suppose the databases on my instance are near to have a wraparound failure.

(I think so, from what I see, but in the PostgreSQL log I haven't seen any warning about It yet).

What do I have to do?

vacuum freeze;

better than,

vacuum analyze;

Or both?

If the autovacuum_freeze configuration parameters have defaults values, should I modify any first?

More I read of this topic, more confused I am.

Please, I need simple and wise advice :)

Thanks!

Re: Avoid Wraparound Failures

От
Ron
Дата:
On 3/25/22 17:16, Loles wrote:
Hi!

Suppose the databases on my instance are near to have a wraparound failure.

Is this actually the case, or are you just afraid?


(I think so, from what I see, but in the PostgreSQL log I haven't seen any warning about It yet).

What do I have to do?

vacuum freeze;

Kinda, but really... Heck No!!!

vacuum --freeze --jobs=`nproc` --dbname=your_db_name

That will VACUUM FREEZE every table in your database in multiple threads.

However, you only need to freeze tables getting near wraparound.  pg_class.relfrozenxid tells you which tables to worry about.


better than,

vacuum analyze;

VACUUM ANALYZE does a plain vacuum plus collects statistics for the query analyzer.  Collecting query stats has nothing to do with vacuuming or protecting against wraparound.


Or both?

If the autovacuum_freeze configuration parameters have defaults values, should I modify any first?


What version are you running?  Even the recently EOL versions protect from wraparound (though it's a painful last-ditch process).

--
Angular momentum makes the world go 'round.

Re: Avoid Wraparound Failures

От
Laurenz Albe
Дата:
On Fri, 2022-03-25 at 23:16 +0100, Loles wrote:
> Suppose the databases on my instance are near to have a wraparound failure.
> 
> (I think so, from what I see, but in the PostgreSQL log I haven't seen any warning about It yet).
> 
> What do I have to do?
> 
> vacuum freeze;
> 
> better than,
> 
> vacuum analyze;
> 
> Or both?
> 
> If the autovacuum_freeze configuration parameters have defaults values, should I modify any first?
> 
> More I read of this topic, more confused I am.
> 
> Please, I need simple and wise advice :)

DON'T PANIC

If what you see is the age of your oldest unfrozen rows approaching 200 million,
everything is just normal.  That's when anti-wraparound autovacuum *begins*.

Normally, you have nothing to do, except perhaps make sure than autovacuum is
fast enough (autovacuum_vacuum_cost_delay = 2).

If you want to prevent that autovacuum run from happening while your system is
busy (which normally also is no problem), you can trigger a manual VACUUM at
a time of lower database activity.  But make it a plain VACUUM, not a
VACUUM (FREEZE) or (god forbid) VACUUM (FULL), and only VACUUM those tables
that are large and approaching the threshold.  And don't VACUUM them all at the
same time.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Avoid Wraparound Failures

От
Ron
Дата:
On 3/25/22 23:34, Laurenz Albe wrote:
> On Fri, 2022-03-25 at 23:16 +0100, Loles wrote:
>> Suppose the databases on my instance are near to have a wraparound failure.
>>
>> (I think so, from what I see, but in the PostgreSQL log I haven't seen any warning about It yet).
>>
>> What do I have to do?
>>
>> vacuum freeze;
>>
>> better than,
>>
>> vacuum analyze;
>>
>> Or both?
>>
>> If the autovacuum_freeze configuration parameters have defaults values, should I modify any first?
>>
>> More I read of this topic, more confused I am.
>>
>> Please, I need simple and wise advice :)
> DON'T PANIC
>
> If what you see is the age of your oldest unfrozen rows approaching 200 million,
> everything is just normal.  That's when anti-wraparound autovacuum *begins*.
>
> Normally, you have nothing to do, except perhaps make sure than autovacuum is
> fast enough (autovacuum_vacuum_cost_delay = 2).
>
> If you want to prevent that autovacuum run from happening while your system is
> busy (which normally also is no problem), you can trigger a manual VACUUM at
> a time of lower database activity.  But make it a plain VACUUM, not a
> VACUUM (FREEZE)

Why not VACUUM FREEZE?

> or (god forbid) VACUUM (FULL), and only VACUUM those tables
> that are large and approaching the threshold.  And don't VACUUM them all at the
> same time.
>
> Yours,
> Laurenz Albe

-- 
Angular momentum makes the world go 'round.



Re: Avoid Wraparound Failures

От
Laurenz Albe
Дата:
On Fri, 2022-03-25 at 23:45 -0500, Ron wrote:
> > If you want to prevent that autovacuum run from happening while your system is
> > busy (which normally also is no problem), you can trigger a manual VACUUM at
> > a time of lower database activity.  But make it a plain VACUUM, not a
> > VACUUM (FREEZE)
> 
> Why not VACUUM FREEZE?

Because then you freeze *all* visible rows, not only those that are older than
"vacuum_freeze_min_age".  That will cause more pages to get dirtied, so there will
be more writing I/O.  And unless all these rows won't get modified in the forseeable
future, that is a waste.  "vacuum_freeze_min_age" is there for a reason: the
expectation is that rows that have not been modified for 50 million transactions
have better odds at not getting modified soon.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Avoid Wraparound Failures

От
Loles
Дата:
First of all, thank you very much for your attention.

This is exactly what happens to me: some articles on this subject say that "vacuum" does not guarantee that the problem will occur, so you have to do "vacuum freeze" and others say that "vacuum freeze" does not, which is very aggressive.

Specific details about my database is as follows:

PostgreSQL version 9.5.5 (I know. I do not like it. In Spanish there is a saying that says "the things of the Palace go slowly")

Parameter settings:

select name, setting, unit from pg_settings where name like '%vacuum%';
                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum                          | on        |
 autovacuum_analyze_scale_factor     | 0.1       |
 autovacuum_analyze_threshold        | 50        |
 autovacuum_freeze_max_age           | 200000000 |
 autovacuum_max_workers              | 3         |
 autovacuum_multixact_freeze_max_age | 400000000 |
 autovacuum_naptime                  | 60        | s
 autovacuum_vacuum_cost_delay        | 20        | ms
 autovacuum_vacuum_cost_limit        | -1        |
 autovacuum_vacuum_scale_factor      | 0.2       |
 autovacuum_vacuum_threshold         | 50        |
 autovacuum_work_mem                 | -1        | kB
 log_autovacuum_min_duration         | 0         | ms
 vacuum_cost_delay                   | 0         | ms
 vacuum_cost_limit                   | 200       |
 vacuum_cost_page_dirty              | 20        |
 vacuum_cost_page_hit                | 1         |
 vacuum_cost_page_miss               | 10        |
 vacuum_defer_cleanup_age            | 0         |
 vacuum_freeze_min_age               | 50000000  |
 vacuum_freeze_table_age             | 150000000 |
 vacuum_multixact_freeze_min_age     | 5000000   |
 vacuum_multixact_freeze_table_age   | 150000000 |
(23 filas)



Age of databases:

 select datname, age(datfrozenxid) from pg_database;
  datname     |    age
--------------+-----------
 template0    |  69349496
 postgres     | 169695544
 pgbench      | 169695544
 template1    | 168282793
 importantdb  | 196868106
(5 filas)



Age of tables in importantdb:

select relname, age(relfrozenxid) from pg_class where relkind = 'r' order by 2 desc limit 30;

relname | age
--------+-----------
 tab1   | 196869678
 tab2   | 196869678
 tab3   | 196869678
 tab4   | 196869678
 tab5   | 196869678
 tab6   | 196869678
 tab7   | 191936208
 tab8   | 191920251
 tab9   | 184863956
 tab10  | 184715542
 tab11  | 183564487
 tab12  | 182443645
 tab13  | 182443645
 tab14  | 182265657
 tab15  | 182026732
 tab16  | 177048525
 tab17  | 176699916
 tab18  | 176699916
 tab19  | 176699913
 tab20  | 176699913
 tab21  | 176699913
 tab22  | 174011719
 tab23  | 174011719
 tab24  | 174011719
 tab25  | 174011719
 tab26  | 172855144
 tab27  | 171968848
 tab28  | 170248697
 tab29  | 170246700
 tab30  | 170245634
(30 filas)


So yes, there are tables with age near of 200 million.

So, if I have understood it correctly:

1) set autovacuum_vacuum_cost_delay from 20 to 2

2) execute, only for the largest tables, vacuum command:

   vacuum tab1;
   vacuum tab2;

   and so on

3) Sleep peacefully :)

Is that so?

Another doubt: query filtered relkind = 'r' but there are also sequences, views, catalog tables... that are also close to 200 million.

Vacuum the catalog tables too or is it not necessary?

I understand that the views, sequences and everything that depends on the table will improve by vacuuming that table.

I don't abuse anymore. Many many thanks!

El sáb, 26 mar 2022 a las 5:50, Laurenz Albe (<laurenz.albe@cybertec.at>) escribió:
On Fri, 2022-03-25 at 23:45 -0500, Ron wrote:
> > If you want to prevent that autovacuum run from happening while your system is
> > busy (which normally also is no problem), you can trigger a manual VACUUM at
> > a time of lower database activity.  But make it a plain VACUUM, not a
> > VACUUM (FREEZE)
>
> Why not VACUUM FREEZE?

Because then you freeze *all* visible rows, not only those that are older than
"vacuum_freeze_min_age".  That will cause more pages to get dirtied, so there will
be more writing I/O.  And unless all these rows won't get modified in the forseeable
future, that is a waste.  "vacuum_freeze_min_age" is there for a reason: the
expectation is that rows that have not been modified for 50 million transactions
have better odds at not getting modified soon.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Re: Avoid Wraparound Failures

От
Vijaykumar Jain
Дата:


On Sat, 26 Mar 2022 at 3:47 AM Loles <lolesft@gmail.com> wrote:
Hi!

Suppose the databases on my instance are near to have a wraparound failure.

(I think so, from what I see, but in the PostgreSQL log I haven't seen any warning about It yet).

What do I have to do?

vacuum freeze;

better than,

vacuum analyze;

Or both?

If the autovacuum_freeze configuration parameters have defaults values, should I modify any first?

More I read of this topic, more confused I am.

Please, I need simple and wise advice :)

Thanks!




This guy asked on telegram how to simulate the problem and then actually went on to simulate and log the observations and how to resolve it.
Of course this might be the most simplest of all cases, but it does help visualise.



--

Re: Avoid Wraparound Failures

От
Laurenz Albe
Дата:
On Sat, 2022-03-26 at 07:47 +0100, Loles wrote:
> So, if I have understood it correctly:
> 
> 1) set autovacuum_vacuum_cost_delay from 20 to 2

It is not strictly required, but that will keep autovacuum from taking
forever to process your big tables.

It has no influence on manual VACUUM though.

> 2) execute, only for the largest tables, vacuum command:
> 
>    vacuum tab1;
>    vacuum tab2;
> 
>    and so on

If you don't want to wait for autovacuum to do it automatically, yes.

> 3) Sleep peacefully :)

Definitely.  This is normal behavior.

> Another doubt: query filtered relkind = 'r' but there are also sequences, views, catalog tables... that are also
closeto 200 million. 
 

Sequences and views are don't need VACUUM.
Materialized views and catalog tables do.

> Vacuum the catalog tables too or is it not necessary?

Same as all other tables: do it manually or let autovacuum do it automatically.

> I understand that the views, sequences and everything that depends on the table will improve by vacuuming that
table.

There is no influence on sequences.

VACUUM can improve the performance of some queries, but mostly it is about keeping your tables
healthy and in good shape.  Views may benefit just like all other queries do.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Avoid Wraparound Failures

От
Michael Banck
Дата:
Hi,

On Sat, Mar 26, 2022 at 07:47:17AM +0100, Loles wrote:
> PostgreSQL version 9.5.5 (I know. I do not like it. In Spanish there
> is a saying that says "the things of the Palace go slowly")

Note that 9.5 does not have the freeze map yet (that came in 9.6), so
anything freeze-related is way worse there.

So maybe just get some sign-off from the higher-ups that they are aware
of the risks of running an unpatched, end-of-lifed database to cover
your own bases.


Michael



Re: Avoid Wraparound Failures

От
Loles
Дата:
Thank you very much friends!!

I have read all of you carefully, and I have read the simulation of the failure and subsequent recovery. Very interesting.

With all the information, it seems clear that autovacuum must be well adjusted for this process to be able to maintenance the tables in good condition and include the execution of vacuum as maintenance routines.

I have done a test and... to my amazement, vacuum does not reduce the age of the relfrozenxid.

The test is very simple.

Create new database and connect to it:

postgres=# create database test_xid;
CREATE DATABASE
postgres=#
postgres=# \c test_xid
You are now connected to database "test_xid" as user "todopostgresql".


Check the current transaction:

test_xid=# select txid_current();
 txid_current
--------------
          983
(1 row)


Create a table and insert 10 million tuples:

test_xid=# create table t1 (id serial);
CREATE TABLE
test_xid=# insert into t1 select generate_series(1,10000000);
INSERT 0 10000000


The frozen transaction on table t1 is 984 and has a distance of 3 from txid, which is currently 987. So far so good.

test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |   3

test_xid=# select txid_current();
 txid_current
--------------
          987


"Dirty" table t1:

test_xid=# update t1 set id = id + 1;
UPDATE 10000000
test_xid=#
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |   5
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
          989
(1 row)

If I do a vacuum now, the 10 million "dirty" tuples will be marked as reusable.

test_xid=# vacuum t1;
VACUUM
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |   7
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
          991
(1 row)


If I now insert another 10 million tuples they should occupy the space of already past ¿frozen by vacuum? transactions:

test_xid=# insert into t1 select generate_series(1,10000000);
INSERT 0 10000000
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |   9
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
          993
(1 row)


But no. I don't see the improvement. Try again with vacuum analyze:

test_xid=# update t1 set id = id + 1;
UPDATE 20000000
test_xid=#
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |  12
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
          996
(1 row)

test_xid=# vacuum ANALYZE t1;
VACUUM
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |  14
(1 row)

test_xid=# insert into t1 select generate_series(1,10000000);
INSERT 0 10000000
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |  15
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
          999
(1 row)

Either. Do a vacuum of the entire database:

test_xid=# vacuum ;
VACUUM
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
          984 |  17

Neither. Try vacuum freeze:

test_xid=# vacuum FREEZE t1;
VACUUM
test_xid=# select relfrozenxid, age(relfrozenxid) from pg_class where relname = 't1';
 relfrozenxid | age
--------------+-----
         1001 |   0
(1 row)

test_xid=# select txid_current();
 txid_current
--------------
         1001
(1 row)

Yes now. Sure, because age is the difference between txid and relfrozenxid and "vacuum freexe" freezes all transactions before txid.

But we have agreed that vacuum freeze has disadvantages and that vacuum should be enough.

Why does vacuum seem to do nothing? Is my test wrong?

I need to understand.. and be able to continue with my life XD


El sáb, 26 mar 2022 a las 9:32, Michael Banck (<mbanck@gmx.net>) escribió:
Hi,

On Sat, Mar 26, 2022 at 07:47:17AM +0100, Loles wrote:
> PostgreSQL version 9.5.5 (I know. I do not like it. In Spanish there
> is a saying that says "the things of the Palace go slowly")

Note that 9.5 does not have the freeze map yet (that came in 9.6), so
anything freeze-related is way worse there.

So maybe just get some sign-off from the higher-ups that they are aware
of the risks of running an unpatched, end-of-lifed database to cover
your own bases.


Michael

Re: Avoid Wraparound Failures

От
Peter Geoghegan
Дата:
On Sat, Mar 26, 2022 at 11:29 AM Loles <lolesft@gmail.com> wrote:
> Why does vacuum seem to do nothing? Is my test wrong?

No, your test seems fine. I work on this area of the code, and I have
to concede that it's more confusing than it really needs to be.

> I need to understand.. and be able to continue with my life XD

Technically relfrozenxid can be advanced by any VACUUM operation. In
practice there are a couple of low-level issues that make it rather
unlikely that it will happen, outside of an aggressive VACUUM -- so a
person could be forgiven for thinking that it's only possible during
aggressive VACUUMs. Aggressive VACUUMs are (by definition) guaranteed
to be able to advance relfrozenxid such that the final
age(relfrozenxid) is set to a value approximately equal to your
vacuum_freeze_min_age setting. They happen because no non-aggressive
VACUUM ever advanced relfrozenxid (even though, as I said, that's very
much the common case!).

An aggressive VACUUM can be either an anti-wraparound autovacuum, or a
VACUUM (manual or autovacuum) that is aggressive by virtue of the
table's age(relfrozenxid) exceeding vacuum_freeze_table_age when the
VACUUM begins. In practice most installations usually have
relfrozenxid advanced by aggressive anti-wraparound VACUUMs.

Fortunately it's usually fine to ignore all of this -- you can just
rely on autovacuum. To answer your original question: the simplest
thing you could do to dramatically reduce the risk of wraparound
failure (which is probably very low for your application already) is
to upgrade.

In particular, the most recent stable version (Postgres 14) has a new
wraparound failsafe mechanism that makes autovacuum do everything it
can to avoid wraparound failure, should you get near the point of
running out of XID space. It also recovers from the situation without
operator intervention should the worst happen (unless maybe you have
something that totally holds back cleanup by VACUUM, like a leaked
replication slot). Finally, there is the freeze map work added to 9.6,
which also saves a lot of work during aggressive VACUUMs.

-- 
Peter Geoghegan



Re: Avoid Wraparound Failures

От
Loles
Дата:
The good thing about this is that I have another argument to convince my bosses of the need to upgrade.

I have noticed error in my test: vacuum will have the expected effect when the age of the txid stored in relfrozenxid is autovacuum_freeze_max_age . Before that, vacuum does not freeze the txids because it considers that they are likely to be modified soon.

Thank you! and happy Sunday!!


El sáb, 26 mar 2022 a las 19:54, Peter Geoghegan (<pg@bowt.ie>) escribió:
On Sat, Mar 26, 2022 at 11:29 AM Loles <lolesft@gmail.com> wrote:
> Why does vacuum seem to do nothing? Is my test wrong?

No, your test seems fine. I work on this area of the code, and I have
to concede that it's more confusing than it really needs to be.

> I need to understand.. and be able to continue with my life XD

Technically relfrozenxid can be advanced by any VACUUM operation. In
practice there are a couple of low-level issues that make it rather
unlikely that it will happen, outside of an aggressive VACUUM -- so a
person could be forgiven for thinking that it's only possible during
aggressive VACUUMs. Aggressive VACUUMs are (by definition) guaranteed
to be able to advance relfrozenxid such that the final
age(relfrozenxid) is set to a value approximately equal to your
vacuum_freeze_min_age setting. They happen because no non-aggressive
VACUUM ever advanced relfrozenxid (even though, as I said, that's very
much the common case!).

An aggressive VACUUM can be either an anti-wraparound autovacuum, or a
VACUUM (manual or autovacuum) that is aggressive by virtue of the
table's age(relfrozenxid) exceeding vacuum_freeze_table_age when the
VACUUM begins. In practice most installations usually have
relfrozenxid advanced by aggressive anti-wraparound VACUUMs.

Fortunately it's usually fine to ignore all of this -- you can just
rely on autovacuum. To answer your original question: the simplest
thing you could do to dramatically reduce the risk of wraparound
failure (which is probably very low for your application already) is
to upgrade.

In particular, the most recent stable version (Postgres 14) has a new
wraparound failsafe mechanism that makes autovacuum do everything it
can to avoid wraparound failure, should you get near the point of
running out of XID space. It also recovers from the situation without
operator intervention should the worst happen (unless maybe you have
something that totally holds back cleanup by VACUUM, like a leaked
replication slot). Finally, there is the freeze map work added to 9.6,
which also saves a lot of work during aggressive VACUUMs.

--
Peter Geoghegan