Discussion: pg_multixact issues
Hello,
Our pg_multixact directory keeps growing. I did a "vacuum freeze", which didn't help. I also did a "vacuum full", which didn't help either.
We had this condition with 9.3.4 as well. When I upgraded our cluster to 9.4.5 (via plain SQL dump and load), the issue was resolved as expected, but now it's happening again. Luckily it has no ill effect other than consuming 4G of space for an otherwise 1G database.
Can you offer any hints as to how I can cure this?
thanks,
Kiriakos Georgiou
pg_controldata output:
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6211781659140720513
Database cluster state: in production
pg_control last modified: Wed Feb 10 13:45:02 2016
Latest checkpoint location: D/FB5FE630
Prior checkpoint location: D/FB5FE558
Latest checkpoint's REDO location: D/FB5FE5F8
Latest checkpoint's REDO WAL file: 000000010000000D000000FB
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/3556219
Latest checkpoint's NextOID: 2227252
Latest checkpoint's NextMultiXactId: 2316566
Latest checkpoint's NextMultiOffset: 823062151
Latest checkpoint's oldestXID: 668
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 3556219
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Wed Feb 10 13:45:02 2016
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1024
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
the offsets directory:
-rw------- 1 postgres dba 262144 Nov 3 15:22 0000
-rw------- 1 postgres dba 262144 Nov 5 12:45 0001
-rw------- 1 postgres dba 262144 Nov 9 14:25 0002
-rw------- 1 postgres dba 262144 Nov 13 10:10 0003
-rw------- 1 postgres dba 262144 Nov 16 15:40 0004
-rw------- 1 postgres dba 262144 Nov 20 09:55 0005
-rw------- 1 postgres dba 262144 Dec 1 08:00 0006
-rw------- 1 postgres dba 262144 Dec 9 11:50 0007
-rw------- 1 postgres dba 262144 Dec 16 08:14 0008
-rw------- 1 postgres dba 262144 Dec 21 09:40 0009
-rw------- 1 postgres dba 262144 Dec 31 09:55 000A
-rw------- 1 postgres dba 262144 Jan 4 21:17 000B
-rw------- 1 postgres dba 262144 Jan 6 10:50 000C
-rw------- 1 postgres dba 262144 Jan 7 18:20 000D
-rw------- 1 postgres dba 262144 Jan 13 13:55 000E
-rw------- 1 postgres dba 262144 Jan 15 11:55 000F
-rw------- 1 postgres dba 262144 Jan 22 07:50 0010
-rw------- 1 postgres dba 262144 Jan 26 16:35 0011
-rw------- 1 postgres dba 262144 Jan 29 10:16 0012
-rw------- 1 postgres dba 262144 Feb 3 13:17 0013
-rw------- 1 postgres dba 262144 Feb 3 16:13 0014
-rw------- 1 postgres dba 262144 Feb 4 08:24 0015
-rw------- 1 postgres dba 262144 Feb 5 13:20 0016
-rw------- 1 postgres dba 262144 Feb 8 11:26 0017
-rw------- 1 postgres dba 262144 Feb 8 11:46 0018
-rw------- 1 postgres dba 262144 Feb 8 12:25 0019
-rw------- 1 postgres dba 262144 Feb 8 13:19 001A
-rw------- 1 postgres dba 262144 Feb 8 14:23 001B
-rw------- 1 postgres dba 262144 Feb 8 15:32 001C
-rw------- 1 postgres dba 262144 Feb 8 17:01 001D
-rw------- 1 postgres dba 262144 Feb 8 19:19 001E
-rw------- 1 postgres dba 262144 Feb 8 22:11 001F
-rw------- 1 postgres dba 262144 Feb 9 01:44 0020
-rw------- 1 postgres dba 262144 Feb 9 05:57 0021
-rw------- 1 postgres dba 262144 Feb 9 10:45 0022
-rw------- 1 postgres dba 98304 Feb 10 13:35 0023
the members directory has 15723 files:
ls -l|wc -l
15723
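As a quick sanity check, the directory listings above are consistent with the ~4GB figure. A sketch (the 256kB SLRU segment size and the partial 0023 file size are taken from the `ls` output above):

```python
# Sanity-check the reported ~4GB of pg_multixact usage from the
# directory listings above (all sizes in bytes).
SEGMENT_SIZE = 262144                       # 256kB per SLRU segment file

member_segments = 15723                     # from "ls -l | wc -l" in members/
offsets_bytes = 35 * SEGMENT_SIZE + 98304   # files 0000-0022 full, 0023 partial

members_bytes = member_segments * SEGMENT_SIZE
total_gb = (members_bytes + offsets_bytes) / 1024**3
print(round(total_gb, 2))                   # roughly 3.85, matching the "4G" reported
```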
On Thu, Feb 11, 2016 at 7:52 AM, Kiriakos Georgiou <kg.postgresql@olympiakos.com> wrote:
> Our pg_multixact directory keeps growing. I did a "vacuum freeze" which
> didn't help. I also did a "vacuum full" which didn't help either.
> [...]
> Can you offer any hints as to how I can cure this?

Maybe you didn't vacuum freeze your template0 database, and it's probably holding back the oldest multixact ID -- take a look at the datminmxid column in pg_database.

You seem to be generating fairly large multixacts, and they use disk space until the cluster-wide oldest multixact ID is advanced by vacuum freeze. You can estimate the average number of members per multixact like this (assuming the default page size):

number of members = number of member segment files * 1636 * 32
number of multixacts = number of offsets segment files * 2048 * 32

You have about 2.3 million multixacts and about 823 million members, so your multixacts have an average of around 358 members each.

There are two ways for autovacuum to be automatically triggered to clean up old multixact state:

1. When you have more than vacuum_multixact_freeze_table_age (default 150 million) multixacts, it will perform a wraparound vacuum to advance the cluster's oldest multixact ID. The oldest is 1 for you, meaning this has probably never happened, and you're only up to around 2.3 million, so it's not going to happen for a long time with the default setting.

2. When it thinks that the member address space is in danger of wrapping, which kicks in before the above for clusters with a large average multixact size. This happens after MULTIXACT_MEMBER_SAFE_THRESHOLD is exceeded, which is when you get to ~2.2 billion members, and you're not quite halfway there yet.
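The back-of-the-envelope formulas above can be turned into a quick sketch (segment counts taken from the directory listings earlier in the thread; the per-page constants assume the default 8kB page size):

```python
# Estimate multixact counts from SLRU segment file counts, using the
# formulas above (default 8kB pages, 32 pages per segment file).
MEMBERS_PER_PAGE = 1636       # members per page in pg_multixact/members
MULTIXACTS_PER_PAGE = 2048    # multixact offsets per page in .../offsets
PAGES_PER_SEGMENT = 32

member_segments = 15723       # from "ls -l | wc -l" above
offsets_segments = 0x23 + 1   # files 0000 .. 0023

members = member_segments * MEMBERS_PER_PAGE * PAGES_PER_SEGMENT
multixacts = offsets_segments * MULTIXACTS_PER_PAGE * PAGES_PER_SEGMENT

print(members)                # ~823 million members
print(multixacts)             # ~2.36 million multixacts
print(members // multixacts)  # average members per multixact, ~350
```

The file-count estimate lands within a few percent of the exact pg_controldata values (NextMultiXactId 2316566, NextMultiOffset 823062151).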
I'm not sure what the best course of action is for you, but here are some thoughts:

1. You could manually freeze all databases with vacuumdb --freeze --all, but this requires first setting datallowconn to true for your template0 database, because otherwise it would not be vacuum-frozen by any command you can issue manually, and would therefore prevent the oldest multixact ID from advancing.

2. You could change vacuum_multixact_freeze_table_age to a value that will cause vacuum freezes to happen more often. Assuming your workload continues with the same average multixact size, you could work backwards from the numbers above: since 2.3 million multixacts occupy 4GB, if you set vacuum_multixact_freeze_table_age = 500000 you should expect a size under 1GB. But note that freeze operations can be IO-intensive and take a long time depending on your database size, so you don't want them too often.

3. You could do nothing and wait for autovacuum to detect that you are using more than half the member address space and trigger a freeze, which will happen some time after you have around 41k member segments (occupying around 10GB of disk, since they hold 256kB each).

4. You could look into whether all those multixacts with many members are really expected. (Large numbers of concurrent FK checks or explicit share locks on the same rows, perhaps? A good proportion of this happened on one day this week, I see.)

--
Thomas Munro
http://www.enterprisedb.com
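The working-backwards arithmetic in option 2, and the member-space threshold behind option 3, can be sketched like this (rough figures from the discussion above; treating the safe threshold as half the 32-bit member address space is an assumption, not the exact server constant):

```python
# Option 2: project pg_multixact size for a lower freeze age, assuming
# the workload keeps producing multixacts of the same average size.
current_multixacts = 2_300_000
current_bytes = 4 * 1024**3                 # ~4GB observed
bytes_per_multixact = current_bytes / current_multixacts

freeze_age = 500_000                        # proposed setting
projected_gb = freeze_age * bytes_per_multixact / 1024**3
print(round(projected_gb, 2))               # well under 1GB

# Option 3: members accumulated before the wraparound-protection freeze
# kicks in (~half the 32-bit member space), and the disk that implies.
safe_threshold = 2**32 // 2                 # ~2.1 billion members
members_per_segment = 1636 * 32
segments_at_threshold = safe_threshold // members_per_segment
print(segments_at_threshold)                # ~41k member segments
print(round(segments_at_threshold * 262144 / 1024**3, 1))  # ~10GB on disk
```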
Thomas Munro wrote:
> 4. You could look into whether all those multixacts with many members
> are really expected. (Large numbers of concurrent FK checks or
> explicit share locks on the same rows, perhaps? A good proportion of
> this happened on one day this week, I see.)

I think maybe this patch
https://www.postgresql.org/message-id/20150620043650.GX133018%40postgresql.org
should help with this. I expect to come back to this and get it pushed
to 9.3 and 9.4 sometime ...

--
Álvaro Herrera
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 11, 2016 at 11:05 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I think maybe this patch
> https://www.postgresql.org/message-id/20150620043650.GX133018%40postgresql.org
> should help with this. I expect to come back to this and get it pushed
> to 9.3 and 9.4 sometime ...

Ah, so the OP might get some improvement today by moving to 9.5, which has the optimization already.

BTW in my message above I said vacuum_multixact_freeze_table_age when I meant to say autovacuum_multixact_freeze_max_age (and its default is 400 million).

--
Thomas Munro
http://www.enterprisedb.com
Good morning Kiriakos,
We have been running 9.3.4 as our test system for quite a few years, and 9.3.10 in production for a month or so (a little less than 1T in size, WAL changes worth 2-5GB/day, about 250K transactions/day); we have never experienced any problems with data/pg_multixact.
In our 9.3.4 :
% pg_controldata data | grep -i multi
Latest checkpoint's NextMultiXactId: 69
Latest checkpoint's NextMultiOffset: 135
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 0
% du -h data/pg_multixact/
10K data/pg_multixact/members
10K data/pg_multixact/offsets
22K data/pg_multixact/
In our prod 9.3.10:
~> pg_controldata data | grep -i multi
Latest checkpoint's NextMultiXactId: 12404
Latest checkpoint's NextMultiOffset: 356
Latest checkpoint's oldestMultiXid: 12232
Latest checkpoint's oldestMulti's DB: 16426
~> du -h data/pg_multixact/
12K data/pg_multixact/members
24K data/pg_multixact/offsets
40K data/pg_multixact/
Our system comprises a JEE installation with ~500 users, and at least two other applications hitting the same tables at the same time; we see about one case of deadlocks per week.
What could be different on yours?
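One way to compare clusters like this is to compute the average multixact size straight from the `pg_controldata | grep -i multi` output shown above. A hypothetical little helper (not from the thread; field labels are as pg_controldata prints them, and it assumes neither counter has wrapped):

```python
# Hypothetical helper: pull the multixact counters out of pg_controldata
# output and report the average members per multixact.
def avg_members_per_multixact(controldata_text):
    fields = {}
    for line in controldata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    next_mxid = int(fields["Latest checkpoint's NextMultiXactId"])
    next_offset = int(fields["Latest checkpoint's NextMultiOffset"])
    oldest_mxid = int(fields["Latest checkpoint's oldestMultiXid"])
    # Rough estimate: assumes neither counter has wrapped around yet.
    return next_offset / max(next_mxid - oldest_mxid, 1)

# Numbers from the OP's pg_controldata output earlier in the thread.
sample = """\
Latest checkpoint's NextMultiXactId:  2316566
Latest checkpoint's NextMultiOffset:  823062151
Latest checkpoint's oldestMultiXid:   1
"""
print(round(avg_members_per_multixact(sample)))  # ~355 for the OP's cluster
```

A healthy cluster like the ones above shows single-digit averages; hundreds of members per multixact points at many sessions locking the same rows.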
On 10/02/2016 20:52, Kiriakos Georgiou wrote:
> Our pg_multixact directory keeps growing. I did a "vacuum freeze" which
> didn't help. I also did a "vacuum full" which didn't help either.
> [...]
--
Achilleas Mantzios
IT DEV Lead, IT DEPT
Dynacom Tankers Mgmt