Discussion: Unexpectedly high disk space usage
We upgraded from 8.4 to 9.1 over the summer, a few months ago, moving to new DB servers with more disk space and memory. Unexpectedly, the DB servers have steadily increased their disk space usage since then. Reported system load doesn't seem to be affected. It's happening on all our DB servers running 9.1.

When we reload all pg_dumps from our worst-affected server into an offline server, the disk space usage is about 26 GB, but the production database is using 166 GB. (# df /var/lib/pgsql;)

To resolve this, we've tried:

1) Reindexed everything (cut about 10% of disk usage, temporarily)

2) Tried vacuum full, and vacuum analyze, on all databases (to minimal effect)

3) Restarting PG, including a full stop/start (no discernible effect)

4) Looked for stale prepared transactions (none found)

5) Followed the instructions from the wiki to try to determine the cause of all the disk usage: http://wiki.postgresql.org/wiki/Disk_Usage -- but when we add up the results for all the different databases, tables, indexes, etc. in a script, we get a number very close to the usage of the freshly loaded server (24 GB).

What is Postgres doing with ~80% of its disk space usage? This is not normal, is it? I would hate to have to take the servers offline just to dump/restore in order to bring disk usage back to normal...

SYSTEM SPECS: I've attached the postgresql.conf. These are RHEL6/64 Linux servers with 128 GB of RAM and 16 real CPU cores (HT turned on, so 32 CPUs according to the O/S).

#FROM: sysctl.conf:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 136365211648
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

-Ben
Attachments
On Mon, Nov 5, 2012 at 8:01 PM, Lists <lists@benjamindsmith.com> wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)
>
> To resolve this, we've tried:
>
> 1) reindexed everything (cut about 10% of disk usage temporarily)
>
> 2) tried vacuum full, and vacuum analyze on all databases. (to minimal
> effect)
>
> 3) Restarting PG (no discernible effect) including a full stop/start.
>
> 4) We've looked for stale prepared transactions (none found)
>
> 5) instructions from the wiki to try to determine what the cause of all the
> disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up
> all the results for all the different databases, tables, indexes, etc. in a
> script, we get a number very close to the usage of the freshly loaded
> server. (24 GB)

What does du -sh have to say about it? Use unix tools to examine your
file system and see where the usage is going. For instance, I can do this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0       server.crt
0       server.key
4       pg_tblspc
4       pg_twophase
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
12      pg_clog
12      pg_stat_tmp
12      pg_subtrans
28      pg_multixact
460     global
16392   pg_xlog
16396   base

which tells me that I'm using about 16MB each for pg_xlog and base. I can
then cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

Which shows me using about 5MB each for three different dbs. And so on.

On an off guess, did you go from a SQL_ASCII encoding to UTF8? That might
increase disk space usage a bit.
On Mon, Nov 5, 2012 at 7:01 PM, Lists <lists@benjamindsmith.com> wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)

My first guesses would be things that are not directly under the
database's control, such as:

1) your server logs are accumulating and you aren't doing anything about them

2) you are taking backup snapshots to somewhere in that directory and not
cleaning them up

3) your archive_command is failing (which you should see reports of in the
server logs) and so you are accumulating xlog files.

Cheers, Jeff
Jeff, thanks for the feedback!

On 11/05/2012 08:51 PM, Jeff Janes wrote:
> My first guesses would be things that are not directly under the
> database's control, such as:
>
> 1) your server logs are accumulating and you aren't doing anything about them

I'm guessing that this is not the case:

[root@delta data]# du -shc * | grep -i log
47M     pg_clog
15M     pg_log
641M    pg_xlog

> 2) you are taking backup snapshots to somewhere in that directory and
> not cleaning them up

Our backup snapshots (taken with pg_dump) are taken on a different server
over the network. Dumps are made several times each day. Could this be part
of the problem if (somehow) they didn't complete? And if so, would there be
some cleanup I'd have to do other than restarting PG?

> 3) your archive_command is failing (which you should see reports of in
> the server logs) and so you are accumulating xlog files.

As I understand things, the result above under 1) demonstrates that this,
also, is not the cause.
I followed your example; the result is at the bottom. Based on this, it
would seem that 3-4 databases are the culprit. How could I get more
depth/detail on what specifically is the problem?

-Ben

On 11/05/2012 07:10 PM, Scott Marlowe wrote:
> What does du -sh have to say about it? Use unix tools to examine your
> file system and see where the usage is going. For instance, I can do
> this:
>
> cd /var/lib/postgresql/8.4/main/
> du -s *|sort -n
> 0       server.crt
> 0       server.key
> 4       pg_tblspc
> 4       pg_twophase
> 4       PG_VERSION
> 4       postmaster.opts
> 4       postmaster.pid
> 12      pg_clog
> 12      pg_stat_tmp
> 12      pg_subtrans
> 28      pg_multixact
> 460     global
> 16392   pg_xlog
> 16396   base
>
> which tells me that I'm using about 16MB each for pg_xlog and base. I can
> then cd into base and look around:
>
> cd base
> du -s *|sort -n
> 5416    1
> 5416    11563
> 5560    11564
>
> Which shows me using about 5MB each for three different dbs. And so on.
> On an off guess, did you go from a SQL_ASCII encoding to UTF8?
> That might increase disk space usage a bit.
[root@delta ~]# cd /var/lib/pgsql/9.1/data/
[root@delta data]# du -s * | sort -n
4       pg_ident.conf
4       pg_serial
4       pg_tblspc
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
8       pg_hba.conf
12      pg_notify
12      pg_twophase
20      postgresql.300
20      postgresql.conf
20      postgresql.conf.20120903
20      postgresql.conf.300
76      pg_subtrans
104     pg_multixact
15044   pg_log
18184   global
25216   pg_stat_tmp
47916   pg_clog
671916  pg_xlog
164753204       base
[root@delta data]# cd base
[root@delta base]# du -s * | sort -n
4       pgsql_tmp
6124    12772
6388    12780
6424    1
72424   331506
72700   160676
72896   391655
73200   52389
73216   523672
74104   619675
74956   295646
76768   307580
77896   547597
80824   571547
87368   475799
90940   631604
113876  124651
123548  148525
130096  367533
149792  439726
173648  355578
175404  679545
190732  559580
225780  511706
326468  667547
352736  655477
398736  535644
469408  136582
483716  499753
513124  270926
575612  715601
590408  487780
666604  463779
713208  643540
714896  583515
803216  343438
806952  427663
855156  739506
872200  197221
975692  64371
987692  775594
1005268 595488
1024812 691482
1042212 727552
1047464 379566
1260044 76601
1276756 16384
1345072 403667
1474468 209158
1477808 172604
1536168 221124
1637652 258798
1811504 88598
1963740 245588
2076748 703467
2193536 415671
2430908 801322
2552640 319552
2785212 28315
3454880 112612
3755548 451666
3929420 100666
4651876 40451
5714940 751514
6257740 233293
7313900 184735
9334796 763606
10940780        283609
20837264        788338
45285640        607471
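As a quick sanity check on the listing above, the size column (reported by du in KB) can be summed to see how much of the 164 GB under base/ the biggest databases account for. A small shell sketch; the three numbers are the top three entries from the du output:

```shell
# Sum the du sizes (in KB) of the three largest database directories
# above (OIDs 607471, 788338 and 283609) and print the total.
printf '%s\n' 45285640 20837264 10940780 |
awk '{ total += $1 } END { print total }'
# prints 77063684
```

77,063,684 KB is roughly 73 GB, so the three largest databases alone account for nearly half of the space under base/.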
On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists@benjamindsmith.com> wrote:
> I followed your example, the result is at the bottom. Based on this it would
> seem that there are 3-4 databases that seem to be the culprit. How could I
> get more depth/detail on what specifically is the problem?

If you have installed the contrib modules (oid2name specifically), you can
use that to get the name of the bloated database:

oid2name | fgrep 607471

If the name of the database doesn't give you any insight, then look for
large files in the directory base/607471 whose names all start with the
same digits, and use oid2name to get the names of the relations for those
files:

oid2name -d <name of database> -o <base name of large files>

Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists@benjamindsmith.com> wrote:
>> I followed your example, the result is at the bottom. Based on this it would
>> seem that there are 3-4 databases that seem to be the culprit. How could I
>> get more depth/detail on what specifically is the problem?

> If you have installed the contrib modules (oid2name specifically), you
> can use that to get the name of the bloated database:
> oid2name | fgrep 607471

Or, if you didn't install contrib, try

select datname from pg_database where oid = 607471

> If the name of the database doesn't give you any insight, then look
> for large files in the directory base/607471 whose names all
> start with the same digits and use oid2name to get the names of the
> relations for those files.
> oid2name -d <name of database> -o <base name of large files>

For this you can try

select relname from pg_class where relfilenode = <whatever>

Or let the database do the work:

select relname, pg_relation_size(oid) from pg_class order by 2 desc;

regards, tom lane
On Tue, Nov 6, 2012 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> If you have installed the contrib modules (oid2name specifically), you
>> can use that to get the name of the bloated database:
>> oid2name | fgrep 607471
>
> Or, if you didn't install contrib, try
>
> select datname from pg_database where oid = 607471

Thanks, I knew there had to be a more direct way to do that.

>> If the name of the database doesn't give you any insight, then look
>> for large files in the directory base/607471 whose names all
>> start with the same digits and use oid2name to get the names of the
>> relations for those files.
>> oid2name -d <name of database> -o <base name of large files>
>
> For this you can try
>
> select relname from pg_class where relfilenode = <whatever>
>
> Or let the database do the work:
>
> select relname, pg_relation_size(oid) from pg_class order by 2 desc;

Ben described using something like this method originally and not finding
the space, so I wanted to work backwards from certain knowledge of where
the OS says the space is being used. But now I think maybe his scripts to
aggregate table sizes over all databases (and also his script to load
pg_dumps of those databases into a new cluster) are accidentally omitting
some databases--the largest ones.

Is there a simple query for a superuser to get a list of all relation sizes
over all databases cluster-wide? If "\l+" can get the size of databases
other than the one currently connected to, maybe there is a way to extend
that to tables in those other databases. It would at least be nice to be
able to get the sizes of all databases.
Since '\l+' doesn't sort by size and I don't know how to make it do so, I
pulled the query from the psql source code and modified it:

SELECT d.datname as Name,
  pg_catalog.pg_get_userbyid(d.datdba) as Owner,
  CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
    THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
    ELSE 'No Access'
  END as Size
FROM pg_catalog.pg_database d
ORDER BY
  CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
    THEN pg_catalog.pg_database_size(d.datname)
    ELSE NULL
  END desc nulls first;

(And discovered a long-forgotten, unused database I had sitting around
taking up space.)

Ben, did you ever figure out where the space was going?

Cheers, Jeff
On 11/07/2012 09:01 AM, Jeff Janes wrote:
> Ben, did you ever figure out where the space was going?

I think we've found where the space is going, but I still don't yet know
how to resolve it. I modified your query thusly in order to get a total of
space used, and got an answer that closely matches df:

with stuff as (
  SELECT d.datname as Name,
    pg_catalog.pg_get_userbyid(d.datdba) as Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
      THEN pg_catalog.pg_database_size(d.datname)
      ELSE -1
    END as Size
  FROM pg_catalog.pg_database d
  ORDER BY
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
      THEN pg_catalog.pg_database_size(d.datname)
      ELSE NULL
    END desc nulls first
)
SELECT sum(size) AS overall from stuff;

Result: 171,276,369,124

# du -sbc /var/lib/pgsql/9.1/data/*
Result: 172,087,129,512

Now, the question is: I see several databases whose on-disk sizes are
dramatically different from what I get after a dump/restore to another
machine:

Production:   santarosa444 | postgres | 44 GB
Dump/Restore: santarosa444 | postgres | 685 MB

Now, here's where it gets weird. From the disk space usage wiki
(http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used
it to get a total disk space used result:

with mytable AS (
  SELECT
    nspname || '.' || relname AS "relation",
    pg_total_relation_size(C.oid) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

... but the total result is 747,569,152, which is close to the dump/restore
value, not the production server value, even though I'm running this query
on the production server. So there's *something* that the latter query
isn't identifying that the former is. On a hunch, I ran this query:

with mytable AS (
  SELECT
    nspname || '.' || relname AS "relation",
    pg_total_relation_size(C.oid) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE (true or (nspname NOT IN ('pg_catalog', 'information_schema')))
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

And the result is 46,771,216,384! Removing the "mytable" wrapper stuff,
here are the top results:

pg_catalog.pg_attribute | 36727480320
pg_catalog.pg_attrdef   | 3800072192
pg_catalog.pg_depend    | 2665930752
pg_catalog.pg_class     | 1508925440
pg_catalog.pg_type      | 1113038848
public.att_claims       | 451698688
public.stgrades         | 127639552
pg_catalog.pg_index     | 107806720

Google returns this page:
http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html
which doesn't help me much.

So, am I doing something wrong with admin? Our current process is that
every night, in the middle of the night, a script connects to each database
on each server, runs a query to get all the tables in each database and,
for each one, runs:

VACUUM ANALYZE $table

And then once a week:

psql -U postgres -c "\l" | grep -Po "(\w+444)" | xargs -t -i psql -U postgres {} -c "REINDEX DATABASE {};"

(note: there is a database for the "postgres" user on each DB server)

The script is a remnant from PG 8.x days, so am I missing something
fundamental about 9.x? I will note that autovacuum is off because it
occasionally causes transactions and queries to hang when an update
triggers a vacuum mid-day, effectively taking us offline at random. Our
scenario is pretty much a worst-possible case of transactions, prepared
transactions, temp tables, and concurrent read/write queries.
Lists <lists@benjamindsmith.com> writes:
> pg_catalog.pg_attribute | 36727480320

Ouch.

> Our current process is that every night in the middle of the night, a
> script connects to each database on each server and runs a query to get
> all tables in each database and, for each, run
> "VACUUM ANALYZE $table"
> for each table in the database.
> (note: there is a database for the "postgres" user on each DB server)
> The script is a remnant from PG 8.x days, so am I missing something
> fundamental about 9.x? I will note that autovacuum is off ...

So you've turned off autovacuum, and are carefully not vacuuming the system
catalogs. That's your problem all right. Is there a particularly good
reason why this script isn't a one-liner "VACUUM"? Are you sure that
once-a-day vacuuming is sufficient, even if it were covering the system
catalogs? If you've managed to bloat pg_attribute to 36GB, I suspect you've
got enough churn (perhaps from temp tables) that you really need the
catalogs vacuumed more often.

My advice is dump, reload, and *don't* turn off autovacuum.

> ... because it
> occasionally causes transactions and queries to hang when an update
> causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.

regards, tom lane
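[Editorial note: as a hedged sketch only, the "one-liner" approach above could be a nightly cron entry using vacuumdb, which ships with the server; the binary path assumes the PGDG 9.1 RPM layout and the schedule time is illustrative:]

```
# /etc/cron.d/pg-vacuum (sketch): whole-cluster vacuum + analyze,
# run as the postgres superuser so the system catalogs are included.
30 2 * * * postgres /usr/pgsql-9.1/bin/vacuumdb --all --analyze --quiet
```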
Hi Greg,

I've added you to the cc list because I'm proposing to change some wiki
content which you wrote.

On Wed, Nov 7, 2012 at 11:54 AM, Lists <lists@benjamindsmith.com> wrote:
> On 11/07/2012 09:01 AM, Jeff Janes wrote:
>> Ben, did you ever figure out where the space was going?
>
> Now, here's where it gets weird. From the disk space usage wiki,
> (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it
> to get a total disk space used result:
>
> with mytable AS (
>   SELECT
>     nspname || '.' || relname AS "relation",
>     pg_total_relation_size(C.oid) AS "size"
>   FROM pg_class C
>   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
>   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>     AND C.relkind <> 'i'
>     AND nspname !~ '^pg_toast'
>   ORDER BY pg_total_relation_size(C.oid) DESC
> )
> SELECT sum(size) AS size FROM mytable

I question the wisdom of that WHERE clause (from the wiki). If the
pg_catalog relations are big, then they are big, and why shouldn't they get
reported as such? And if they are not big, that is what the LIMIT on the
wiki page is for. I'm tempted to go change it, but maybe there is a good
reason for it which I do not understand.

...

> Google returns this page:
> http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which
> doesn't help me much. So, am I doing something wrong with admin? Our current
> process is that every night in the middle of the night, a script connects to
> each database on each server and runs a query to get all tables in each
> database and, for each, run
> "VACUUM ANALYZE $table"
> for each table in the database.

I take it your script that does that is not including the pg_catalog
tables? Why not just run "vacuum analyze" and let it do the entire
database?

> I will note that autovacuum is off because it occasionally causes
> transactions and queries to hang when an update causes a vacuum mid-day,
> effectively taking us offline randomly.

Hang as in they are blocking on locks? Or do they just get slow because
autovacuum is consuming too much IO?

Cheers, Jeff
On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lists <lists@benjamindsmith.com> writes:
>
>> ... because it
>> occasionally causes transactions and queries to hang when an update
>> causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant
> experience.

My experience is that if autovac is causing problems with stalled queries
etc., you're either A: running an ancient pg version (pre-8.3), B: running
WAY too aggressive settings in autovac (100 threads, no nap time, a cost
limit of 100000, etc.), or C: your IO subsystem is absolute crap.

On any modern server, the default autovac settings from 8.3 on should only
have the possible problem of not being tuned aggressively enough.
On Wed, Nov 7, 2012 at 1:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Lists <lists@benjamindsmith.com> writes:
>>
>>> ... because it
>>> occasionally causes transactions and queries to hang when an update
>>> causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> My experience is that if autovac is causing problems with stalled
> queries etc you're either A: running ancient pg versions (pre 8.3), B:
> Running WAY too aggressive settings in autovac (100 threads, no nap
> time, cost limit of 100000 etc.) or C: Your IO subsystem is absolute
> crap.
>
> On any modern server, default autovac settings from 8.3 and on should
> only have the possible problem of not being tuned aggressively enough.

Oh, another failure scenario up there is that you're running DDL in
production, which is stalling behind an autovac, and in turn the two are
stalling other queries. This has happened for me once or twice on more
modern versions (8.3 and 8.4).
On 11/07/2012 12:42 PM, Tom Lane wrote:
> So you've turned off autovacuum, and are carefully not vacuuming the
> system catalogs. That's your problem all right. Is there a
> particularly good reason why this script isn't a one-liner "VACUUM"?

Back in the 8.x days, we experienced "vacuum full analyze" occasionally
causing other processes to hang/timeout. In an attempt to minimize the
impact of the locking, we updated the script to vacuum one table at a time,
which seemed to work well throughout the 8.x series. I'd happily accept
that this conclusion may simply have been wrong, but it worked well enough
that nobody complained and life was good. After switching to 9.x, we read
that the "full" vacuum was less useful, and so the script was changed to
"vacuum analyze $table" rather than "vacuum full analyze $table".

> Are you sure that once-a-day vacuuming is sufficient, even if it was
> covering the system catalogs? If you've managed to bloat pg_attribute
> to 36GB, I suspect you've got enough churn (perhaps from temp tables)
> that you really need the catalogs vacuumed more often.

The only thing I could find in the docs even mentioning the idea of
vacuuming catalogs is this sentence:

(A manual VACUUM should fix the problem, as suggested by the hint; but note
that the VACUUM must be performed by a superuser, else it will fail to
process system catalogs and thus not be able to advance the database's
datfrozenxid.)

http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

This does NOT clearly say that the end user could vacuum catalogs, let
alone that it's necessary or even a good idea. Otherwise, the only mention
is of tables, and there's no mention of the idea that tables are anything
but user space.

> My advice is dump, reload, and *don't* turn off autovacuum.
>
>> ... because it
>> occasionally causes transactions and queries to hang when an update
>> causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant
> experience.

We tried several times to turn on autovacuum with 9.1 and had problems
every time. If our use case is particularly special, I'd love to work with
you to get autovacuum working in our situation too, as it would make life
easier for us! But for the past few months, every time we've turned it on,
we've had our phones swamped with customers who are unable to use our
system while our application monitors scream bloody murder, at least
weekly.

From what we could tell (under extreme pressure to get it all working again
ASAP, mind you) it seemed that when doing a large update from within a
transaction, autovacuum would get triggered before the transaction
completed, causing the transaction to hang or at least slow way down,
causing timeouts with the load balancers, so customers would then try
again, compounding the ongoing problem. Pretty soon you have not only I/O
issues but also locking issues and upset customers.

This issue may be compounded because we make fairly extensive use of dblink
and temp tables to aggregate data for our customers who have multiple
sites.

-Ben
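[Editorial note: a hedged sketch of one way to watch for this kind of temp-table-driven catalog churn before it turns into bloat. The standard statistics views track dead rows for the system catalogs too, assuming track_counts is on (the default):]

```sql
-- Dead-row counts for the system catalogs in the current database;
-- numbers here that grow steadily mean the catalogs need vacuuming.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
ORDER BY n_dead_tup DESC
LIMIT 10;
```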
On 11/07/2012 12:58 PM, Scott Marlowe wrote:
> My experience is that if autovac is causing problems with stalled
> queries etc you're either A: running ancient pg versions (pre 8.3), B:
> Running WAY too aggressive settings in autovac (100 threads, no nap
> time, cost limit of 100000 etc.) or C: Your IO subsystem is absolute
> crap. On any modern server, default autovac settings from 8.3 and on
> should only have the possible problem of not being tuned aggressively
> enough.

A) We are running PG 9.1.

B) We used the default settings in the RPMs provided by
yum.postgresql.org. At the bottom of this message is information about the
RPMs we are currently using.

C) The I/O subsystem for PG is twin SATA III SSDs in a RAID 1
configuration, capable of tens of thousands of IO operations per second.
Servers are recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U
rackmount cases.

As stated previously, we make extensive use of temp tables, transactions,
and dblink, but had no trouble with catalog table bloat in 8.x; this is a
new phenomenon for us.

# rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64
Name        : postgresql91                 Relocations: (not relocatable)
Version     : 9.1.5                        Vendor: (none)
Release     : 3PGDG.rhel6                  Build Date: Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:24 AM UTC   Build Host: koji-sl6-x86-64-pg91
Group       : Applications/Databases       Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 5193673                      License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/
Summary     : PostgreSQL client programs and libraries
Description :
PostgreSQL is an advanced Object-Relational database management system
(DBMS) that supports almost all SQL constructs (including transactions,
subselects and user-defined types and functions). The postgresql package
includes the client programs and libraries that you'll need to access a
PostgreSQL DBMS server. These PostgreSQL client programs are programs that
directly manipulate the internal structure of PostgreSQL databases on a
PostgreSQL server. These client programs can be located on the same machine
as the PostgreSQL server, or may be on a remote machine which accesses a
PostgreSQL server over a network connection. This package contains the
command-line utilities for managing PostgreSQL databases on a PostgreSQL
server. If you want to manipulate a PostgreSQL database on a local or
remote PostgreSQL server, you need this package. You also need to install
this package if you're installing the postgresql91-server package.
On 11/7/12 3:58 PM, Jeff Janes wrote:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>
> I question the wisdom of that where clause (from the wiki)
>
> If the pg_catalog relations are big, then they are big and why
> shouldn't they get reported as such? And if they are not big, that is
> why the limit is there on the wiki page.

The idea was that in a new database with a relatively small number of
tables, your own tables will be lost among the catalog data unless you
filter them out. Testing against an install with a single real table, the
query there will show something like this right now:

    relation    | total_size
----------------+------------
 public.t       | 3568 kB
 public.t_k_seq | 8192 bytes

But if the filter on pg_catalog is removed, you get this instead:

          relation           | total_size
-----------------------------+------------
 public.t                    | 3568 kB
 pg_catalog.pg_depend        | 808 kB
 pg_catalog.pg_proc          | 752 kB
 pg_catalog.pg_attribute     | 568 kB
 pg_catalog.pg_rewrite       | 464 kB
 pg_catalog.pg_description   | 392 kB
 pg_catalog.pg_statistic     | 328 kB
 pg_catalog.pg_operator      | 208 kB
 pg_catalog.pg_collation     | 152 kB
 pg_catalog.pg_type          | 152 kB
 pg_catalog.pg_amop          | 136 kB
 pg_catalog.pg_class         | 136 kB
 pg_catalog.pg_constraint    | 112 kB
 pg_catalog.pg_conversion    | 104 kB
 pg_catalog.pg_index         | 88 kB
 pg_catalog.pg_amproc        | 80 kB
 pg_catalog.pg_opclass       | 80 kB
 pg_catalog.pg_ts_config_map | 80 kB
 pg_catalog.pg_cast          | 80 kB
 pg_catalog.pg_authid        | 72 kB

That is overload for a lot of people, and confusing to new users. That's
why I opted for the shorter version. There's no perfect answer to all use
cases here. This sort of thing is why there are three sets of views:
pg_stat_user_tables, pg_stat_sys_tables, and pg_stat_all_tables. The wiki
disk space queries aim to be like the user-tables version from that trio.
Adding a note pointing out that you might want to remove pg_catalog and see
the size of those relations would be appropriate. I wouldn't make that the
default case though, due to the issue highlighted above. I'd rather
optimize the initially suggested query so that new users get simple output,
even if that might hide problems on larger installs where the catalog data
has become big.

The other way I sometimes balance these two requirements--wanting to show
all the big data, but not clutter small installs with the catalog--is to
make the filter size-based instead:

SELECT
  nspname || '.' || relname AS "relation",
  pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('information_schema')
  AND C.relkind <> 'i'
  AND nspname !~ '^pg_toast'
  AND relpages > 100
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;

On my trivial test install that gives me just the one user table:

 relation | total_size
----------+------------
 public.t | 3568 kB

While still showing larger catalog tables if they grow to be noticeable.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Jeff Janes <jeff.janes@gmail.com> writes:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>> AND C.relkind <> 'i'
>> AND nspname !~ '^pg_toast'

> I question the wisdom of that where clause (from the wiki)
> If the pg_catalog relations are big, then they are big and why
> shouldn't they get reported as such?

Agreed, please change it. (The index and toast exclusions are reasonable,
since those will be accounted for in pg_total_relation_size of the parent.
Personally, I'd code the toast exclusion using relkind rather than a
namespace check, though.)

regards, tom lane
On Wed, Nov 7, 2012 at 3:15 PM, Lists <lists@benjamindsmith.com> wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>
>> So you've turned off autovacuum, and are carefully not vacuuming the
>> system catalogs. That's your problem all right. Is there a
>> particularly good reason why this script isn't a one-liner "VACUUM"?
>
> Back in the 8.x days, we experienced "vacuum full analyze" occasionally
> causing other processes to hang/timeout.

That was your first mistake. By 8.0 the need for vacuum full was almost
zero. Except for instances where bloat got out of hand, vacuum full should
generally be avoided after 8.0. Regular vacuum should be plenty, whether
it's run by the autovacuum daemon or a cron job. Until 8.3, autovacuum was
single-threaded and therefore often had trouble keeping up with bloat.
While vacuum full is a blocking operation, plain vacuums are not, so unless
you REALLY need a vacuum full it should be avoided.

> In an attempt to minimize the
> impact of the locking, we updated the script to vacuum one table at a time,
> which seemed to work well throughout the 8.x series. I'd happily accept that
> this conclusion may simply have been wrong, but it worked well enough
> that nobody complained and life was good.

Yeah, you still had blocking, but it was probably less noticeable.

> After switching to 9.x, we read
> that the "full" vacuum was less useful and so the script was changed to
> "vacuum analyze $table" rather than "vacuum full analyze $table".

Yeah, at that point you'd have been better off tuning autovacuum to be more
aggressive and letting it do the job. Generally the time to call vacuum by
hand is right after you've done something like delete half the rows in a
large table.
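[Editorial note: to illustrate that last point with a hedged sketch -- "events" and its columns are hypothetical, not from the thread -- a targeted manual vacuum right after a bulk delete looks like this:]

```sql
-- Hypothetical example: after deleting a large fraction of a table,
-- vacuum it by hand so the dead space becomes reusable immediately,
-- rather than waiting for the next scheduled (auto)vacuum pass.
DELETE FROM events WHERE created < now() - interval '1 year';
VACUUM ANALYZE events;
```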
On 11/07/2012 12:42 PM, Tom Lane wrote: >> ... because it >> >occasionally causes transactions and queries to hang when an update >> >causes a vacuum mid-day, effectively taking us offline randomly. > I suspect this claim is based on ancient and no longer very relevant > experience. Even so, if I felt the need to keep autovacuum off, what would I need to run regularly in order to keep things neat and tidy under the hood? Would a simple "vacuum" within each database suffice? Should I be logged in as the database owner or as an administrative user? Thanks, Ben
Lists <lists@benjamindsmith.com> writes: > Even so, if I felt the need to keep autovacuum off, what would I need to > run regularly in order to keep things neat and tidy under the hood? > Would a simple "vacuum" within each database suffice? Should I be logged > in as the database owner or as an administrative user? A plain "vacuum" (or probably better, "vacuum analyze") done as superuser will suffice, as long as you do it often enough. regards, tom lane
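In cron terms, Tom's suggestion can be as small as a single vacuumdb invocation. A sketch of a nightly crontab entry, assuming it runs as the postgres OS user against the local cluster (the schedule is illustrative):

```
# m h dom mon dow  command
# Vacuum and analyze every database in the cluster at 2am, as superuser.
0 2 * * *  vacuumdb --all --analyze --quiet
```

Running it as the postgres user means it connects as superuser, so the system catalogs get vacuumed too, which is the point Tom is making.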
On Thu, Nov 8, 2012 at 6:05 PM, Lists <lists@benjamindsmith.com> wrote: > On 11/07/2012 12:42 PM, Tom Lane wrote: >>> >>> ... because it >>> >occasionally causes transactions and queries to hang when an update >>> >causes a vacuum mid-day, effectively taking us offline randomly. >> >> I suspect this claim is based on ancient and no longer very relevant >> experience. > > Even so, if I felt the need to keep autovacuum off, what would I need to run > regularly in order to keep things neat and tidy under the hood? Would a > simple "vacuum" within each database suffice? Should I be logged in as the > database owner or as an administrative user? Just know that most of the time people think they need to turn off autovacuum, they usually need to tune it instead: either more or less aggressive, depending on why they think they need to turn it off. If it's consuming too much IO then reduce cost limit / increase cost delay; if it's not aggressive enough, then reverse that and increase cost limit and decrease cost delay. If your IO subsystem can't keep up, then turning off autovacuum or turning it down will simply be delaying the problem rather than solving it (i.e. throw more IO at it).
As I've spent a considerable amount of time trying to sort this out, I'm posting it for the benefit of other users.

I've experienced persistent, ongoing issues with autovacuum in a mixed read/write environment with midrange hardware (16-core Xeon, 128 GB RAM, 200 GB 6 Gb/s SATA3 SSDs for disk I/O on RHEL6) using the defaults in the 9.1 RPMs provided by PostgreSQL. (yum.postgresql.org) The cause of this is not yet determined. It may be related to any or all of the following, in combination:

A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and multiple savepoints;
D) concurrent use of pg_dump;
E) use of numerous databases on a single server, about 50 on average.

To offset this, we turned off autovacuum and used an old script to vacuum the tables in the middle of the night when nobody was looking. Unfortunately, the vacuum script only vacuumed the "userland" tables, and tremendous amounts of disk space were being wasted, particularly in the pg_attribute table.

However, use of any of the statements "vacuum analyze", "vacuum full analyze", or "vacuum full verbose analyze" without naming specific tables did not resolve the extra disk space issue; disk usage still remained at least 5x the expected amount in all cases. (In one case, all of these open-ended vacuum commands did almost nothing.) Nor did running any variation of "vacuum analyze $table" in a loop through all tables (including the pg_* tables) completely resolve the issue, either.
In order to completely clean things up, we ended up writing a script to do the following:

1) Determine the databases using excessive disk space, in descending order of use, with this query:

SELECT d.datname as Name,
    d.datistemplate::int AS datistemplate,
    pg_catalog.pg_get_userbyid(d.datdba) as Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE -1
    END as Size
FROM pg_catalog.pg_database d
ORDER BY CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END DESC NULLS FIRST;

2) For each database from #1, get a list of tables to be cleaned up with this query:

SELECT nspname || '.' || relname AS "table",
    pg_total_relation_size(C.oid) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE (true or (nspname NOT IN ('pg_catalog', 'information_schema')))
    AND C.relkind = 'r'
    AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;

3) For each of the tables from #2, run the commands:

REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk usage.

NOTES:

1) The above queries are derived from the queries we used to determine how much disk space was being used, even though the additional information they return isn't actually used by the script.

2) It was sheer chance that I discovered the need to reindex prior to vacuum in order to get the disk space back.

3) I'd like to get autovacuum to work. I've read suggestions to tweak cost_delay and/or cost_limit. I haven't yet determined whether the problem is I/O based or lock/deadlock based. I'm guessing the problem is the latter, though it's hard to tell because queries stack up quickly and load average is sky high when autovacuum fails for us.

4) The aforementioned process is S-L-O-W. Think at least hours and probably days, depending on your databases, your server(s), and the load.
5) I don't yet know if the "full" option for the vacuum is necessary to free up all space. I will experiment with this and post results if useful.
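The per-table step (3) above amounts to emitting a pair of statements per table from the size query. A minimal sketch of that generation step (the table names are hypothetical examples, and proper quoting/escaping of identifiers is ignored):

```python
# Generate the cleanup statements the script runs for each table
# reported by the size query in step 2.
def cleanup_statements(tables):
    stmts = []
    for t in tables:
        stmts.append("REINDEX TABLE %s;" % t)
        stmts.append("VACUUM FULL ANALYZE %s;" % t)
    return stmts

for s in cleanup_statements(["public.orders", "pg_catalog.pg_attribute"]):
    print(s)
```

Each statement would then be fed to psql one at a time, so each table's exclusive lock is held only as long as that table's rewrite takes.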
On 11/09/2012 04:28 PM, Lists wrote: > As I've spent a considerable amount of time trying to sort this out, > I'm posting it for the benefit other users. Having missed the earlier conversation....a couple comments: > I've experienced persistent, ongoing issues with autovacuum in a mixed > read/write environment with midrange hardware (16 core Xeon, 128 GB > RAM, 200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults > in the 9.1 RPMs provided by Postgres. (yum.postgresql.org) Don't do that. Defaults are good for ensuring that PostgreSQL will start on the widest reasonable variety of systems. They are *terrible* for performance and are certainly wrong for the system you describe. > The cause of this is not yet determined. It may be related to the any > or all of the combination of: > > A) extensive use of temp tables; > B) extensive use of multiple dblink() calls in a single query; > C) use of transactions, especially prepared transactions and > multiple savepoints; > D) concurrent use of pg_dump; > E) use of numerous databases on a single server, average about 50; > > To offset this, we turned off autovacuum, and used an old script to > vacuum the tables in the middle of the night when nobody was looking. > Unfortunately, the vacuum script only vacuumed the "userland" tables > and tremendous amounts of disk space were being wasted, particularly > in the pg_attribute tables. Bloat in pg_attribute would correlate with A) (or any constant creation/destruction of tables). You can vacuum and/or reindex the system tables if you are connected as the superuser but you are better off preventing bloat by appropriate adjustment of your configuration settings. However note that if you do frequent bulk creation/destruction of tables you could end up bloating the attribute table between vacuum runs and may need to periodically manually shrink it. Cheers, Steve
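A sketch of the manual catalog cleanup Steve mentions, run per database while connected as a superuser (note that VACUUM FULL takes an exclusive lock on each catalog while it runs, so this belongs in a maintenance window; the list of catalogs is illustrative):

```sql
-- Reclaim bloat in the catalogs most affected by temp-table churn.
VACUUM FULL ANALYZE pg_catalog.pg_attribute;
VACUUM FULL ANALYZE pg_catalog.pg_class;
VACUUM FULL ANALYZE pg_catalog.pg_type;
VACUUM FULL ANALYZE pg_catalog.pg_depend;
```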
On Fri, Nov 9, 2012 at 5:28 PM, Lists <lists@benjamindsmith.com> wrote: > As I've spent a considerable amount of time trying to sort this out, I'm > posting it for the benefit other users. SNIP > D) concurrent use of pg_dump; Not usually a problem, unless it's overloading your IO subsystem. > C) use of transactions, especially prepared transactions and multiple > savepoints; > E) use of numerous databases on a single server, average about 50; These two can be killers. Long running transactions can cause autovacuum processes to stall out or be autocancelled. As well, since the default nap time is 1 minute, it will take at least 50 minutes to vacuum each db as nap time is how long autovac waits between databases. Reducing autovacuum nap time to 5 or 10 seconds would be a good move here, also possibly making it more aggressive by increasing max worker threads, decreasing cost delay (possibly to zero or close to it) and / or increasing cost limit. After making such a change then watch iostat when vacuum is running to see how hard it's hitting your IO subsystem. I'm guessing that with SSDs it isn't gonna be a big problem. As Greg Smith has pointed out in the past, usually the answer to an autovacuum problem is making it more, not less aggressive. Unless you're flooding your IO this is almost always the right answer. Keep in mind that autovacuum by default is setup to be VERY unaggressive because it may be running on a netbook for all it knows. To tune autovacuum with 50 databases, start by dropping nap time to something much lower, like 10s. Then if you need to, drop cost delay until you get to 0. If you get to 0 and it's still not hitting your IO too hard, but not keeping up, then increase cost limit. If you get to something in the 5000 to 10000 range, and it's still not keeping up, then start bumping the thread count.
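As a concrete starting point, Scott's suggestions map onto postgresql.conf settings roughly like this (the values shown are illustrative, not recommendations for any particular workload):

```
autovacuum = on
autovacuum_naptime = 10s            # default 1min; wake the launcher more often
autovacuum_max_workers = 6          # default 3; changing it requires a restart
autovacuum_vacuum_cost_delay = 0ms  # default 20ms; 0 disables throttling
autovacuum_vacuum_cost_limit = 5000 # default -1 (falls back to vacuum_cost_limit);
                                    # only matters when cost delay > 0
```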
Scott -- <...> >To tune autovacuum with 50 databases, start by dropping nap time to >something much lower, like 10s. Then if you need to, drop cost delay >until you get to 0. If you get to 0 and it's still not hitting your >IO too hard, but not keeping up, then increase cost limit. If you get >to something in the 5000 to 10000 range, and its still not keeping up >then start bumping the thread count > Thanks for outlining a strategy on this -- useful advice. Greg Williamson
On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > > As well, since the default nap time is 1 minute, it will take at least > 50 minutes to vacuum each db as nap time is how long autovac waits > between databases. That isn't how it works. The naptime is per database, not per cluster. If the naptime is 1 minute and there are 50 "active" databases, then it will launch a new worker every 1.2 seconds (assuming the old ones finish fast enough that doing so would not exceed autovacuum_max_workers) Cheers, Jeff
On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> >> As well, since the default nap time is 1 minute, it will take at least >> 50 minutes to vacuum each db as nap time is how long autovac waits >> between databases. > > That isn't how it works. The naptime is per database, not per > cluster. If the naptime is 1 minute and there are 50 "active" > databases, then it will launch a new worker every 1.2 seconds > (assuming the old ones finish fast enough that doing so would not > exceed autovacuum_max_workers) Hmmm. That was not my understanding from previous discussions on nap time.
On 11/10/2012 02:08 PM, Scott Marlowe wrote: > On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> >>> As well, since the default nap time is 1 minute, it will take at least >>> 50 minutes to vacuum each db as nap time is how long autovac waits >>> between databases. >> >> That isn't how it works. The naptime is per database, not per >> cluster. If the naptime is 1 minute and there are 50 "active" >> databases, then it will launch a new worker every 1.2 seconds >> (assuming the old ones finish fast enough that doing so would not >> exceed autovacuum_max_workers) > > Hmmm. That was not my understanding from previous discussions on nap time. > > When in doubt there are the docs:) http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM "The "autovacuum daemon" actually consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autovacuum_naptime seconds. (Therefore, if the installation has N databases, a new worker will be launched every autovacuum_naptime/N seconds.)" -- Adrian Klaver adrian.klaver@gmail.com
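The docs' formula is easy to check against the numbers in this thread; a quick sketch:

```python
# Per the 9.1 docs: the launcher tries to start one worker in each
# database every autovacuum_naptime seconds, so with N active databases
# a new worker is launched roughly every naptime / N seconds.
def worker_launch_interval(naptime_seconds, n_databases):
    return naptime_seconds / n_databases

# Jeff's example: default 60s naptime across 50 databases.
print(worker_launch_interval(60.0, 50))  # 1.2
```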
On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote: ... > 3) For each of the tables from #2, run the commands > REINDEX TABLE $table; > VACUUM FULL ANALYZE $table; > > The end result is a squeaky-clean database server with expected disk usage. > > NOTES: ... > > > 2) It was sheer chance that I discovered the need to reindex prior to vacuum > in order to get the disk space back. As of 9.0, a "vacuum full" inherently does a reindex, so doing an explicit one is neither necessary nor beneficial. I don't know if your discovery is based on a non-full vacuum, or on an older server. > 5) I don't yet know if the "full" option for the vacuum is necessary to free > up all space. I will experiment with this and post results if useful. The answer to this is mostly non-deterministic. Non-full vacuum can only free space from the "end" of the table. If all of your long-lived objects were created before pg_attribute got bloated and so the bloat was due only to short-lived objects, then non-full vacuum (if run often enough) should eventually be able to return that space as the short-lived objects near the end start to go away. However, if even a single long-lived object finds itself at the end of the table, then only a vacuum full will ever be able to reclaim that space. Cheers, Jeff
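Jeff's point about trailing vs. interior free space can be illustrated with a toy model (pure illustration of the principle, not how PostgreSQL actually stores pages):

```python
# Toy model: a table is a sequence of pages; None means an empty page.
# Plain VACUUM can only truncate *trailing* empty pages, while VACUUM
# FULL rewrites the table and reclaims interior free space too.
def plain_vacuum(pages):
    pages = list(pages)
    while pages and pages[-1] is None:  # drop empty pages from the end only
        pages.pop()
    return pages

def vacuum_full(pages):
    return [p for p in pages if p is not None]  # rewrite: keep live pages, compacted

# One long-lived row stranded at the end of a bloated table:
# plain vacuum reclaims nothing.
bloated = ["live"] + [None] * 4 + ["live"]
print(len(plain_vacuum(bloated)))  # 6 pages -- no space returned
print(len(vacuum_full(bloated)))   # 2 pages
```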
On Sat, Nov 10, 2012 at 3:20 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On 11/10/2012 02:08 PM, Scott Marlowe wrote: >> >> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> >>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> >>> wrote: >>>> >>>> >>>> As well, since the default nap time is 1 minute, it will take at least >>>> 50 minutes to vacuum each db as nap time is how long autovac waits >>>> between databases. >>> >>> >>> That isn't how it works. The naptime is per database, not per >>> cluster. If the naptime is 1 minute and there are 50 "active" >>> databases, then it will launch a new worker every 1.2 seconds >>> (assuming the old ones finish fast enough that doing so would not >>> exceed autovacuum_max_workers) >> >> >> Hmmm. That was not my understanding from previous discussions on nap >> time. >> >> > > > When in doubt there are the docs:) > > http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM > > "The "autovacuum daemon" actually consists of multiple processes. There is a > persistent daemon process, called the autovacuum launcher, which is in > charge of starting autovacuum worker processes for all databases. The > launcher will distribute the work across time, attempting to start one > worker within each database every autovacuum_naptime seconds. (Therefore, if > the installation has N databases, a new worker will be launched every > autovacuum_naptime/N seconds.)" And apparently it wasn't always this way: http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM The autovacuum daemon, when enabled, runs every autovacuum_naptime seconds. On each run, it selects one database to process and checks each table within that database. VACUUM or ANALYZE commands are issued as needed.
On 11/10/2012 02:23 PM, Scott Marlowe wrote: > >> >> When in doubt there are the docs:) >> >> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM >> >> "The "autovacuum daemon" actually consists of multiple processes. There is a >> persistent daemon process, called the autovacuum launcher, which is in >> charge of starting autovacuum worker processes for all databases. The >> launcher will distribute the work across time, attempting to start one >> worker within each database every autovacuum_naptime seconds. (Therefore, if >> the installation has N databases, a new worker will be launched every >> autovacuum_naptime/N seconds.)" > > And apparently it wasn't always this way: > > http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM > > The autovacuum daemon, when enabled, runs every autovacuum_naptime > seconds. On each run, it selects one database to process and checks > each table within that database. VACUUM or ANALYZE commands are issued > as needed. Seems to have changed in 8.3: http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html "Beginning in PostgreSQL 8.3, autovacuum has a multiprocess architecture: There is a daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, but attempt to start one worker on each database every autovacuum_naptime seconds. One worker will be launched for each database, with a maximum of autovacuum_max_workers processes running at the same time..." > > -- Adrian Klaver adrian.klaver@gmail.com
On 11/09/2012 05:26 PM, Steve Crawford wrote: > Bloat in pg_attribute would correlate with A) (or any constant > creation/destruction of tables). You can vacuum and/or reindex the > system tables if you are connected as the superuser but you are better > off preventing bloat by appropriate adjustment of your configuration > settings. However note that if you do frequent bulk > creation/destruction of tables you could end up bloating the attribute > table between vacuum runs and may need to periodically manually shrink > it. > Steve, Our system divides customers into distinct databases, however customers are often clustered. (Think: different locations of 7/11) and so we have to aggregate data from different databases. We do this with dblink to get the data and temp tables to collate it, which appears to be a cause of the bloat we're seeing. -Ben
On 11/10/2012 02:21 PM, Jeff Janes wrote: > On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote: > > > ... > >> 3) For each of the tables from #2, run the commands >> REINDEX TABLE $table; >> VACUUM FULL ANALYZE $table; >> >> The end result is a squeaky-clean database server with expected disk usage. >> >> NOTES: > ... >> >> 2) It was sheer chance that I discovered the need to reindex prior to vacuum >> in order to get the disk space back. > As of 9.0, a "vacuum full" inherently does a reindex, so doing an > explicit one is neither necessary nor beneficial. > > I don't know if your discovery is based on a non-full vacuum, or on an > older server. I can only state that merely doing a "vacuum full" or "vacuum full $tables" sequentially did not free the space, whereas sequentially running reindex $table, each followed immediately by a vacuum full $table, did. If you'd like I can easily recreate the scenario by simply not "cleaning up" one of the DB servers until it bloats up and make available (limited distribution) a binary copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at night) in order to help identify why it didn't work as expected. > >> 5) I don't yet know if the "full" option for the vacuum is necessary to free >> up all space. I will experiment with this and post results if useful. > The answer to this is mostly non-deterministic. non-full vacuum can > only free space from the "end" of the table. > > If all of your long-lived objects were created before pg_attribute got > bloated and so the bloat was due only to short-lived objects, then > non-full vacuum (if run often enough) should eventually be able to > return that space as the short-lived objects near the end start to go > away. However, if even a single long-live object finds itself at the > end of the table, then only a vacuum full will ever be able to reclaim > that space. 
Since the time period involved (weeks/months) would have included both a large number of created/destroyed temp tables and occasionally altered persistent objects, it would appear that the full option is a very good idea, at least periodically. -Ben
The good news is that we have now resolved our critical problem (disk space overuse) with a somewhat hackish, slow answer that is nonetheless good enough for now.
Now I'd like to work out how to get autovacuum to work smoothly within our cluster. I'm happy to try to clarify my notes and post them either here or on the PG wiki as I get them resolved.
There's a wealth of instruction on how to tune PG that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, nearest I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL. In contrast to the advice to "avoid Vacuum Full" ( http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that vacuum full is necessary to clean up stale data that is not at the "end" of the table. (See Jeff Janes 11/10/2012 email)
> non-full vacuum can only free space from the "end" of the table.

This would imply that a full vacuum is a good idea, at least periodically (weekly/monthly/quarterly) in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a "full" vacuum? What about autovacuum and the reindex question at the end of this email?
On 11/10/2012 02:49 PM, Adrian Klaver wrote:
Seems to have changed in 8.3:
http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html
"Beginning in PostgreSQL 8.3, autovacuum has a multiprocess architecture: There is a daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, but attempt to start one worker on each database every autovacuum_naptime seconds. One worker will be launched for each database, with a maximum of autovacuum_max_workers processes running at the same time..."
Sadly, this change means that I can no longer be certain of the utility of the otherwise excellent-sounding advice originally offered by Scott, quoted below. It appears that naptime is (as of 9.x) almost irrelevant since the launch interval is divided across databases, and dropping it from 1 minute to 5 seconds would have very little overall impact.
> These two can be killers. Long running transactions can cause autovacuum processes to stall out or be autocancelled.

"Long running transactions" - how long is long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with autovacuum?
> As well, since the default nap time is 1 minute, it will take at least 50 minutes to vacuum each db as nap time is how long autovac waits between databases. Reducing autovacuum nap time to 5 or 10 seconds would be a good move here, also possibly making it more aggressive by increasing max worker threads, decreasing cost delay (possibly to zero or close to it) and / or increasing cost limit. After making such a change then watching iostat when vacuum is running to see how hard its hitting your IO subsystem. I'm guessing that with SSDs it isn't gonna be a big problem. As Greg Smith has pointed out in the past, usually the answer to an autovacuum problem is making it more, not less aggressive. Unless you're flooding your IO this is almost always the right answer. Keep in mind that autovacuum by default is setup to be VERY unaggressive because it may be running on a netbook for all it knows. To tune autovacuum with 50 databases, start by dropping nap time to something much lower, like 10s. Then if you need to, drop cost delay until you get to 0. If you get to 0 and it's still not hitting your IO too hard, but not keeping up, then increase cost limit. If you get to something in the 5000 to 10000 range, and its still not keeping up then start bumping the thread count
Should I increase the max_workers field from the default of 3 to (perhaps) 10? Noting that my solution to the disk space problem is effectively a max_worker of 1 since it's all done sequentially, I wonder if reducing max_workers would actually be better?
Also, what's the "thread count" ? Is that max_workers?
Why would I want to reduce the cost delay to 0, and how does this relate to cost_limit? Careful reading of the docs: http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes me believe that, given my substantial I/O subsystem, I'd want to drop cost_delay to near zero and set the cost_limit really high, which is a rough restatement of the last quoted paragraph above. (I think)
Assuming that I make these suggestions and notice a subsequent system load problem, what information should I be gathering in order to provide better post-incident forensics? We have statistics turned on, and aren't using replication. (yet)
Lastly, there's the question of reindexing before full vacuum. I've observed that running vacuum full without a manual reindex first did not, in fact, free up the space, even though I've been told that a reindex is implicit in the vacuum full process. (?!) I'm confident that I can reproduce this behavior given a bit of time to allow one of our database servers to bloat back up.
-Ben
On Mon, Nov 12, 2012 at 10:38 AM, Lists <lists@benjamindsmith.com> wrote: > On 11/10/2012 02:21 PM, Jeff Janes wrote: >> >> On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote: >> >>> >>> 2) It was sheer chance that I discovered the need to reindex prior to >>> vacuum >>> in order to get the disk space back. >> >> As of 9.0, a "vacuum full" inherently does a reindex, so doing an >> explicit one is neither necessary nor beneficial. >> >> I don't know if your discovery is based on a non-full vacuum, or on an >> older server. > > > I can only state that merely doing a "vacuum full" or "vacuum full $tables" > sequentially did not free the space, whereas the sequential reindex $table, > each followed immediately by a vacuum full $table) did. With what version? > If you'd like I can > easily recreate the scenario by simply not "cleaning up" one of the DB > servers until it bloats up and make available (limit distribution) a binary > copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at > night) in order to help identify why it didn't work as expected. Do you think you can make an easily script-able way to re-create the resistant bloat? That would be better than trying to disseminate binary files, I think. What I did was just create and drop temp tables in a tight loop, with autovacuum off, and then once pg_attribute got good and bloated, did a vacuum full as the database owner or superuser. >> >> If all of your long-lived objects were created before pg_attribute got >> bloated and so the bloat was due only to short-lived objects, then >> non-full vacuum (if run often enough) should eventually be able to >> return that space as the short-lived objects near the end start to go >> away. However, if even a single long-live object finds itself at the >> end of the table, then only a vacuum full will ever be able to reclaim >> that space. 
>> > > Since the time period involved (weeks/months) would have included both a > large number of created/destroyed temp tables and occasionally altered > persistent objects it would appear that the full option a very good idea, at > least periodically. If you can prevent the extreme bloat from occurring in the first place, then the "end" of the table would not be so far away from its desired size that it needs to get reset by a vacuum full. If you find your self in need of a vacuum full, then you should do one. But you should ask yourself what went wrong that you got into that situation in the first place. Cheers, Jeff
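Jeff's temp-table repro recipe can be sketched as a psql script (assuming autovacuum is off for the test; the loop count and column list are arbitrary):

```sql
-- Bloat pg_attribute by churning temp tables, then attempt cleanup.
DO $$
BEGIN
    FOR i IN 1..50000 LOOP
        CREATE TEMP TABLE bloat_me (a int, b text, c timestamptz);
        DROP TABLE bloat_me;
    END LOOP;
END $$;

-- As superuser: check the catalog's size before and after the rewrite.
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
VACUUM FULL pg_catalog.pg_attribute;
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
```

If the size drops back after the plain VACUUM FULL here but not on the production servers, that would point at something beyond simple temp-table churn (e.g. the mid-transaction schema changes Ben mentions).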
On 11/12/2012 01:31 PM, Jeff Janes wrote: > On Mon, Nov 12, 2012 at 10:38 AM, Lists <lists@benjamindsmith.com> wrote: >> On 11/10/2012 02:21 PM, Jeff Janes wrote: >>> On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote: >>> >>>> 2) It was sheer chance that I discovered the need to reindex prior to >>>> vacuum >>>> in order to get the disk space back. >>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an >>> explicit one is neither necessary nor beneficial. >>> >>> I don't know if your discovery is based on a non-full vacuum, or on an >>> older server. >> >> I can only state that merely doing a "vacuum full" or "vacuum full $tables" >> sequentially did not free the space, whereas the sequential reindex $table, >> each followed immediately by a vacuum full $table) did. > With what version? [root@alpha ~]# rpm -qi postgresql91-server Name : postgresql91-server Relocations: (not relocatable) Version : 9.1.5 Vendor: (none) Release : 3PGDG.rhel6 Build Date: Sun 02 Sep 2012 12:13:18 PM UTC Install Date: Wed 12 Sep 2012 03:04:41 AM UTC Build Host: koji-sl6-x86-64-pg91 Group : Applications/Databases Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm Size : 15191132 License: PostgreSQL Signature : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8 URL : http://www.postgresql.org/ > > >> If you'd like I can >> easily recreate the scenario by simply not "cleaning up" one of the DB >> servers until it bloats up and make available (limit distribution) a binary >> copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at >> night) in order to help identify why it didn't work as expected. > Do you think can make an easily script-able way to re-create the > resistant bloat? That would be better than trying to disseminate > binary files, I think. It would only be better if it actually created the situation that caused the space to not be freed. 
But, until you know the actual cause of a problem, I've found that it's often not productive to create simulations that may or may not be actually related to the problem. > > What I did was just create and drop temp tables in a tight loop, with > autovacuum off, and then once pg_attribute got good and bloated, did a > vacuum full as the database owner or superuser. Based on my understanding, if your loop included an intermittent schema change from within a transaction it might better approximate my actual scenario. Merely creating temp tables and then dropping them would create lots of activity "at the end" of the table which would free correctly. This still does not explain why reindex $table works when reindex is supposedly implicit in the vacuum. >>> If all of your long-lived objects were created before pg_attribute got >>> bloated and so the bloat was due only to short-lived objects, then >>> non-full vacuum (if run often enough) should eventually be able to >>> return that space as the short-lived objects near the end start to go >>> away. However, if even a single long-live object finds itself at the >>> end of the table, then only a vacuum full will ever be able to reclaim >>> that space. >>> >> Since the time period involved (weeks/months) would have included both a >> large number of created/destroyed temp tables and occasionally altered >> persistent objects it would appear that the full option a very good idea, at >> least periodically. > If you can prevent the extreme bloat from occurring in the first > place, then the "end" of the table would not be so far away from its > desired size that it needs to get reset by a vacuum full. > > If you find your self in need of a vacuum full, then you should do > one. But you should ask yourself what went wrong that you got into > that situation in the first place. > I agree; this is why I asked my questions about enabling autovacuum in a related thread.
On 11/13/2012 04:04 AM, Lists wrote:
> There's a wealth of how-to-tune-PG instruction that's old and (based on this
> thread alone) often stale enough to be classified as disinformation. For
> example, as near as I can tell, the entirety of this page is just wrong
> and/or irrelevant for 9.x and up:
> http://wiki.postgresql.org/wiki/VACUUM_FULL

Agreed, that needs fixing. I'll sort that out. That information was important before the VACUUM FULL rewrite, but is now severely outdated. It needs to be split into pre-9.1 and 9.1+ sections.
> In contrast to the advice to "avoid VACUUM FULL"
> (http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT)
> comes the insight that vacuum full is necessary to clean up stale data that
> is not at the "end" of the table. (See Jeff Janes' 11/10/2012 email.)

That part isn't quite right, AFAIK.
If you keep table bloat under control, ordinary VACUUM is perfectly sufficient. You only need/want to VACUUM FULL if you wish to truncate a table, reducing its size by compacting rows. In an actively used database that's usually pretty pointless, since new rows will then be added to the end, growing the table. You might as well just keep on re-using the space, rather than going through those compact-and-expand cycles. You'll suffer from less file fragmentation that way and won't be paying the costs of file system allocations.
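As a rough illustration of the difference (the table name here is hypothetical), you can compare a table's on-disk footprint before and after a rewrite:

```sql
-- On-disk size of a (hypothetical) bloated table, indexes included
SELECT pg_size_pretty(pg_total_relation_size('my_bloated_table'));

-- Plain VACUUM: marks dead space reusable, only truncates trailing pages
VACUUM my_bloated_table;

-- VACUUM FULL (9.0+): rewrites table and indexes into fresh, compact files
VACUUM FULL my_bloated_table;

-- Compare the size afterwards
SELECT pg_size_pretty(pg_total_relation_size('my_bloated_table'));
```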
If you have a table that you've dramatically reduced in size (say, by deleting the vast majority of it) and you won't be adding more rows to replace the old ones, that's when VACUUM FULL makes sense.
It's a bit like those utilities that claim to "free" or "clean" or "de-fragment" memory. They seem good, but they're actually grossly counter-productive, because the system then has to re-read cached data and otherwise fight to get back to its old equilibrium. It's typically the same for Pg: you want to aim for equilibrium, not free space that'll just promptly get re-allocated.
If you do have a bad bloat problem, I'd set a non-default FILLFACTOR before doing a VACUUM FULL, so you still have some free space within the table after vacuum completes. That way you won't be immediately paying the cost of allocating space for new rows as soon as any UPDATEs or INSERTs come in.
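As a sketch of that suggestion (the table name and the value 90 are purely illustrative):

```sql
-- Leave ~10% of each page free when the table is rewritten, so immediate
-- UPDATEs/INSERTs don't force fresh file system allocations right away
ALTER TABLE my_bloated_table SET (fillfactor = 90);
VACUUM FULL my_bloated_table;
```

Note that fillfactor is a per-table trade-off: space deliberately left free in exchange for cheaper updates (and, with HOT updates, less index churn).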
>> non-full vacuum can only free space from the "end" of the table.
>
> This would imply that a full analyze is a good idea, at least periodically
> (weekly/monthly/quarterly) in a database that combines the use of temp
> tables and periodic changes to persistent objects. Does autovacuum ever do
> a "full" analyze? What about autovacuum and the reindex question at the end
> of this email?

I'm not sure there's any such thing as a full analyze. "VACUUM FULL ANALYZE" is "do a VACUUM FULL and an ANALYZE", not "do a full analyze".
Autovacuum should be taking care of analyze and table statistics. If it isn't, adjust autovacuum parameters so that it does.
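For reference, the relevant knobs live in postgresql.conf; the values shown here are the 9.1 defaults, not recommendations for this workload:

```
autovacuum = on
autovacuum_max_workers = 3              # concurrent autovacuum processes
autovacuum_naptime = 1min               # how often each database is checked
autovacuum_vacuum_scale_factor = 0.2    # vacuum when ~20% of a table is dead
autovacuum_analyze_scale_factor = 0.1   # analyze when ~10% has changed
autovacuum_vacuum_cost_delay = 20ms     # throttling; lower = more aggressive
```

Lowering the scale factors makes autovacuum visit tables sooner (less bloat accumulates between runs); the cost_delay/cost_limit pair controls how hard each run hits the I/O subsystem.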
>> These two can be killers. Long running transactions can cause autovacuum
>> processes to stall out or be autocancelled.
>
> "Long running transactions" - how long? In our system it's rare to have a
> transaction (even a prepared transaction) last much longer than a few
> minutes. Is that enough time to cause problems with autovacuum?
Not generally, no.
PostgreSQL can't clean up rows that are still visible to a transaction. So if your transactions are three minutes long, that's a three minute delay before VACUUM can clean up DELETEd rows or dead rows left by UPDATEs. Not a biggie even on a pretty high load DB.
You should generally be concerned only when transactions are open over "user think time" or are abandoned by buggy applications - cases where the transaction length is many minutes or hours, potentially unbounded. Uncommitted prepared transactions are also a problem for similar reasons.
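To spot both problem cases described above, something along these lines should work (column names are the 9.1 ones; in 9.2+ `procpid`/`current_query` became `pid`/`query`, with a separate `state` column):

```sql
-- Transactions open longer than 10 minutes (threshold is illustrative)
SELECT procpid, usename, xact_start,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '10 minutes'
ORDER BY xact_start;

-- Uncommitted prepared transactions, which pin cleanup the same way
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
```

Anything old in either list is holding back the xmin horizon, so VACUUM (and autovacuum) cannot reclaim rows deleted after that transaction started.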
> Lastly, there's the question of reindexing before full vacuum. I've observed
> that not doing a manual reindex prior to vacuum full did not, in fact, free
> up the space, even though I've been told that reindex is implicit in the
> vacuum process. (?!)
VACUUM FULL, post-9.1, should take care of index bloat.
Pre-9.1 VACUUM FULL could make index bloat worse.
Ordinary VACUUM will not truncate indexes, AFAIK, only mark free space within them so it can be re-used. Same deal as with the table itself: this is usually what you want.
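A quick way to eyeball whether index bloat is the culprit (a grossly oversized index relative to its table is suspicious) is something like:

```sql
-- Ten tables carrying the most index weight in the current database
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))  AS table_size,
       pg_size_pretty(pg_indexes_size(oid))   AS index_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_indexes_size(oid) DESC
LIMIT 10;
```

On 9.0+ a VACUUM FULL rebuilds the indexes anyway, so a suspicious index here would shrink with either REINDEX or VACUUM FULL.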
-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/13/2012 10:29 AM, Craig Ringer wrote:
Correction - pre-9.0 and 9.0+

> On 11/13/2012 04:04 AM, Lists wrote:
>> There's a wealth of how-to-tune-PG instruction that's old and (based on
>> this thread alone) often stale enough to be classified as disinformation.
>> For example, as near as I can tell, the entirety of this page is just wrong
>> and/or irrelevant for 9.x and up:
>> http://wiki.postgresql.org/wiki/VACUUM_FULL
>
> Agreed, that needs fixing. I'll sort that out. That information was
> important before the VACUUM FULL rewrite, but is now severely outdated. It
> needs to be split into pre-9.1 and 9.1+ sections.
-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Nov 12, 2012 at 12:04 PM, Lists <lists@benjamindsmith.com> wrote:

> Should I increase the max_workers field from the default of 3 to (perhaps)
> 10?

I would not. You report that the reason you turned off autovac is because it made your database crawl when it kicked in. That suggests that if anything you should reduce that parameter (assuming you are still doing manual vacuums at off-peak hours, leaving autovacuum to only mop up what is left).

> Why would I want to reduce the cost delay to 0, and how does this relate to
> cost_limit? Careful reading of the docs:
> http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
> makes me believe that, given my substantial I/O subsystem, I'd want to drop
> cost_delay to near zero and set the cost_limit really high, which is a rough
> restatement of the last quoted paragraph above. (I think)

Given that autovac kicking in destroys your performance, I think that your I/O subsystem may not be all that you think it is. Do you have a test/dev/QA system with the same subsystem that you can use for investigation? If so, do you have a vaguely realistic load generator to drive those systems?

> Assuming that I make these suggestions and notice a subsequent system load
> problem, what information should I be gathering in order to provide better
> post-incident forensics?

If you are going to be focusing your undivided attention on monitoring the system during the period, just keeping a window open with "top" running is invaluable. (On most implementations, if you hit 'c' it will toggle the command display so you can see the results of "update_process_title=on".)

Also, "sar" is useful, and on most systems has the advantage that its stats are always being gathered without you having to do anything, so it works well for unexpected problems arising. I often just have "vmstat 1 -t" running in the background streaming into a log file, for the same reason.
For internals of pgsql, set log_min_duration_statement to a value which few statements will exceed under normal operations, but many will when things bog down. That way you can figure out exactly when things bogged down after unattended operation, to correlate it with the sar/vmstat/etc. reports.

I'd also set for the probationary period (if you haven't already):

log_lock_waits = on
log_checkpoints = on
log_autovacuum_min_duration = 0 (or some smallish positive value)

The last one only logs when it finishes vacuuming a table. I wish there was a way to make it log when it started as well, but I don't think there is.

Cheers,

Jeff