Discussion: Unexpectedly high disk space usage


Unexpectedly high disk space usage

From: Lists
Date:
We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
servers with more disk space and memory. Unexpectedly, the DB servers
have steadily increased their disk space usage since. Reported system
load doesn't seem to be affected. It's happening to all our DB servers
running 9.1.

When we reload all pg_dumps from our worst-affected server into an
offline server, the disk space usage is about 26 GB, but the production
database is using 166 GB. (# df /var/lib/pgsql;)

To resolve this, we've tried:

1) reindexed everything (cut about 10% of disk usage temporarily)

2) tried vacuum full, and vacuum analyze on all databases. (to minimal
effect)

3) Restarting PG (no discernible effect) including a full stop/start.

4) We've looked for stale prepared transactions (none found)

5) instructions from the wiki to try to determine what the cause of all
the disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when
we add up all the results for all the different databases, tables,
indexes, etc. in a script, we get a number very close to the usage of
the freshly loaded server. (24 GB)

What is Postgres doing with ~80% of its disk space usage? This is not
normal, is it? I would hate to have to take the servers offline just to
dump/restore in order to bring disk usage back to normal...
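For reference, the stale-prepared-transaction check in 4) boils down to
querying the pg_prepared_xacts view (a minimal sketch; the exact query
used may have differed):

SELECT gid, prepared, owner, database FROM pg_prepared_xacts;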


SYSTEM SPECS:
I've attached the postgresql.conf from a RHEL6/64 Linux server with 128 GB
of RAM and 16 real CPU cores. (HT turned on, 32 CPUs according to the O/S)

#FROM: sysctl.conf:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 136365211648
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296


-Ben


Re: Unexpectedly high disk space usage

From: Scott Marlowe
Date:
On Mon, Nov 5, 2012 at 8:01 PM, Lists <lists@benjamindsmith.com> wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)
>
> To resolve this, we've tried:
>
> 1) reindexed everything (cut about 10% of disk usage temporarily)
>
> 2) tried vacuum full, and vacuum analyze on all databases. (to minimal
> effect)
>
> 3) Restarting PG (no discernible effect) including a full stop/start.
>
> 4) We've looked for stale prepared transactions (none found)
>
> 5) instructions from the wiki to try to determine what the cause of all the
> disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up
> all the results for all the different databases, tables, indexes, etc. in a
> script, we get a number very close to the usage of the freshly loaded
> server. (24 GB)

What does du -sh have to say about it?  Use unix tools to examine your
file system and see where the usage is going.  For instance, I can do
this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0    server.crt
0    server.key
4    pg_tblspc
4    pg_twophase
4    PG_VERSION
4    postmaster.opts
4    postmaster.pid
12    pg_clog
12    pg_stat_tmp
12    pg_subtrans
28    pg_multixact
460    global
16392    pg_xlog
16396    base

which tells me that I'm using about 16MB for each of pg_xlog and base.  I
can then cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

Which shows me using about 5MB each for three different dbs.

And so on.

On an off guess, did you go from a SQL_ASCII encoding to UTF8?  That
might increase disk space usage a bit.


Re: Unexpectedly high disk space usage

From: Jeff Janes
Date:
On Mon, Nov 5, 2012 at 7:01 PM, Lists <lists@benjamindsmith.com> wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)

My first guesses would be things that are not directly under the
databases control, such as:

1) your server logs are accumulating and you aren't doing anything about them
2) you are taking backup snapshots to somewhere in that directory and
not cleaning them up
3) your archive_command is failing (which you should see reports of in
the server logs) and so you are accumulating xlog files.
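All three can be checked quickly from the data directory (a rough
sketch; paths assume the default RHEL layout and may differ on your
install):

cd /var/lib/pgsql/9.1/data
du -sh pg_log pg_xlog                      # server logs and WAL
grep -ri "archive command failed" pg_log   # archiver errors, if any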

Cheers,

Jeff


Re: Unexpectedly high disk space usage

From: Lists
Date:
Jeff, thanks for the feedback!

On 11/05/2012 08:51 PM, Jeff Janes wrote:
> My first guesses would be things that are not directly under the
> databases control, such as:
>
> 1) your server logs are accumulating and you aren't doing anything about them

I'm guessing that this is not the case:

[root@delta data]# du -shc *  | grep -i log
47M     pg_clog
15M     pg_log
641M    pg_xlog

> 2) you are taking backup snapshots to somewhere in that directory and
> not cleaning them up

Our backup snapshots (taken with pg_dump) are taken on a different
server over the network. Dumps are made several times during each day.
Could this be part of the problem if (somehow) they didn't complete? And
if so, would there be some cleanup I'd have to do other than restarting PG?

> 3) your archive_command is failing (which you should see reports of in
> the server logs) and so you are accumulating xlog files.

As I understand things, the result above under 1) demonstrates that
this, also, is not the cause.



Re: Unexpectedly high disk space usage

From: Lists
Date:
I followed your example, the result is at the bottom. Based on this it
would seem that there are 3-4 databases that seem to be the culprit. How
could I get more depth/detail on what specifically is the problem?

-Ben

On 11/05/2012 07:10 PM, Scott Marlowe wrote:
> What does du -sh have to say about it? Use unix tools to examine your
> file system and see where the usage is going. For instance, I can do
> this:
>
> cd /var/lib/postgresql/8.4/main/
> du -s *|sort -n
> <...>
>
> which tells me that I'm using about 16MB for each of pg_xlog and base.
> I can then cd into base and look around:
>
> cd base
> du -s *|sort -n
> <...>
>
> Which shows me using about 5MB each for three different dbs. And so on.
>
> On an off guess, did you go from a SQL_ASCII encoding to UTF8? That
> might increase disk space usage a bit.

[root@delta ~]# cd /var/lib/pgsql/9.1/data/
[root@delta data]# du -s * | sort -n
4       pg_ident.conf
4       pg_serial
4       pg_tblspc
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
8       pg_hba.conf
12      pg_notify
12      pg_twophase
20      postgresql.300
20      postgresql.conf
20      postgresql.conf.20120903
20      postgresql.conf.300
76      pg_subtrans
104     pg_multixact
15044   pg_log
18184   global
25216   pg_stat_tmp
47916   pg_clog
671916 pg_xlog
164753204       base

[root@delta data]# cd base
[root@delta base]# du -s * | sort -n
4       pgsql_tmp
6124    12772
6388    12780
6424    1
72424   331506
72700   160676
72896   391655
73200   52389
73216   523672
74104   619675
74956   295646
76768   307580
77896   547597
80824   571547
87368   475799
90940   631604
113876  124651
123548  148525
130096  367533
149792  439726
173648  355578
175404  679545
190732  559580
225780  511706
326468  667547
352736  655477
398736  535644
469408  136582
483716  499753
513124  270926
575612  715601
590408  487780
666604  463779
713208  643540
714896  583515
803216  343438
806952  427663
855156  739506
872200  197221
975692  64371
987692  775594
1005268 595488
1024812 691482
1042212 727552
1047464 379566
1260044 76601
1276756 16384
1345072 403667
1474468 209158
1477808 172604
1536168 221124
1637652 258798
1811504 88598
1963740 245588
2076748 703467
2193536 415671
2430908 801322
2552640 319552
2785212 28315
3454880 112612
3755548 451666
3929420 100666
4651876 40451
5714940 751514
6257740 233293
7313900 184735
9334796 763606
10940780        283609
20837264        788338
45285640        607471


Re: Unexpectedly high disk space usage

From: Jeff Janes
Date:
On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists@benjamindsmith.com> wrote:
> I followed your example, the result is at the bottom. Based on this it would
> seem that there are 3-4 databases that seem to be the culprit. How could I
> get more depth/detail on what specifically is the problem?

If you have installed the contrib modules (oid2name specifically), you
can use that to get the name of the bloated database:

oid2name | fgrep 607471

If the name of the database doesn't give you any insight, then look
for large files in the directory base/607471 whose names all
start with the same digits and use oid2name to get the names of the
relations for those files.

oid2name -d <name of database>  -o <base name of large files>

Cheers,

Jeff


Re: Unexpectedly high disk space usage

From: Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists@benjamindsmith.com> wrote:
>> I followed your example, the result is at the bottom. Based on this it would
>> seem that there are 3-4 databases that seem to be the culprit. How could I
>> get more depth/detail on what specifically is the problem?

> If you have installed the contrib modules (oid2name specifically), you
> can use that to get the name of the bloated database:
> oid2name | fgrep 607471

Or, if you didn't install contrib, try

    select datname from pg_database where oid = 607471

> If the name of the database doesn't give you any insight, then look
> for large files in the directory base/607471 whose names all
> start with the same digits and use oid2name to get the names of the
> relations for those files.

> oid2name -d <name of database>  -o <base name of large files>

For this you can try

    select relname from pg_class where relfilenode = <whatever>

Or let the database do the work:

    select relname, pg_relation_size(oid) from pg_class order by 2 desc;

            regards, tom lane


Re: Unexpectedly high disk space usage

From: Jeff Janes
Date:
On Tue, Nov 6, 2012 at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists@benjamindsmith.com> wrote:
>>> I followed your example, the result is at the bottom. Based on this it would
>>> seem that there are 3-4 databases that seem to be the culprit. How could I
>>> get more depth/detail on what specifically is the problem?
>
>> If you have installed the contrib modules (oid2name specifically), you
>> can use that to get the name of the bloated database:
>> oid2name | fgrep 607471
>
> Or, if you didn't install contrib, try
>
>         select datname from pg_database where oid = 607471

Thanks, I knew there had to be a more direct way to do that.

>
>> If the name of the database doesn't give you any insight, then look
>> for large files in the directory base/607471 whose names all
>> start with the same digits and use oid2name to get the names of the
>> relations for those files.
>
>> oid2name -d <name of database>  -o <base name of large files>
>
> For this you can try
>
>         select relname from pg_class where relfilenode = <whatever>
>
> Or let the database do the work:
>
>         select relname, pg_relation_size(oid) from pg_class order by 2 desc;

Ben described using something like this method originally and not
finding the space, so I wanted to work backwards from certain
knowledge of where the OS says the space is being used.

But now I think maybe his scripts to aggregate table sizes over all
databases (and also his script to load pg_dumps of those databases
into a new cluster) are accidentally omitting some databases--the
largest ones.

Is there a simple query for a super-user to get a list of all relation
sizes over all databases cluster-wide?

If "\l+" can get the size of databases other than the one currently
connected to, maybe there is a way to extend that to tables in those
other databases.

It would at least be nice to be able to get the sizes of all
databases.  Since '\l+' doesn't sort by size and I don't know how to
make it do so, I pulled the query from psql source code and modified
it:

SELECT d.datname as Name,  pg_catalog.pg_get_userbyid(d.datdba) as Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
        ELSE 'No Access'
    END as Size
FROM pg_catalog.pg_database d
    order by
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END desc nulls first;

(And discovered a long forgotten unused database I had sitting around
taking up space)

Ben, did you ever figure out where the space was going?

Cheers,

Jeff


Re: Unexpectedly high disk space usage

From: Lists
Date:
On 11/07/2012 09:01 AM, Jeff Janes wrote:
> Ben, did you ever figure out where the space was going?

I think we've found where the space is going, but I don't yet know how
to resolve it. I modified your query as follows in order to get a total
of space used, and got an answer that matches closely:

with stuff as (SELECT d.datname as Name,
pg_catalog.pg_get_userbyid(d.datdba) as Owner,
     CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE -1
     END as Size
FROM pg_catalog.pg_database d
     order by
     CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE NULL
     END desc nulls first) SELECT sum(size) AS overall from stuff;
Result: 171,276,369,124

# du -sbc /var/lib/pgsql/9.1/data/*
Result: 172,087,129,512

Now, the question is: I see several databases whose disk usage is
dramatically different from what I get after a dump/restore to another
machine:

Production:
  santarosa444        | postgres | 44 GB

Dump/Restore:
  santarosa444        | postgres | 685 MB

Now, here's where it gets weird. From the disk space usage wiki
(http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize the query
I used to get a total disk space used result:

with mytable AS (
                 SELECT
                         nspname || '.' || relname AS "relation",
                         pg_total_relation_size(C.oid) AS "size"
                 FROM
                         pg_class C
                 LEFT JOIN pg_namespace N ON
                         (N.oid = C.relnamespace)
                 WHERE nspname NOT IN ('pg_catalog', 'information_schema')
                         AND C.relkind <> 'i'
                         AND nspname !~ '^pg_toast'
                 ORDER BY
                         pg_total_relation_size(C.oid) DESC
                 )
         SELECT sum(size) AS size FROM mytable

... but the total result is 747,569,152, which is close to the
dump/restore value, not the production server value, even though I'm
running this query on the production server. So there's *something* that
the latter query isn't identifying that the former is.

On a hunch, I ran this query:

with mytable AS (
                 SELECT
                         nspname || '.' || relname AS "relation",
                         pg_total_relation_size(C.oid) AS "size"
                 FROM
                         pg_class C
                 LEFT JOIN pg_namespace N ON
                         (N.oid = C.relnamespace) WHERE (true or
(nspname NOT IN ('pg_catalog', 'information_schema')))
                         AND C.relkind <> 'i'
                         AND nspname !~ '^pg_toast'
                 ORDER BY
                         pg_total_relation_size(C.oid) DESC
                 )
         SELECT sum(size) AS size FROM mytable;

And the result is 46,771,216,384! Removing the "mytable" wrapper stuff,
here are the top results:

  pg_catalog.pg_attribute                                  | 36727480320
  pg_catalog.pg_attrdef                                    | 3800072192
  pg_catalog.pg_depend                                     | 2665930752
  pg_catalog.pg_class                                      | 1508925440
  pg_catalog.pg_type                                       | 1113038848
  public.att_claims                                        | 451698688
  public.stgrades                                          | 127639552
  pg_catalog.pg_index                                      | 107806720


Google returns this page:
http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html
which doesn't help me much. So, am I doing something wrong
administratively? Our current process is that every night, in the
middle of the night, a script connects to each database on each server,
gets a list of all tables in that database, and runs

"VACUUM ANALYZE $table"

on each of them.
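In rough outline the script does something like this (a sketch of its
logic, not the verbatim script; note that it never touches pg_catalog):

for db in $(psql -U postgres -Atc "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    psql -U postgres -Atc "SELECT schemaname || '.' || tablename FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" "$db" |
    while read t; do
        psql -U postgres "$db" -c "VACUUM ANALYZE $t"
    done
done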

And then once a week:
psql -U postgres -c "\l" | grep -Po "(\w+444)" | xargs -t -i psql -U
postgres {} -c "REINDEX DATABASE {};"

(note: there is a database for the "postgres" user on each DB server)
The script is a remnant from PG 8.x days, so am I missing something
fundamental about 9.x? I will note that autovacuum is off because it
occasionally causes transactions and queries to hang when an update
causes a vacuum mid-day, effectively taking us offline randomly. Our
scenario is pretty much a worst-possible case of transactions, prepared
transactions, temp tables, and concurrent read/write queries.


Re: Unexpectedly high disk space usage

From: Tom Lane
Date:
Lists <lists@benjamindsmith.com> writes:
>   pg_catalog.pg_attribute                                  | 36727480320

Ouch.

> Our current process is that every night in the middle of the night, a
> script connects to each database on each server and runs a query to get
> all tables in each database and, for each, run
> "VACUUM ANALYZE $table"
> for each table in the database.
> (note: there is a database for the "postgres" user on each DB server)
> The script is a remnant from PG 8.x days, so am I missing something
> fundamental about 9.x? I will note that autovacuum is off ...

So you've turned off autovacuum, and are carefully not vacuuming the
system catalogs.  That's your problem all right.  Is there a
particularly good reason why this script isn't a one-liner "VACUUM"?
Are you sure that once-a-day vacuuming is sufficient, even if it was
covering the system catalogs?  If you've managed to bloat pg_attribute
to 36GB, I suspect you've got enough churn (perhaps from temp tables)
that you really need the catalogs vacuumed more often.

My advice is dump, reload, and *don't* turn off autovacuum.

> ... because it
> occasionally causes transactions and queries to hang when an update
> causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.

            regards, tom lane


Re: Unexpectedly high disk space usage

From: Jeff Janes
Date:
Hi Greg, I've added you to the cc list because I'm proposing to change
some wiki content which you wrote.

On Wed, Nov 7, 2012 at 11:54 AM, Lists <lists@benjamindsmith.com> wrote:
> On 11/07/2012 09:01 AM, Jeff Janes wrote:
>>
>> Ben, did you ever figure out where the space was going?
>
>
>
> Now, here's where it gets weird. From the disk space usage wiki,
> (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it
> to get a total disk space used result:
>
> with mytable AS (
>                 SELECT
>                         nspname || '.' || relname AS "relation",
>                         pg_total_relation_size(C.oid) AS "size"
>                 FROM
>                         pg_class C
>                 LEFT JOIN pg_namespace N ON
>                         (N.oid = C.relnamespace)
>                 WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>                         AND C.relkind <> 'i'
>                         AND nspname !~ '^pg_toast'
>                 ORDER BY
>                         pg_total_relation_size(C.oid) DESC
>                 )
>         SELECT sum(size) AS size FROM mytable

I question the wisdom of that WHERE clause (from the wiki).

If the pg_catalog relations are big, then they are big, and why
shouldn't they get reported as such?  And if they are not big, the
LIMIT on the wiki page already keeps them out of the way.

I'm tempted to go change it, but maybe there is a good reason it is
there which I do not understand.



...
>
> Google returns this page:
> http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which
> doesn't help me much. So, am I doing something wrong with admin? Our current
> process is that every night in the middle of the night, a script connects to
> each database on each server and runs a query to get all tables in each
> database and, for each, run
>
> "VACUUM ANALYZE $table"
>
> for each table in the database.


I take it your script that does that is not including the pg_catalog tables?

Why not just run "vacuum analyze" and let it do the entire database?


> I will note that autovacuum is off because it occasionally causes
> transactions and queries to hang when an update causes a vacuum mid-day,
> effectively taking us offline randomly.

Hang as in they are blocking on locks?  Or they just get slow because
the autovacuum is consuming too much IO?

Cheers,

Jeff


Re: Unexpectedly high disk space usage

From: Scott Marlowe
Date:
On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lists <lists@benjamindsmith.com> writes:
>
>> ... because it
>> occasionally causes transactions and queries to hang when an update
>> causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant
> experience.

My experience is that if autovac is causing problems with stalled
queries etc you're either A: running ancient pg versions (pre 8.3), B:
Running WAY too aggressive settings in autovac (100 threads, no nap
time, cost limit of 100000 etc.) or C: Your IO subsystem is absolute
crap.

On any modern server, default autovac settings from 8.3 and on should
only have the possible problem of not being tuned aggressively enough.
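For reference, the stock 9.1 autovacuum defaults being discussed (as
shipped commented out in postgresql.conf; double-check against your own
install):

#autovacuum = on
#autovacuum_max_workers = 3
#autovacuum_naptime = 1min
#autovacuum_vacuum_cost_delay = 20ms
#autovacuum_vacuum_cost_limit = -1      # -1 means use vacuum_cost_limit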


Re: Unexpectedly high disk space usage

From: Scott Marlowe
Date:
On Wed, Nov 7, 2012 at 1:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Lists <lists@benjamindsmith.com> writes:
>>
>>> ... because it
>>> occasionally causes transactions and queries to hang when an update
>>> causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> My experience is that if autovac is causing problems with stalled
> queries etc you're either A: running ancient pg versions (pre 8.3), B:
> Running WAY too aggressive settings in autovac (100 threads, no nap
> time, cost limit of 100000 etc.) or C: Your IO subsystem is absolute
> crap.
>
> On any modern server, default autovac settings from 8.3 and on should
> only have the possible problem of not being tuned aggressively enough.

Oh, another failure scenario up there is that you're running DDL in
production, which is stalling behind an autovac, and in turn the two
are stalling other queries.  This has happened for me once or twice on
more modern versions (8.3 and 8.4).


Re: Unexpectedly high disk space usage

From: Lists
Date:
On 11/07/2012 12:42 PM, Tom Lane wrote:
> So you've turned off autovacuum, and are carefully not vacuuming the
> system catalogs.  That's your problem all right.  Is there a
> particularly good reason why this script isn't a one-liner "VACUUM"?

Back in the 8.x days, we experienced "vacuum full analyze" occasionally
causing other processes to hang/timeout. In an attempt to minimize the
impact of the locking, we updated the script to vacuum one table at a
time, which seemed to work well throughout the 8.x series. I'd happily
accept that this conclusion may simply have been wrong, but it
worked well enough that nobody complained and life was good. After
switching to 9.x, we read that the "full" vacuum was less useful and so
the script was changed to "vacuum analyze $table" rather than "vacuum
full analyze $table".

> Are you sure that once-a-day vacuuming is sufficient, even if it was
> covering the system catalogs?  If you've managed to bloat pg_attribute
> to 36GB, I suspect you've got enough churn (perhaps from temp tables)
> that you really need the catalogs vacuumed more often.
The only thing that I could find in the docs even mentioning the idea of
vacuuming catalogs is this sentence:

(A manual VACUUM should fix the problem, as suggested by the hint; but
note that the VACUUM must be performed by a superuser, else it will fail
to process system catalogs and thus not be able to advance the
database's datfrozenxid.)
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

This does NOT clearly say that the end user could vacuum catalogs, let
alone that it's necessary or even a good idea. Otherwise, the only
mention is of tables, and there's no mention of the idea that tables are
anything but user space.

> My advice is dump, reload, and *don't* turn off autovacuum.
>
>> ... because it
>> occasionally causes transactions and queries to hang when an update
>> causes a vacuum mid-day, effectively taking us offline randomly.
> I suspect this claim is based on ancient and no longer very relevant
> experience.
>
We tried several times to turn on autovacuum with 9.1 and had problems
every time. If our use case is particularly special, I'd love to work
with you to get autovacuum to work in our situation too as it would make
life easier for us! But for the past few months, every time we've turned
it on, we've had our phones swamped with customers who are unable to use
our system while our application monitors scream bloody murder, at least
weekly.

From what we could tell (under extreme pressure to get it all working
again ASAP, mind you) it seemed that when doing a large update from
within a transaction, autovacuum would get triggered before the
transaction completed, causing the transaction to hang or at least slow
way down, causing timeouts to occur with load balancers, so customers
would then try again, compounding the ongoing problem. Pretty soon you
have not only I/O issues, but also locking issues and upset customers.
This issue may be compounded because we make fairly extensive use of
dblink and temp tables to aggregate data for our customers who have
multiple sites.

-Ben


Re: Unexpectedly high disk space usage

From: Lists
Date:
On 11/07/2012 12:58 PM, Scott Marlowe wrote:
> My experience is that if autovac is causing problems with stalled
> queries etc you're either A: running ancient pg versions (pre 8.3), B:
> Running WAY too aggressive settings in autovac (100 threads, no nap
> time, cost limit of 100000 etc.) or C: Your IO subsystem is absolute
> crap. On any modern server, default autovac settings from 8.3 and on
> should only have the possible problem of not being tuned aggressively
> enough.

A) We are running PG 9.1.

B) We used the default settings in the RPMs provided by
yum.postgresql.org. At the bottom of this message is information about
the RPMs we currently are using.

C) I/O subsystem for PG is twin SATA III SSDs in a RAID 1 configuration,
capable of tens of thousands of IO operations per second. Servers are
recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U rackmount cases.

As stated previously, we make extensive use of temp tables,
transactions, and dblink, but had no trouble with catalog table bloat in
8.x; this is a new phenomenon for us.

# rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64
Name        : postgresql91                 Relocations: (not relocatable)
Version     : 9.1.5                             Vendor: (none)
Release     : 3PGDG.rhel6                   Build Date: Sun 02 Sep 2012
12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:24 AM UTC      Build Host:
koji-sl6-x86-64-pg91
Group       : Applications/Databases        Source RPM:
postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 5193673                          License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID
1f16d2e1442df0f8
URL         : http://www.postgresql.org/
Summary     : PostgreSQL client programs and libraries
Description :
PostgreSQL is an advanced Object-Relational database management system
(DBMS) that supports almost all SQL constructs (including
transactions, subselects and user-defined types and functions). The
postgresql package includes the client programs and libraries that
you'll need to access a PostgreSQL DBMS server.  These PostgreSQL
client programs are programs that directly manipulate the internal
structure of PostgreSQL databases on a PostgreSQL server. These client
programs can be located on the same machine with the PostgreSQL
server, or may be on a remote machine which accesses a PostgreSQL
server over a network connection. This package contains the command-line
utilities for managing PostgreSQL databases on a PostgreSQL server.

If you want to manipulate a PostgreSQL database on a local or remote
PostgreSQL
server, you need this package. You also need to install this package
if you're installing the postgresql91-server package.


Re: Unexpectedly high disk space usage

From: Greg Smith
Date:
On 11/7/12 3:58 PM, Jeff Janes wrote:
>>                  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>
> I question the wisdom of that where clause (from the wiki)
>
> If the pg_catalog relations are big, then they are big and why
> shouldn't they get reported as such?  And if they are not big, that is
> why the limit is there on the wiki page.

The idea was that in a new database with a relatively small number of
tables, your own tables will be lost among the catalog data unless you
filter them out.  Testing against an install with a single real table,
the query there will show something like this right now:

     relation    | total_size
----------------+------------
  public.t       | 3568 kB
  public.t_k_seq | 8192 bytes

But if the filter on pg_catalog is removed, you get this instead:

           relation           | total_size
-----------------------------+------------
  public.t                    | 3568 kB
  pg_catalog.pg_depend        | 808 kB
  pg_catalog.pg_proc          | 752 kB
  pg_catalog.pg_attribute     | 568 kB
  pg_catalog.pg_rewrite       | 464 kB
  pg_catalog.pg_description   | 392 kB
  pg_catalog.pg_statistic     | 328 kB
  pg_catalog.pg_operator      | 208 kB
  pg_catalog.pg_collation     | 152 kB
  pg_catalog.pg_type          | 152 kB
  pg_catalog.pg_amop          | 136 kB
  pg_catalog.pg_class         | 136 kB
  pg_catalog.pg_constraint    | 112 kB
  pg_catalog.pg_conversion    | 104 kB
  pg_catalog.pg_index         | 88 kB
  pg_catalog.pg_amproc        | 80 kB
  pg_catalog.pg_opclass       | 80 kB
  pg_catalog.pg_ts_config_map | 80 kB
  pg_catalog.pg_cast          | 80 kB
  pg_catalog.pg_authid        | 72 kB

That is overload for a lot of people, and confusing to new users.
That's why I opted for the shorter version.

There's no perfect answer to all use cases here.  This sort of thing is
why there's three sets of queries for pg_stat_user_tables,
pg_stat_sys_tables, and pg_stat_all_tables.  The wiki disk space queries
aim to be like the user tables version from that trio.

Adding a note pointing out that you might want to remove pg_catalog and
see the size of those relations would be appropriate.  I wouldn't make
that the default case though, due to the issue highlighted above.  I'd
rather optimize the initially suggested query so that new users get
simple output, even if it means that might hide problems on larger
installs, where the catalog data became big.

The other way I sometimes balance these two requirements--want to show
all the big data, but not clutter small installs with the catalog--is to
make the filter size-based instead:

SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
AND relpages > 100
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;

On my trivial test install that gives me just the one user table:

  relation | total_size
----------+------------
  public.t | 3568 kB

While still showing larger catalog tables if they grow to be noticeable.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Re: Unexpectedly high disk space usage

From: Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>> AND C.relkind <> 'i'
>> AND nspname !~ '^pg_toast'

> I question the wisdom of that where clause (from the wiki)

> If the pg_catalog relations are big, then they are big and why
> shouldn't they get reported as such?

Agreed, please change it.

(The index and toast exclusions are reasonable, since those will be
accounted for in pg_total_relation_size of the parent.  Personally I'd
code the toast exclusion using relkind not a namespace check though.)
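Presumably along these lines (toast tables have relkind 't'):

    WHERE C.relkind NOT IN ('i', 't')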

            regards, tom lane


Re: Unexpectedly high disk space usage

From: Scott Marlowe
Date:
On Wed, Nov 7, 2012 at 3:15 PM, Lists <lists@benjamindsmith.com> wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>
>> So you've turned off autovacuum, and are carefully not vacuuming the
>> system catalogs.  That's your problem all right.  Is there a
>> particularly good reason why this script isn't a one-liner "VACUUM"?
>
>
> Back in the 8.x days, we experienced "vacuum full analyze" occasionally
> causing other processes to hang/timeout.

That was your first mistake.  By 8.0 the need for vacuum full was
almost zero. Except for instances where bloat got out of hand, vacuum
full should generally be avoided after 8.0.  Regular vacuum should be
plenty, whether it's run by the autovacuum daemon or a cron job. Until
8.3 autovacuum was single threaded and therefore often had trouble
keeping up with bloat.  While vacuum full is a blocking operation,
plain vacuums are not, so unless you REALLY need a vacuum full they
should be avoided.


> In an attempt to minimize the
> impact of the locking, we updated the script to vacuum one table at a time,
> which seemed to work well throughout the 8.x series. I'd happily accept that
> this conclusion may simply have been wrong, but it worked well enough
> that nobody complained and life was good.

Yeah you still had blocking but it was probably less noticeable.

> After switching to 9.x, we read
> that the "full" vacuum was less useful and so the script was changed to
> "vacuum analyze $table" rather than "vacuum full analyze $table".

Yeah at that point you'd have been better off tuning autovacuum to be
more aggressive and let it do the job.  Generally the time to call
vacuum by hand is right after you've done something like delete half
the rows in a large table.


Re: Unexpectedly high disk space usage

From: Lists
Date:
On 11/07/2012 12:42 PM, Tom Lane wrote:
>> ... because it
>> >occasionally causes transactions and queries to hang when an update
>> >causes a vacuum mid-day, effectively taking us offline randomly.
> I suspect this claim is based on ancient and no longer very relevant
> experience.
Even so, if I felt the need to keep autovacuum off, what would I need to
run regularly in order to keep things neat and tidy under the hood?
Would a simple "vacuum" within each database suffice? Should I be logged
in as the database owner or as an administrative user?

Thanks,

Ben


Re: Unexpectedly high disk space usage

From: Tom Lane
Date:
Lists <lists@benjamindsmith.com> writes:
> Even so, if I felt the need to keep autovacuum off, what would I need to
> run regularly in order to keep things neat and tidy under the hood?
> Would a simple "vacuum" within each database suffice? Should I be logged
> in as the database owner or as an administrative user?

A plain "vacuum" (or probably better, "vacuum analyze") done as
superuser will suffice, as long as you do it often enough.
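For instance, the bundled vacuumdb utility can do the same across the
whole cluster from cron:

    vacuumdb --all --analyze -U postgres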

            regards, tom lane


Re: Unexpectedly high disk space usage

From: Scott Marlowe
Date:
On Thu, Nov 8, 2012 at 6:05 PM, Lists <lists@benjamindsmith.com> wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>>
>>> ... because it
>>> >occasionally causes transactions and queries to hang when an update
>>> >causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> Even so, if I felt the need to keep autovacuum off, what would I need to run
> regularly in order to keep things neat and tidy under the hood? Would a
> simple "vacuum" within each database suffice? Should I be logged in as the
> database owner or as an administrative user?

Just know that most of the time, when people think they need to turn
off autovacuum, they usually need to tune it instead: either more or
less aggressive, depending on why they think they need to turn it off.
If it's consuming too much IO then reduce cost limit / increase cost
delay; if it's not aggressive enough, then reverse that and increase
cost limit and decrease cost delay.  If your IO subsystem can't keep
up, then turning off autovacuum or turning it down will simply delay
the problem rather than solve it (i.e. throw more IO at it).


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Lists
Date:
As I've spent a considerable amount of time trying to sort this out, I'm
posting it for the benefit of other users.

I've experienced persistent, ongoing issues with autovacuum in a mixed
read/write environment on midrange hardware (16 core Xeon, 128 GB RAM,
200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults in the
9.1 RPMs provided by PostgreSQL (yum.postgresql.org). The cause of this
is not yet determined. It may be related to any or all of:

     A) extensive use of temp tables;
     B) extensive use of multiple dblink() calls in a single query;
     C) use of transactions, especially prepared transactions and
multiple savepoints;
     D) concurrent use of pg_dump;
     E) use of numerous databases on a single server, average about 50;

To offset this, we turned off autovacuum, and used an old script to
vacuum the tables in the middle of the night when nobody was looking.
Unfortunately, the vacuum script only vacuumed the "userland" tables and
tremendous amounts of disk space were being wasted, particularly in the
pg_attribute tables.

However, running any of the statements "vacuum analyze", "vacuum full
analyze", or "vacuum full verbose analyze" without naming specific
tables did not resolve the excess disk usage; usage still remained at
least 5x the expected amount in all cases. (In one case, all of these
open-ended vacuum commands accomplished almost nothing.)

Nor did running variations of "vacuum analyze $table" in a loop through
all tables (including the pg_* tables) completely resolve the issue.

In order to completely clean things up, we ended up writing a script to
do the following:

1) Determine the databases using excessive disk space, in descending
order of use with this query:

SELECT
d.datname as Name,
d.datistemplate::int AS datistemplate,
pg_catalog.pg_get_userbyid(d.datdba) as Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
     THEN pg_catalog.pg_database_size(d.datname)
     ELSE -1
     END as Size
FROM pg_catalog.pg_database d
     order by
         CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE NULL
         END desc nulls first;

2) For each database from #1, get a list of tables to be cleaned up with
this query:

SELECT
     nspname || '.' || relname AS "table",
     pg_total_relation_size(C.oid) AS "size"
FROM
     pg_class C
LEFT JOIN pg_namespace N ON
     (N.oid = C.relnamespace) WHERE (true or (nspname NOT IN
('pg_catalog', 'information_schema')))
     AND C.relkind = 'r'
     AND nspname !~ '^pg_toast'
ORDER BY
     pg_total_relation_size(C.oid) DESC;

3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;
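Glued together, the cleanup amounts to something like this (a sketch of
our script's logic, not the exact code):

for db in $(psql -U postgres -Atc "SELECT datname FROM pg_database
        WHERE NOT datistemplate ORDER BY pg_database_size(datname) DESC"); do
    psql -U postgres -Atc "SELECT nspname || '.' || relname FROM pg_class C
        LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
        WHERE C.relkind = 'r' AND nspname !~ '^pg_toast'
        ORDER BY pg_total_relation_size(C.oid) DESC" "$db" |
    while read t; do
        psql -U postgres "$db" -c "REINDEX TABLE $t"
        psql -U postgres "$db" -c "VACUUM FULL ANALYZE $t"
    done
done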

The end result is a squeaky-clean database server with expected disk usage.

NOTES:

1) The above queries are derived from the queries used earlier in this
thread to determine how much disk space was used, even though the
additional information they return isn't actually used by the script.

2) It was sheer chance that I discovered the need to reindex prior to
vacuum in order to get the disk space back.

3) I'd like to get autovacuum to work. I've read suggestions to tweak
cost_delay and/or cost_limit. I haven't yet determined if the problem is
I/O based or lock/deadlock based. I'm guessing the problem is the
latter, though it's hard to tell because queries stack up quickly and
load average is sky high when autovacuum fails for us.

4) The aforementioned process is S-L-O-W. Think at least hours and
probably days depending on your databases, your server(s), and the load.

5) I don't yet know if the "full" option for the vacuum is necessary to
free up all space. I will experiment with this and post results if useful.


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Steve Crawford
Date:
On 11/09/2012 04:28 PM, Lists wrote:
> As I've spent a considerable amount of time trying to sort this out,
> I'm posting it for the benefit of other users.

Having missed the earlier conversation... a couple of comments:

> I've experienced persistent, ongoing issues with autovacuum in a mixed
> read/write environment with midrange hardware (16 core Xeon, 128 GB
> RAM, 200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults
> in the 9.1 RPMs provided by Postgres. (yum.postgresql.org)

Don't do that. Defaults are good for ensuring that PostgreSQL will start
on the widest reasonable variety of systems. They are *terrible* for
performance and are certainly wrong for the system you describe.

> The cause of this is not yet determined. It may be related to any or
> all of:
>
>     A) extensive use of temp tables;
>     B) extensive use of multiple dblink() calls in a single query;
>     C) use of transactions, especially prepared transactions and
> multiple savepoints;
>     D) concurrent use of pg_dump;
>     E) use of numerous databases on a single server, average about 50;
>
> To offset this, we turned off autovacuum, and used an old script to
> vacuum the tables in the middle of the night when nobody was looking.
> Unfortunately, the vacuum script only vacuumed the "userland" tables
> and tremendous amounts of disk space were being wasted, particularly
> in the pg_attribute tables.

Bloat in pg_attribute would correlate with A) (or any constant
creation/destruction of tables). You can vacuum and/or reindex the
system tables if you are connected as the superuser but you are better
off preventing bloat by appropriate adjustment of your configuration
settings. However note that if you do frequent bulk creation/destruction
of tables you could end up bloating the attribute table between vacuum
runs and may need to periodically manually shrink it.
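Concretely, the manual shrink would be something along these lines, run
as superuser (note that VACUUM FULL takes an exclusive lock on the
table while it runs):

VACUUM FULL pg_catalog.pg_attribute;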

Cheers,
Steve



Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Scott Marlowe
Date:
On Fri, Nov 9, 2012 at 5:28 PM, Lists <lists@benjamindsmith.com> wrote:
> As I've spent a considerable amount of time trying to sort this out, I'm
> posting it for the benefit of other users.
SNIP

>     D) concurrent use of pg_dump;

Not usually a problem, unless it's overloading your IO subsystem.

>     C) use of transactions, especially prepared transactions and multiple
> savepoints;
>     E) use of numerous databases on a single server, average about 50;

These two can be killers.  Long running transactions can cause
autovacuum processes to stall out or be autocancelled.

As well, since the default nap time is 1 minute, it will take at least
50 minutes to vacuum each db as nap time is how long autovac waits
between databases.

Reducing autovacuum nap time to 5 or 10 seconds would be a good move
here, also possibly making it more aggressive by increasing max worker
threads, decreasing cost delay (possibly to zero or close to it) and /
or increasing cost limit.  After making such a change, watch iostat
while vacuum is running to see how hard it's hitting your IO
subsystem.  I'm guessing that with SSDs it isn't gonna be a big
problem.

As Greg Smith has pointed out in the past, usually the answer to an
autovacuum problem is making it more, not less aggressive.  Unless
you're flooding your IO this is almost always the right answer.  Keep
in mind that autovacuum by default is set up to be VERY unaggressive
because it may be running on a netbook for all it knows.

To tune autovacuum with 50 databases, start by dropping nap time to
something much lower, like 10s.  Then if you need to, drop cost delay
until you get to 0.  If you get to 0 and it's still not hitting your
IO too hard, but not keeping up, then increase cost limit.  If you get
to something in the 5000 to 10000 range, and it's still not keeping up,
then start bumping the thread count.
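As a postgresql.conf starting point for that strategy (illustrative
values only; verify against iostat as described above):

autovacuum_naptime = 10s
autovacuum_vacuum_cost_delay = 0        # drop toward zero if IO allows
autovacuum_vacuum_cost_limit = 5000     # raise if still not keeping up
autovacuum_max_workers = 5              # bump last, if needed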


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Greg Williamson
Date:
Scott -- 

<...>

>To tune autovacuum with 50 databases, start by dropping nap time to
>something much lower, like 10s.  Then if you need to, drop cost delay
>until you get to 0.  If you get to 0 and it's still not hitting your
>IO too hard, but not keeping up, then increase cost limit.  If you get
>to something in the 5000 to 10000 range, and its still not keeping up
>then start bumping the thread count
>


Thanks for outlining a strategy on this -- useful advice.

Greg Williamson



Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Jeff Janes
Date:
On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> As well, since the default nap time is 1 minute, it will take at least
> 50 minutes to vacuum each db as nap time is how long autovac waits
> between databases.

That isn't how it works.  The naptime is per database, not per
cluster.  If the naptime is 1 minute and there are 50 "active"
databases, then it will launch a new worker every 1.2 seconds
(assuming the old ones finish fast enough that doing so would not
exceed autovacuum_max_workers)

Cheers,

Jeff


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Scott Marlowe
Date:
On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>
>> As well, since the default nap time is 1 minute, it will take at least
>> 50 minutes to vacuum each db as nap time is how long autovac waits
>> between databases.
>
> That isn't how it works.  The naptime is per database, not per
> cluster.  If the naptime is 1 minute and there are 50 "active"
> databases, then it will launch a new worker every 1.2 seconds
> (assuming the old ones finish fast enough that doing so would not
> exceed autovacuum_max_workers)

Hmmm.  That was not my understanding from previous discussions on nap time.


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Adrian Klaver
Date:
On 11/10/2012 02:08 PM, Scott Marlowe wrote:
> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>>
>>> As well, since the default nap time is 1 minute, it will take at least
>>> 50 minutes to vacuum each db as nap time is how long autovac waits
>>> between databases.
>>
>> That isn't how it works.  The naptime is per database, not per
>> cluster.  If the naptime is 1 minute and there are 50 "active"
>> databases, then it will launch a new worker every 1.2 seconds
>> (assuming the old ones finish fast enough that doing so would not
>> exceed autovacuum_max_workers)
>
> Hmmm.  That was not my understanding from previous discussions on nap time.
>
>


When in doubt there are the docs:)

http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM

"The "autovacuum daemon" actually consists of multiple processes. There
is a persistent daemon process, called the autovacuum launcher, which is
in charge of starting autovacuum worker processes for all databases. The
launcher will distribute the work across time, attempting to start one
worker within each database every autovacuum_naptime seconds.
(Therefore, if the installation has N databases, a new worker will be
launched every autovacuum_naptime/N seconds.)"

--
Adrian Klaver
adrian.klaver@gmail.com


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Jeff Janes
Date:
On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote:


...

> 3) For each of the tables from #2, run the commands
> REINDEX TABLE $table;
> VACUUM FULL ANALYZE $table;
>
> The end result is a squeaky-clean database server with expected disk usage.
>
> NOTES:
...
>
>
> 2) It was sheer chance that I discovered the need to reindex prior to vacuum
> in order to get the disk space back.

As of 9.0, a "vacuum full" inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


> 5) I don't yet know if the "full" option for the vacuum is necessary to free
> up all space. I will experiment with this and post results if useful.

The answer to this is mostly non-deterministic: non-full vacuum can
only free space from the "end" of the table.

If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-lived object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.

Cheers,

Jeff


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Scott Marlowe
Date:
On Sat, Nov 10, 2012 at 3:20 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On 11/10/2012 02:08 PM, Scott Marlowe wrote:
>>
>> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>
>>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe <scott.marlowe@gmail.com>
>>> wrote:
>>>>
>>>>
>>>> As well, since the default nap time is 1 minute, it will take at least
>>>> 50 minutes to vacuum each db as nap time is how long autovac waits
>>>> between databases.
>>>
>>>
>>> That isn't how it works.  The naptime is per database, not per
>>> cluster.  If the naptime is 1 minute and there are 50 "active"
>>> databases, then it will launch a new worker every 1.2 seconds
>>> (assuming the old ones finish fast enough that doing so would not
>>> exceed autovacuum_max_workers)
>>
>>
>> Hmmm.  That was not my understanding from previous discussions on nap
>> time.
>>
>>
>
>
> When in doubt there are the docs:)
>
> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM
>
> "The "autovacuum daemon" actually consists of multiple processes. There is a
> persistent daemon process, called the autovacuum launcher, which is in
> charge of starting autovacuum worker processes for all databases. The
> launcher will distribute the work across time, attempting to start one
> worker within each database every autovacuum_naptime seconds. (Therefore, if
> the installation has N databases, a new worker will be launched every
> autovacuum_naptime/N seconds.)"

And apparently it wasn't always this way:

http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM

The autovacuum daemon, when enabled, runs every autovacuum_naptime
seconds. On each run, it selects one database to process and checks
each table within that database. VACUUM or ANALYZE commands are issued
as needed.


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Adrian Klaver
Date:
On 11/10/2012 02:23 PM, Scott Marlowe wrote:
>
>>
>> When in doubt there are the docs:)
>>
>> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM
>>
>> "The "autovacuum daemon" actually consists of multiple processes. There is a
>> persistent daemon process, called the autovacuum launcher, which is in
>> charge of starting autovacuum worker processes for all databases. The
>> launcher will distribute the work across time, attempting to start one
>> worker within each database every autovacuum_naptime seconds. (Therefore, if
>> the installation has N databases, a new worker will be launched every
>> autovacuum_naptime/N seconds.)"
>
> And apparently it wasn't always this way:
>
> http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM
>
> The autovacuum daemon, when enabled, runs every autovacuum_naptime
> seconds. On each run, it selects one database to process and checks
> each table within that database. VACUUM or ANALYZE commands are issued
> as needed.

Seems to have changed in 8.3:

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html

"Beginning in PostgreSQL 8.3, autovacuum has a multiprocess
architecture: There is a daemon process, called the autovacuum launcher,
which is in charge of starting autovacuum worker processes for all
databases. The launcher will distribute the work across time, but
attempt to start one worker on each database every autovacuum_naptime
seconds. One worker will be launched for each database, with a maximum
of autovacuum_max_workers processes running at the same time..."

>
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Lists
Date:
On 11/09/2012 05:26 PM, Steve Crawford wrote:
> Bloat in pg_attribute would correlate with A) (or any constant
> creation/destruction of tables). You can vacuum and/or reindex the
> system tables if you are connected as the superuser but you are better
> off preventing bloat by appropriate adjustment of your configuration
> settings. However note that if you do frequent bulk
> creation/destruction of tables you could end up bloating the attribute
> table between vacuum runs and may need to periodically manually shrink
> it.
>


Steve,

Our system divides customers into distinct databases; however, customers
are often clustered (think: different locations of 7/11), so we have to
aggregate data from different databases. We do this with dblink to get
the data and temp tables to collate it, which appears to be a cause of
the bloat we're seeing.
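The pattern looks roughly like this (a schematic example, not our
actual schema):

CREATE TEMP TABLE combined AS
    SELECT * FROM dblink('dbname=site_a', 'SELECT id, total FROM sales')
        AS t(id int, total numeric);
INSERT INTO combined
    SELECT * FROM dblink('dbname=site_b', 'SELECT id, total FROM sales')
        AS t(id int, total numeric);
-- every temp table like this adds pg_attribute rows that become dead
-- when the table is dropped, which is where the catalog bloat comes from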

-Ben


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Lists
Date:
On 11/10/2012 02:21 PM, Jeff Janes wrote:
> On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote:
>
>
> ...
>
>> 3) For each of the tables from #2, run the commands
>> REINDEX TABLE $table;
>> VACUUM FULL ANALYZE $table;
>>
>> The end result is a squeaky-clean database server with expected disk usage.
>>
>> NOTES:
> ...
>>
>> 2) It was sheer chance that I discovered the need to reindex prior to vacuum
>> in order to get the disk space back.
> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
> explicit one is neither necessary nor beneficial.
>
> I don't know if your discovery is based on a non-full vacuum, or on an
> older server.

I can only state that merely doing a "vacuum full" or "vacuum full
$tables" sequentially did not free the space, whereas the sequential
"reindex $table", each followed immediately by "vacuum full $table",
did. If you'd like, I can easily recreate the scenario by simply not
"cleaning up" one of the DB servers until it bloats up, and make
available (limited distribution) a binary copy of the database (e.g.,
rsync the /var/lib/pgsql/ filesystem late at night) in order to help
identify why it didn't work as expected.
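
For concreteness, the per-table sequence is roughly this (a sketch;
the table name is a placeholder supplied by our wrapper script):

REINDEX TABLE public.some_table;
VACUUM FULL ANALYZE public.some_table;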

>
>> 5) I don't yet know if the "full" option for the vacuum is necessary to free
>> up all space. I will experiment with this and post results if useful.
> The answer to this is mostly non-deterministic.  non-full vacuum can
> only free space from the "end" of the table.
>
> If all of your long-lived objects were created before pg_attribute got
> bloated and so the bloat was due only to short-lived objects, then
> non-full vacuum (if run often enough) should eventually be able to
> return that space as the short-lived objects near the end start to go
> away.  However, if even a single long-live object finds itself at the
> end of the table, then only a vacuum full will ever be able to reclaim
> that space.
>

Since the time period involved (weeks/months) would have included both a
large number of created/destroyed temp tables and occasionally altered
persistent objects, it would appear that the full option is a very good
idea, at least periodically.

-Ben


Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

From: Lists
Date:
The good news is that we have now resolved our critical problem (disk space overuse) with a somewhat hackish, slow answer that is nonetheless good enough for now.

Now I'd like to work out how to get autovacuum to work smoothly within our cluster. I'm happy to try to clarify my notes and post them either here or on the PG wiki as I get them resolved.

There's a wealth of how-to-tune-PG instruction out there that's old and (based on this thread alone) often stale enough to be classified as disinformation. For example, as near as I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL. In contrast to the advice to "avoid Vacuum Full" ( http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that vacuum full is necessary to clean up stale data that is not at the "end" of the table. (See Jeff Janes' 11/10/2012 email.)

non-full vacuum can
only free space from the "end" of the table.
This would imply that a full analyze is a good idea, at least periodically (weekly/monthly/quarterly), in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a "full" analyze? What about autovacuum and the reindex question at the end of this email?

On 11/10/2012 02:49 PM, Adrian Klaver wrote:

Seems to have changed in 8.3:

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html

"Beginning in PostgreSQL 8.3, autovacuum has a multiprocess architecture: There is a daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, but attempt to start one worker on each database every autovacuum_naptime seconds. One worker will be launched for each database, with a maximum of autovacuum_max_workers processes running at the same time..."


Sadly, this change means that I can no longer be certain of the utility of the otherwise excellent-sounding advice originally offered by Scott, quoted below. It appears that naptime is (as of 9.x) almost irrelevant, since it's applied per database, and dropping it from 1 minute to 5 seconds would have very little overall impact.

These two can be killers.  Long running transactions can cause
autovacuum processes to stall out or be autocancelled.
"Long running transactions" - is now long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with AutoVacuum?

As well, since the default nap time is 1 minute, it will take at least
50 minutes to vacuum each db as nap time is how long autovac waits
between databases.

Reducing autovacuum nap time to 5 or 10 seconds would be a good move
here, also possibly making it more aggressive by increasing max worker
threads, decreasing cost delay (possibly to zero or close to it) and /
or increasing cost limit.  After making such a change then watching
iostat when vacuum is running to see how hard its hitting your IO
subsystem.  I'm guessing that with SSDs it isn't gonna be a big
problem.

As Greg Smith has pointed out in the past, usually the answer to an
autovacuum problem is making it more, not less aggressive.  Unless
you're flooding your IO this is almost always the right answer.  Keep
in mind that autovacuum by default is setup to be VERY unaggressive
because it may be running on a netbook for all it knows.

To tune autovacuum with 50 databases, start by dropping nap time to
something much lower, like 10s.  Then if you need to, drop cost delay
until you get to 0.  If you get to 0 and it's still not hitting your
IO too hard, but not keeping up, then increase cost limit.  If you get
to something in the 5000 to 10000 range, and its still not keeping up
then start bumping the thread count

Should I increase the max_workers field from the default of 3 to (perhaps) 10? Noting that my solution to the disk space problem is effectively max_workers = 1, since it's all done sequentially, I wonder if reducing max_workers would actually be better?

Also, what's the "thread count"? Is that max_workers?

Why would I want to reduce the cost delay to 0, and how does this relate to cost_limit? Careful reading of the docs: http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes me believe that, given my substantial I/O subsystem, I'd want to drop cost_delay to near zero and set the cost_limit really high, which is a rough restatement of the last quoted paragraph above. (I think)
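
If I'm reading the docs and Scott's advice correctly, a starting point might look something like this in postgresql.conf (values illustrative, not something we've tested on our hardware):

autovacuum = on
autovacuum_naptime = 10s
autovacuum_max_workers = 3
autovacuum_vacuum_cost_delay = 0ms
autovacuum_vacuum_cost_limit = 5000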

Assuming that I make these suggestions and notice a subsequent system load problem, what information should I be gathering in order to provide better post-incident forensics? We have statistics turned on, and aren't using replication. (yet)

Lastly, there's the question of reindexing before full vacuum. I've observed that doing a vacuum full without a manual reindex first did not, in fact, free up the space, even though I've been told that reindex is implicit in the vacuum process. (?!) I'm confident that I can reproduce this behavior given a bit of time to allow one of our database servers to bloat back up.

-Ben

Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Jeff Janes
Date:
On Mon, Nov 12, 2012 at 10:38 AM, Lists <lists@benjamindsmith.com> wrote:
> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>>
>> On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote:
>>
>>>
>>> 2) It was sheer chance that I discovered the need to reindex prior to
>>> vacuum
>>> in order to get the disk space back.
>>
>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
>> explicit one is neither necessary nor beneficial.
>>
>> I don't know if your discovery is based on a non-full vacuum, or on an
>> older server.
>
>
> I can only state that merely doing a "vacuum full" or "vacuum full $tables"
> sequentially did not free the space, whereas the sequential  reindex $table,
> each followed immediately by a vacuum full $table) did.

With what version?


> If you'd like I can
> easily recreate the scenario by simply not "cleaning up" one of the DB
> servers until it bloats up and make available (limit distribution) a binary
> copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at
> night) in order to help identify why it didn't work as expected.

Do you think you can make an easily scriptable way to re-create the
resistant bloat?  That would be better than trying to disseminate
binary files, I think.

What I did was just create and drop temp tables in a tight loop, with
autovacuum off, and then once pg_attribute got good and bloated, did a
vacuum full as the database owner or superuser.
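
In sketch form, something like this (the loop count and column list are
arbitrary; run it in a scratch database with autovacuum = off):

DO $$
BEGIN
    FOR i IN 1..100000 LOOP
        CREATE TEMP TABLE bloat_me (a int, b text);
        DROP TABLE bloat_me;
    END LOOP;
END;
$$;

-- then check the catalog's size and try to reclaim the space:
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
VACUUM FULL pg_catalog.pg_attribute;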

>>
>> If all of your long-lived objects were created before pg_attribute got
>> bloated and so the bloat was due only to short-lived objects, then
>> non-full vacuum (if run often enough) should eventually be able to
>> return that space as the short-lived objects near the end start to go
>> away.  However, if even a single long-live object finds itself at the
>> end of the table, then only a vacuum full will ever be able to reclaim
>> that space.
>>
>
> Since the time period involved (weeks/months) would have included both a
> large number of created/destroyed temp tables and occasionally altered
> persistent objects it would appear that the full option a very good idea, at
> least periodically.

If you can prevent the extreme bloat from occurring in the first
place, then the "end" of the table would not be so far away from its
desired size that it needs to get reset by a vacuum full.

If you find yourself in need of a vacuum full, then you should do
one.  But you should ask yourself what went wrong that you got into
that situation in the first place.

Cheers,

Jeff


Re: Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

From: Lists
Date:
On 11/12/2012 01:31 PM, Jeff Janes wrote:
> On Mon, Nov 12, 2012 at 10:38 AM, Lists <lists@benjamindsmith.com> wrote:
>> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>>> On Fri, Nov 9, 2012 at 4:28 PM, Lists <lists@benjamindsmith.com> wrote:
>>>
>>>> 2) It was sheer chance that I discovered the need to reindex prior to
>>>> vacuum
>>>> in order to get the disk space back.
>>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
>>> explicit one is neither necessary nor beneficial.
>>>
>>> I don't know if your discovery is based on a non-full vacuum, or on an
>>> older server.
>>
>> I can only state that merely doing a "vacuum full" or "vacuum full $tables"
>> sequentially did not free the space, whereas the sequential  reindex $table,
>> each followed immediately by a vacuum full $table) did.
> With what version?
[root@alpha ~]# rpm -qi postgresql91-server
Name        : postgresql91-server           Relocations: (not relocatable)
Version     : 9.1.5                         Vendor: (none)
Release     : 3PGDG.rhel6                   Build Date: Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:41 AM UTC      Build Host: koji-sl6-x86-64-pg91
Group       : Applications/Databases        Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 15191132                      License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/

>
>
>> If you'd like I can
>> easily recreate the scenario by simply not "cleaning up" one of the DB
>> servers until it bloats up and make available (limit distribution) a binary
>> copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at
>> night) in order to help identify why it didn't work as expected.
> Do you think can make an easily script-able way to re-create the
> resistant bloat?  That would be better than trying to disseminate
> binary files, I think.
It would only be better if it actually created the situation that caused
the space to not be freed. But, until you know the actual cause of a
problem, I've found that it's often not productive to create simulations
that may or may not be actually related to the problem.

>
> What I did was just create and drop temp tables in a tight loop, with
> autovacuum off, and then once pg_attribute got good and bloated, did a
> vacuum full as the database owner or superuser.
Based on my understanding, if your loop included an intermittent schema
change from within a transaction it might better approximate my actual
scenario. Merely creating temp tables and then dropping them would
create lots of activity "at the end" of the table which would free
correctly. This still does not explain why reindex $table works when
reindex is supposedly implicit in the vacuum.
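
Something like this, in sketch form (the table and column names are
hypothetical):

BEGIN;
CREATE TEMP TABLE collate_tmp (a int);
ALTER TABLE customer_data ADD COLUMN scratch int;  -- hypothetical persistent table
ALTER TABLE customer_data DROP COLUMN scratch;
DROP TABLE collate_tmp;
COMMIT;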

>>> If all of your long-lived objects were created before pg_attribute got
>>> bloated and so the bloat was due only to short-lived objects, then
>>> non-full vacuum (if run often enough) should eventually be able to
>>> return that space as the short-lived objects near the end start to go
>>> away.  However, if even a single long-live object finds itself at the
>>> end of the table, then only a vacuum full will ever be able to reclaim
>>> that space.
>>>
>> Since the time period involved (weeks/months) would have included both a
>> large number of created/destroyed temp tables and occasionally altered
>> persistent objects it would appear that the full option a very good idea, at
>> least periodically.
> If you can prevent the extreme bloat from occurring in the first
> place, then the "end" of the table would not be so far away from its
> desired size that it needs to get reset by a vacuum full.
>
> If you find your self in need of a vacuum full, then you should do
> one.  But you should ask yourself what went wrong that you got into
> that situation in the first place.
>
I agree; this is why I asked my questions about enabling autovacuum in a
related thread.


Re: Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

From: Craig Ringer
Date:
On 11/13/2012 04:04 AM, Lists wrote:

There's a wealth of how to tune PG instruction that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, nearest I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL.
Agreed, that needs fixing. I'll sort that out. That information was important before the VACUUM FULL rewrite, but is now severely outdated. It needs to be split into pre-9.1 and 9.1+ sections.

In constrast to the advice to "avoid Vacuum Full" ( http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that vacuum full is necessary to clean up stale data that is not at the "end" of the table. (See Jeff Janes 11/10/2012 email)
That part isn't quite right AFAIK.

If you keep table bloat under control, ordinary VACUUM is perfectly sufficient. You only need/want to VACUUM FULL if you wish to truncate a table, reducing its size by compacting rows. In an actively used database that's usually pretty pointless, since new rows will then be added to the end, growing the table. You might as well just keep on re-using the space, rather than going through those compact-and-expand cycles. You'll suffer from less file fragmentation that way and won't be paying the costs of file system allocations.

If you have a table that you've dramatically reduced in size (say, by deleting the vast majority of it) and you won't be adding more rows to replace the old ones, that's when VACUUM FULL makes sense.

It's a bit like those utilities that claim to "free" or "clean" or "de-fragment" memory. They seem good, but they're actually grossly counter-productive, because the system then has to re-read cached data and otherwise fight to get back to its old equilibrium. It's typically the same for Pg: you want to aim for equilibrium, not free space that'll just promptly get re-allocated.

If you do have a bad bloat problem, I'd set a non-default FILLFACTOR before doing a VACUUM FULL, so you still have some free space within the table after vacuum completes. That way you won't be immediately paying the cost of allocating space for new rows as soon as any UPDATEs or INSERTs come in.
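
In sketch form (table name and fillfactor value illustrative):

ALTER TABLE mytable SET (fillfactor = 80);
VACUUM FULL mytable;

The rewrite should then leave roughly 20% of each page free rather than packing the table solid.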


non-full vacuum can
only free space from the "end" of the table.
This would imply that a full analyze is a good idea, at least periodically (weekly/monthly/quarterly) in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a "full" analyze"? What about autovacuum and the reindex question at the end of this email?
I'm not sure there's any such thing as a full analyze. "VACUUM FULL ANALYZE" is "Do a VACUUM FULL and an ANALYZE", not "Do a full analyze".

Autovacuum should be taking care of analyze and table statistics. If it isn't, adjust autovacuum parameters so that it does.
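
If particular tables aren't being kept up with, the thresholds can also be lowered per table rather than globally, e.g. (illustrative values on a hypothetical table):

ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.05,
                            autovacuum_analyze_scale_factor = 0.02);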

These two can be killers.  Long running transactions can cause
autovacuum processes to stall out or be autocancelled.
"Long running transactions" - is now long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with AutoVacuum?

Not generally, no.

PostgreSQL can't clean up rows that are still visible to a transaction. So if your transactions are three minutes long, that's a three minute delay before VACUUM can clean up DELETEd rows or dead rows left by UPDATEs. Not a biggie even on a pretty high load DB.

You should generally be concerned only when transactions are open over "user think time" or are abandoned by buggy applications - cases where the transaction length is many minutes or hours, potentially unbounded. Uncommitted prepared transactions are also a problem for similar reasons.
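
A quick way to spot such transactions on 9.1 (these pg_stat_activity column names changed in later releases):

SELECT procpid, now() - xact_start AS xact_age, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;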

Lastly, there's the question of reindexing before full vacuum. I've observed that not doing a manual reindex prior to vacuum full did not, in fact, free up the space, even though I've been told that reindex is implicit in the vacuum process. (?!)

VACUUM FULL, post-9.1, should take care of index bloat.

Pre-9.1 VACUUM FULL could make index bloat worse.

Ordinary VACUUM will not truncate indexes AFAIK, only mark free space within them so it can be re-used. Same deal as with the table itself: this is usually what you want.


--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

From: Craig Ringer
Date:
On 11/13/2012 10:29 AM, Craig Ringer wrote:
On 11/13/2012 04:04 AM, Lists wrote:

There's a wealth of how to tune PG instruction that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, nearest I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL.
Agreed, that needs fixing. I'll sort that out. That information was important before the VACUUM FULL rewrite, but is now severely outdated. It needs to be split into pre-9.1 and 9.1+ sections.
Correction - pre-9.0 and 9.0+

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

From: Jeff Janes
Date:
On Mon, Nov 12, 2012 at 12:04 PM, Lists <lists@benjamindsmith.com> wrote:
>
>
> Should I increase the max_workers field from the default of 3 to (perhaps)
> 10?

I would not.  You report that the reason you turned off autovac is
because it made your database crawl when it kicked in.  That suggests
that if anything you should reduce that parameter (assuming you are
still doing manual vacuums at off-peak hours, leaving autovacuum to
only mop up what is left).

>
> Why would I want to reduce the cost delay to 0, and how does this relate to
> cost_limit? Careful reading of the docs:
> http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes
> me believe that, given my substantial I/O subsystem, I'd want to drop
> cost_delay to near zero and set the cost_limit really high, which is a rough
> restatement of the last quoted paragraph above. (I think)

Given that autovac kicking in destroys your performance, I think that
your I/O subsystem may not be all that you think it is.  Do you have
test/dev/QA system with the same subsystem that you can use for
investigation?  If so, do you have a vaguely realistic load generator
to drive those systems?

> Assuming that I make these suggestions and notice a subsequent system load
> problem, what information should I be gathering in order to provide better
> post-incident forensics?

If you are going to be focusing your undivided attention on monitoring
the system during the period, just keeping a window open with "top"
running is invaluable. (On most implementations, if you hit 'c' it
will toggle the command display so you can see the results of
"update_process_title=on")

Also, "sar" is useful, and on most systems has the advantage that its
stats are always being gathered without you having to do anything, so
it works well for unexpected problems arising.  I often just have
"vmstat 1 -t" running in the background streaming into a log file, for
the same reason.
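
For example (log path illustrative; the -t timestamp flag is the Linux procps one):

nohup vmstat -t 1 >> /var/log/vmstat.log 2>&1 &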

Within PostgreSQL itself, set log_min_duration_statement to a value which
few statements will exceed under normal operations, but many will when
things bog down.  That way you can figure out exactly when things
bogged down after unattended operation, to correlate it with the
sar/vmstat/etc reports.
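
For example, in postgresql.conf (the 500 ms threshold is illustrative; pick one your normal statements rarely exceed):

log_min_duration_statement = 500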

I'd also set for the probationary period (if you haven't already):

log_lock_waits = on
log_checkpoints = on
log_autovacuum_min_duration = 0 (or some smallish positive value)

The last one only logs when it finishes vacuuming a table.  I wish
there was a way to make it log when it started as well, but I don't
think there is.

Cheers,

Jeff