Обсуждение: auto vacuum

Поиск
Список
Период
Сортировка

auto vacuum

От
Herouth Maoz
Дата:
Hi all.

We had a crisis this week that was resolved by tuning pg_autovacuum for a particular table. The table is supposed to contain a small number of items at any given point in time (typically around 10,000-30,000). The items are inserted when we send out a message, and are selected, then deleted when a reply to the message arrives. This may be done at a rather high rate - sometimes a thousand a minute or around that.

We found out that the table's response depends on the rate of ANALYZE being performed. We have tuned the values in pg_autovacuum so that we have around one analyze per minute.

What is bothering me is that sometimes the auto vacuum daemon decides to perform a vacuum analyze rather than just analyze. If it just does a vacuum independent of the analyze, we don't see much impact on performance. But if it does vacuum analyze, it means that until vacuum is over, it doesn't do another analyze, and this may take about five minutes, in which our performance under load conditions might deteriorate.

Is there any way to cause pg_autovacuum not to choose vacuum analyze?

I thought of changing the vacuum frequency to be rare - but then it might take even longer to vacuum, and if a long vacuum analyze falls on a high load time, although the chances are smaller, the risk is higher. We can't afford a slowdown in that table.

# select * from pg_stat_user_tables where relname = 'transient';
-[ RECORD 1 ]----+------------------------------
relid            | 17866
schemaname       | public
relname          | transient
seq_scan         | 49633
seq_tup_read     | 1388557648
idx_scan         | 9200950
idx_tup_fetch    | 9960245
n_tup_ins        | 6572067
n_tup_upd        | 0
n_tup_del        | 6466085
n_tup_hot_upd    | 0
n_live_tup       | 81060
n_dead_tup       | 10097
last_vacuum      | 2010-04-11 00:29:52.266617+03
last_autovacuum  | 2010-04-14 11:47:43.13062+03
last_analyze     | 2010-04-11 00:29:52.266617+03
last_autoanalyze | 2010-04-14 12:04:39.090055+03

Thank you,
Herouth

Re: auto vacuum

От
Bill Moran
Дата:
In response to Herouth Maoz <herouth@unicell.co.il>:

> Hi all.
>
> We had a crisis this week that was resolved by tuning pg_autovacuum for a particular table. The table is supposed to
containa small number of items at any given point in time (typically around 10,000-30,000). The items are inserted when
wesend out a message, and are selected, then deleted when a reply to the message arrives. This may be done at a rather
highrate - sometimes a thousand a minute or around that. 
>
> We found out that the table's response depends on the rate of ANALYZE being performed. We have tuned the values in
pg_autovacuumso that we have around one analyze per minute. 
>
> What is bothering me is that sometimes the auto vacuum daemon decides to perform a vacuum analyze rather than just
analyze.If it just does a vacuum independent of the analyze, we don't see much impact on performance. But if it does
vacuumanalyze, it means that until vacuum is over, it doesn't do another analyze, and this may take about five minutes,
inwhich our performance under load conditions might deteriorate. 
>
> Is there any way to cause pg_autovacuum not to choose vacuum analyze?
>
> I thought of changing the vacuum frequency to be rare - but then it might take even longer to vacuum, and if a long
vacuumanalyze falls on a high load time, although the chances are smaller, the risk is higher. We can't afford a
slowdownin that table. 

Have you considered putting either vacuum or analyze on a manual schedule
with cron so you can control it better?  You could schedule a manual
analyze every minute and let autovacuum handle vacuuming, but it might
be better to explicitly schedule both of them so you have complete
control over when they run.

I did a couple of quick tests, and it doesn't seem as if a vacuum running
on one connection blocks an analyze running on another connection.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: auto vacuum

От
Alvaro Herrera
Дата:
Herouth Maoz wrote:

> We found out that the table's response depends on the rate of ANALYZE being performed. We have tuned the values in
pg_autovacuumso that we have around one analyze per minute. 
>
> What is bothering me is that sometimes the auto vacuum daemon decides to perform a vacuum analyze rather than just
analyze.If it just does a vacuum independent of the analyze, we don't see much impact on performance. But if it does
vacuumanalyze, it means that until vacuum is over, it doesn't do another analyze, and this may take about five minutes,
inwhich our performance under load conditions might deteriorate. 

Hmm, this is an use case we never thought about.  I don't think there's
any way to coerce autovacuum into doing what you want it to do.  I
suggest you turn analyze off for this table (say by setting a very large
scale factor), and analyze it manually through cron or something.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: auto vacuum

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Herouth Maoz wrote:
>> We found out that the table's response depends on the rate of ANALYZE being performed. We have tuned the values in
pg_autovacuumso that we have around one analyze per minute. 
>>
>> What is bothering me is that sometimes the auto vacuum daemon decides to perform a vacuum analyze rather than just
analyze.If it just does a vacuum independent of the analyze, we don't see much impact on performance. But if it does
vacuumanalyze, it means that until vacuum is over, it doesn't do another analyze, and this may take about five minutes,
inwhich our performance under load conditions might deteriorate. 

> Hmm, this is an use case we never thought about.  I don't think there's
> any way to coerce autovacuum into doing what you want it to do.  I
> suggest you turn analyze off for this table (say by setting a very large
> scale factor), and analyze it manually through cron or something.

Hmm.  Given the churn rate on the table, I'm having a very hard time
believing that you don't need to vacuum it pretty dang often.  Maybe the
direction you need to be moving is to persuade autovac to vacuum it
*more* often, not less often, so that the time needed to finish each
vacuum is small enough.

            regards, tom lane

Re: auto vacuum

От
Herouth Maoz
Дата:
First, I'd like to thank Bill and Alvaro as well as you for your replies.

Quoting Tom Lane:
> Hmm.  Given the churn rate on the table, I'm having a very hard time
> believing that you don't need to vacuum it pretty dang often.  Maybe the
> direction you need to be moving is to persuade autovac to vacuum it
> *more* often, not less often, so that the time needed to finish each
> vacuum is small enough.
>
Other than reclaiming disk space, is there any advantage to vacuum? Is a
vacuumed table more efficient? So far, every time it vacuums - which is
around every 15-20 minutes under load conditions - it slows down
processing. I think perhaps Bill's suggestion of just scheduling the
vacuums myself (e.g. 1-2am, off peak) coupled with cost-based vacuuming
might be a good answer? Unless I'm missing an important point about
vacuuming.

Alvaro and Bill both suggested scheduling analyzes on a minute-by-minute
cron. Would this be no different than automatic analyze? No extra
overhead for connection, perhaps?

Thanks,
Herouth

Re: auto vacuum

От
Bill Moran
Дата:
In response to Herouth Maoz <herouth@unicell.co.il>:

> First, I'd like to thank Bill and Alvaro as well as you for your replies.
>
> Quoting Tom Lane:
> > Hmm.  Given the churn rate on the table, I'm having a very hard time
> > believing that you don't need to vacuum it pretty dang often.  Maybe the
> > direction you need to be moving is to persuade autovac to vacuum it
> > *more* often, not less often, so that the time needed to finish each
> > vacuum is small enough.
> >
> Other than reclaiming disk space, is there any advantage to vacuum? Is a
> vacuumed table more efficient? So far, every time it vacuums - which is
> around every 15-20 minutes under load conditions - it slows down
> processing. I think perhaps Bill's suggestion of just scheduling the
> vacuums myself (e.g. 1-2am, off peak) coupled with cost-based vacuuming
> might be a good answer? Unless I'm missing an important point about
> vacuuming.
>
> Alvaro and Bill both suggested scheduling analyzes on a minute-by-minute
> cron. Would this be no different than automatic analyze? No extra
> overhead for connection, perhaps?

Did I understand the original problem correctly?  I thought you were saying
that _lack_ of analyzing was causing performance issues, and that running
vacuum analyze was taking too long and causing the interval between
analyze runs to be too long.  If that is the case, then I still think
manually scheduling vacuum and analyze to run in separate threads is
the best approach.

If the problem is that overall performance slows too much when vacuum is
running, then you'll probably have to get more/faster hardware.  Vacuum
has to run occasionally or your table will bloat.  Bloated tables perform
lousy and waste a lot of space, and a table that is getting updates and
inserts without vacuuming will grow without bound, even if you delete
records.  It's kind of like the trash bin on many desktop OSes ... when
you DELETE a record from the DB, it goes into the trash bin, when you
run VACUUM, the trash is emptied (Yes, I know that's not _exactly_ how
vacuum works, but I'm just drawing a parallel here)

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: auto vacuum

От
Alvaro Herrera
Дата:
Herouth Maoz wrote:
> First, I'd like to thank Bill and Alvaro as well as you for your replies.
>
> Quoting Tom Lane:
> >Hmm.  Given the churn rate on the table, I'm having a very hard time
> >believing that you don't need to vacuum it pretty dang often.  Maybe the
> >direction you need to be moving is to persuade autovac to vacuum it
> >*more* often, not less often, so that the time needed to finish each
> >vacuum is small enough.
> Other than reclaiming disk space, is there any advantage to vacuum?
> Is a vacuumed table more efficient?

Yeah, it frees up space for new tuples.  Whether it does anything or not
depends on you having long running transactions; if you do, maybe more
frequent vacuuming would just scan the table to no useful purpose.

> So far, every time it vacuums -
> which is around every 15-20 minutes under load conditions - it slows
> down processing. I think perhaps Bill's suggestion of just
> scheduling the vacuums myself (e.g. 1-2am, off peak) coupled with
> cost-based vacuuming might be a good answer? Unless I'm missing an
> important point about vacuuming.

I think you should continue with your current vacuuming strategy, or
perhaps make it more frequent, not less.  To prevent it being a load hog
you can set vacuum_cost_delay to some nonzero value.  Keep in mind that
vacuum_cost_delay affects all vacuuming tasks, including autovacuum,
whereas autovacuum_vacuum_cost_delay affects autovacuum but not manually
invoked vacuum (including the ones you invoke via cron of course)

> Alvaro and Bill both suggested scheduling analyzes on a
> minute-by-minute cron. Would this be no different than automatic
> analyze? No extra overhead for connection, perhaps?

No difference.  Autovacuum also uses a connection; the only difference
is that it does it internally.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: auto vacuum

От
Herouth Maoz
Дата:
ציטוט Bill Moran:

> In response to Herouth Maoz <herouth@unicell.co.il>:
>
>
> Did I understand the original problem correctly?  I thought you were saying
> that _lack_ of analyzing was causing performance issues, and that running
> vacuum analyze was taking too long and causing the interval between
> analyze runs to be too long.  If that is the case, then I still think
> manually scheduling vacuum and analyze to run in separate threads is
> the best approach.
>
Yes, I get the worst impact if autovacuum decides to do a vacuum analyze
rather than a separate vacuum and a separate analyze. However, normal
vacuum does cause a slowdown, though not as much as vacuum analyze.
> If the problem is that overall performance slows too much when vacuum is
> running, then you'll probably have to get more/faster hardware.  Vacuum
> has to run occasionally or your table will bloat.  Bloated tables perform
> lousy and waste a lot of space, and a table that is getting updates and
> inserts without vacuuming will grow without bound, even if you delete
> records.  It's kind of like the trash bin on many desktop OSes ... when
> you DELETE a record from the DB, it goes into the trash bin, when you
> run VACUUM, the trash is emptied (Yes, I know that's not _exactly_ how
> vacuum works, but I'm just drawing a parallel here)
>
I understand. Assuming that I have enough disk space, and I vacuum once
a day instead of every 20 minutes. Does that cause deterioration in
performance?

Thank you,
Herouth

Re: auto vacuum

От
Bill Moran
Дата:
In response to Herouth Maoz <herouth@unicell.co.il>:

> > If the problem is that overall performance slows too much when vacuum is
> > running, then you'll probably have to get more/faster hardware.  Vacuum
> > has to run occasionally or your table will bloat.  Bloated tables perform
> > lousy and waste a lot of space, and a table that is getting updates and
> > inserts without vacuuming will grow without bound, even if you delete
> > records.  It's kind of like the trash bin on many desktop OSes ... when
> > you DELETE a record from the DB, it goes into the trash bin, when you
> > run VACUUM, the trash is emptied (Yes, I know that's not _exactly_ how
> > vacuum works, but I'm just drawing a parallel here)
> >
> I understand. Assuming that I have enough disk space, and I vacuum once
> a day instead of every 20 minutes. Does that cause deterioration in
> performance?

It can, but it depends on the actual circumstances.  As with many things,
there's a threshold.  Exactly where that threshold is depends on your
hardware, the actual amount data changing, the nature of your queries,
etc ... to the point where the only way to be sure is to try and see.

When a table goes a long time between vacuums, it accumulates "dead
tuples".  This is space in the table that doesn't have any usable data
in it.  As the percentage of dead tuples to live ones increases, the
system has to seek through more empty space to find real data, and
performance suffers.  Vacuum cleans up what dead space it can, and
marks the rest as available for reuse so new rows will reuse it instead
of making the table larger.

If there's only a little dead space, the performance hit is minor,
frequently it's too small to worry about.  But as the interval between
vacuums increases, the amount of dead space increases, and the system
can cross a threshold where that dead space presents a significant
performance problem.  If vacuum runs frequently, the amount dead space
says small enough not to present a problem.

Of course, the question is "how often is often enough" and again, the
answer is dependent on a number of usage factors such that it's difficult
to offer any advice other than, "Try some different settings and see
how it turns out".  However, given the information you've provided, I
would suspect that daily is not going to be often enough.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/