Discussion: obsessive-compulsive vacuum behavior

obsessive-compulsive vacuum behavior

From:
Ben Chobot
Date:
I've got an 8.4.2 database where it appears that vacuum keeps redoing the same table and indexes, never thinking it's
finished:

auditor=# VACUUM analyze VERBOSE repair_queue ;
INFO:  vacuuming "public.repair_queue"
INFO:  scanned index "repair_queue_pkey" to remove 2795932 row versions
DETAIL:  CPU 14.98s/15.29u sec elapsed 312.50 sec.
INFO:  scanned index "repair_queue_auditor" to remove 2795932 row versions
DETAIL:  CPU 0.74s/0.50u sec elapsed 10.49 sec.
INFO:  scanned index "repair_queue_sort" to remove 2795932 row versions
DETAIL:  CPU 2.99s/1.58u sec elapsed 45.14 sec.
INFO:  scanned index "repair_queue_sort3" to remove 2795932 row versions
DETAIL:  CPU 0.89s/0.48u sec elapsed 10.99 sec.
INFO:  "repair_queue": removed 2795932 row versions in 43199 pages
DETAIL:  CPU 1.04s/0.39u sec elapsed 17.93 sec.
INFO:  scanned index "repair_queue_pkey" to remove 2795938 row versions
DETAIL:  CPU 14.71s/15.06u sec elapsed 362.37 sec.
INFO:  scanned index "repair_queue_auditor" to remove 2795938 row versions
DETAIL:  CPU 0.62s/0.45u sec elapsed 14.36 sec.
INFO:  scanned index "repair_queue_sort" to remove 2795938 row versions
DETAIL:  CPU 2.97s/1.65u sec elapsed 56.94 sec.
INFO:  scanned index "repair_queue_sort3" to remove 2795938 row versions
DETAIL:  CPU 0.82s/0.44u sec elapsed 10.54 sec.
INFO:  "repair_queue": removed 2795938 row versions in 41055 pages
DETAIL:  CPU 0.75s/0.34u sec elapsed 7.59 sec.
INFO:  scanned index "repair_queue_pkey" to remove 2795959 row versions
DETAIL:  CPU 14.20s/14.56u sec elapsed 539.17 sec.
INFO:  scanned index "repair_queue_auditor" to remove 2795959 row versions
DETAIL:  CPU 0.75s/0.48u sec elapsed 13.76 sec.
INFO:  scanned index "repair_queue_sort" to remove 2795959 row versions
DETAIL:  CPU 3.07s/1.65u sec elapsed 44.29 sec.
INFO:  scanned index "repair_queue_sort3" to remove 2795959 row versions
DETAIL:  CPU 0.78s/0.44u sec elapsed 12.52 sec.
INFO:  "repair_queue": removed 2795959 row versions in 41004 pages
DETAIL:  CPU 0.88s/0.42u sec elapsed 12.49 sec.


...and so on. It's been running for an hour or so now, when it appears it shouldn't take 10 minutes. This seems pretty
weird to me.... has anybody else seen this behavior? I'm not even sure what details I could report which would help
figure out what's going on.

Re: obsessive-compulsive vacuum behavior

From:
Scott Marlowe
Date:
On Sun, Mar 7, 2010 at 11:58 PM, Ben Chobot <bench@silentmedia.com> wrote:
> I've got an 8.4.2 database where it appears that vacuum keeps redoing the same table and indexes, never thinking it's
> finished:
>
> auditor=# VACUUM analyze VERBOSE repair_queue ;
> INFO:  vacuuming "public.repair_queue"
> INFO:  scanned index "repair_queue_pkey" to remove 2795932 row versions
> DETAIL:  CPU 14.98s/15.29u sec elapsed 312.50 sec.
> INFO:  scanned index "repair_queue_auditor" to remove 2795932 row versions
> DETAIL:  CPU 0.74s/0.50u sec elapsed 10.49 sec.
> INFO:  scanned index "repair_queue_sort" to remove 2795932 row versions
> DETAIL:  CPU 2.99s/1.58u sec elapsed 45.14 sec.
> INFO:  scanned index "repair_queue_sort3" to remove 2795932 row versions
> DETAIL:  CPU 0.89s/0.48u sec elapsed 10.99 sec.
> INFO:  "repair_queue": removed 2795932 row versions in 43199 pages
> DETAIL:  CPU 1.04s/0.39u sec elapsed 17.93 sec.
> INFO:  scanned index "repair_queue_pkey" to remove 2795938 row versions
> DETAIL:  CPU 14.71s/15.06u sec elapsed 362.37 sec.
> INFO:  scanned index "repair_queue_auditor" to remove 2795938 row versions
> DETAIL:  CPU 0.62s/0.45u sec elapsed 14.36 sec.
> INFO:  scanned index "repair_queue_sort" to remove 2795938 row versions
> DETAIL:  CPU 2.97s/1.65u sec elapsed 56.94 sec.
> INFO:  scanned index "repair_queue_sort3" to remove 2795938 row versions
> DETAIL:  CPU 0.82s/0.44u sec elapsed 10.54 sec.
> INFO:  "repair_queue": removed 2795938 row versions in 41055 pages
> DETAIL:  CPU 0.75s/0.34u sec elapsed 7.59 sec.
> INFO:  scanned index "repair_queue_pkey" to remove 2795959 row versions
> DETAIL:  CPU 14.20s/14.56u sec elapsed 539.17 sec.
> INFO:  scanned index "repair_queue_auditor" to remove 2795959 row versions
> DETAIL:  CPU 0.75s/0.48u sec elapsed 13.76 sec.
> INFO:  scanned index "repair_queue_sort" to remove 2795959 row versions
> DETAIL:  CPU 3.07s/1.65u sec elapsed 44.29 sec.
> INFO:  scanned index "repair_queue_sort3" to remove 2795959 row versions
> DETAIL:  CPU 0.78s/0.44u sec elapsed 12.52 sec.
> INFO:  "repair_queue": removed 2795959 row versions in 41004 pages
> DETAIL:  CPU 0.88s/0.42u sec elapsed 12.49 sec.
>
>
> ...and so on. It's been running for an hour or so now, when it appears it shouldn't take 10 minutes. This seems
> pretty weird to me.... has anybody else seen this behavior? I'm not even sure what details I could report which would
> help figure out what's going on.

Those are all different relations, and it's reclaiming a good number
of rows and pages.   41004 pages is ~320 Megs.  Even if the rows are
small it's gonna be 100 Megs or so per index reclaimed.  Seems like
vacuum is doing its job.  Is it running often enough to prevent
bloating?
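
For reference, that estimate checks out assuming the default 8 kB PostgreSQL block size:

    41004 pages × 8 kB/page = 328,032 kB ≈ 320 MB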

Re: obsessive-compulsive vacuum behavior

From:
Scott Marlowe
Date:
On Mon, Mar 8, 2010 at 12:26 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Mar 7, 2010 at 11:58 PM, Ben Chobot <bench@silentmedia.com> wrote:
>> I've got an 8.4.2 database where it appears that vacuum keeps redoing the same table and indexes, never thinking
>> it's finished:
>>
>> auditor=# VACUUM analyze VERBOSE repair_queue ;
>> INFO:  vacuuming "public.repair_queue"
>> INFO:  scanned index "repair_queue_pkey" to remove 2795932 row versions
>> DETAIL:  CPU 14.98s/15.29u sec elapsed 312.50 sec.
>> INFO:  scanned index "repair_queue_auditor" to remove 2795932 row versions
>> DETAIL:  CPU 0.74s/0.50u sec elapsed 10.49 sec.
>> INFO:  scanned index "repair_queue_sort" to remove 2795932 row versions
>> DETAIL:  CPU 2.99s/1.58u sec elapsed 45.14 sec.
>> INFO:  scanned index "repair_queue_sort3" to remove 2795932 row versions
>> DETAIL:  CPU 0.89s/0.48u sec elapsed 10.99 sec.
>> INFO:  "repair_queue": removed 2795932 row versions in 43199 pages
>> DETAIL:  CPU 1.04s/0.39u sec elapsed 17.93 sec.
>> INFO:  scanned index "repair_queue_pkey" to remove 2795938 row versions
>> DETAIL:  CPU 14.71s/15.06u sec elapsed 362.37 sec.
>> INFO:  scanned index "repair_queue_auditor" to remove 2795938 row versions
>> DETAIL:  CPU 0.62s/0.45u sec elapsed 14.36 sec.
>> INFO:  scanned index "repair_queue_sort" to remove 2795938 row versions
>> DETAIL:  CPU 2.97s/1.65u sec elapsed 56.94 sec.
>> INFO:  scanned index "repair_queue_sort3" to remove 2795938 row versions
>> DETAIL:  CPU 0.82s/0.44u sec elapsed 10.54 sec.
>> INFO:  "repair_queue": removed 2795938 row versions in 41055 pages
>> DETAIL:  CPU 0.75s/0.34u sec elapsed 7.59 sec.
>> INFO:  scanned index "repair_queue_pkey" to remove 2795959 row versions
>> DETAIL:  CPU 14.20s/14.56u sec elapsed 539.17 sec.
>> INFO:  scanned index "repair_queue_auditor" to remove 2795959 row versions
>> DETAIL:  CPU 0.75s/0.48u sec elapsed 13.76 sec.
>> INFO:  scanned index "repair_queue_sort" to remove 2795959 row versions
>> DETAIL:  CPU 3.07s/1.65u sec elapsed 44.29 sec.
>> INFO:  scanned index "repair_queue_sort3" to remove 2795959 row versions
>> DETAIL:  CPU 0.78s/0.44u sec elapsed 12.52 sec.
>> INFO:  "repair_queue": removed 2795959 row versions in 41004 pages
>> DETAIL:  CPU 0.88s/0.42u sec elapsed 12.49 sec.
>>
>>
>> ...and so on. It's been running for an hour or so now, when it appears it shouldn't take 10 minutes. This seems
>> pretty weird to me.... has anybody else seen this behavior? I'm not even sure what details I could report which would
>> help figure out what's going on.
>
> Those are all different relations, and it's reclaiming a good number
> of rows and pages.   41004 pages is ~320 Megs.  Even if the rows are
> small it's gonna be 100 Megs or so per index reclaimed.  Seems like
> vacuum is doing its job.  Is it running often enough to prevent
> bloating?

OK, they're not ALL different relations, but only one seems to repeat
much and that's the _sort one.

Re: obsessive-compulsive vacuum behavior

From:
Tom Lane
Date:
Ben Chobot <bench@silentmedia.com> writes:
> ...and so on. It's been running for an hour or so now, when it appears it shouldn't take 10 minutes. This seems
> pretty weird to me.... has anybody else seen this behavior? I'm not even sure what details I could report which would
> help figure out what's going on.

You're apparently trying to vacuum a really large table with a whole lot
of dead rows using a tiny maintenance_work_mem setting (16MB, looks
like).  Every time the list of dead tuples fills up, it has to perform a
pass over the table and indexes to get rid of those tuples.  Then it
goes back to scanning for more dead tuples.
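
The 16 MB guess lines up with the repeating batch size above, assuming vacuum's usual 6 bytes
per dead-tuple pointer:

    2,795,932 dead tuples × 6 bytes ≈ 16.0 MB = maintenance_work_mem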

If it's still going, I'd suggest cancelling it and starting again with a
more realistic maintenance_work_mem, maybe something in the several
hundred MB range, depending on how much memory is free in your server.
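
A minimal sketch of what that restart could look like from psql (the 512MB value here is an
illustrative choice, not a figure from the thread; size it to your free RAM):

auditor=# SET maintenance_work_mem = '512MB';
SET
auditor=# VACUUM ANALYZE VERBOSE repair_queue;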

After it's done you probably ought to look into your autovacuum
settings.  You shouldn't have gotten into a state with so many dead
tuples in the first place, I would think, if autovac were functioning
properly.
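
One way to review those settings from psql, using the standard pg_settings view:

auditor=# SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';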

            regards, tom lane

Re: obsessive-compulsive vacuum behavior

From:
Ben Chobot
Date:
On Mar 8, 2010, at 7:16 AM, Tom Lane wrote:

> After it's done you probably ought to look into your autovacuum
> settings.  You shouldn't have gotten into a state with so many dead
> tuples in the first place, I would think, if autovac were functioning
> properly.

Yeah, it turns out our autovacuums kept getting canceled because this table gets a lot of churn. Now that we've turned
off the cost delay, things are much happier. Thanks!
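
For anyone hitting the same problem: the thread doesn't say whether the cost delay was changed
globally in postgresql.conf or per table, but a sketch of the per-table form on 8.4 would be:

auditor=# ALTER TABLE repair_queue SET (autovacuum_vacuum_cost_delay = 0);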