Обсуждение: vacuum taking an unusually long time

Поиск
Список
Период
Сортировка

vacuum taking an unusually long time

От
"Mason Hale"
Дата:
Vacuum operations on several tables are taking much longer than they previously were.
We currently have 3 autovacuum processes that have been running more than 3 days each.

The tables are large (between 40 and 90GB each).

Postgresql version is 8.3.1
maintenance_work_mem is 512MB  (on a 32GB server).

Any ideas what would make vacuum take so long?
What can I do to speed things up?

thanks,
- Mason

Re: vacuum taking an unusually long time

От
"Scott Marlowe"
Дата:
On Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote:
> Vacuum operations on several tables are taking much longer than they
> previously were.
> We currently have 3 autovacuum processes that have been running more than 3
> days each.
> The tables are large (between 40 and 90GB each).
> Postgresql version is 8.3.1
> maintenance_work_mem is 512MB  (on a 32GB server).
> Any ideas what would make vacuum take so long?
> What can I do to speed things up?

Have you adjusted your vacuum / autovacuum cost parameters up?  that
will certainly slow down vacuums.

Re: vacuum taking an unusually long time

От
"Mason Hale"
Дата:
None of these values have changed recently.

The values are:

vacuum_cost_delay = 10ms
vacuum_cost_limit = 200

Are there any other values I should be looking at?

The longest running vacuum has been running more than 6 days at this point.

Thanks,
Mason

On Mon, Jul 14, 2008 at 4:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote:
> Vacuum operations on several tables are taking much longer than they
> previously were.
> We currently have 3 autovacuum processes that have been running more than 3
> days each.
> The tables are large (between 40 and 90GB each).
> Postgresql version is 8.3.1
> maintenance_work_mem is 512MB  (on a 32GB server).
> Any ideas what would make vacuum take so long?
> What can I do to speed things up?

Have you adjusted your vacuum / autovacuum cost parameters up?  that
will certainly slow down vacuums.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: vacuum taking an unusually long time

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> The longest running vacuum has been running more than 6 days at this point.

Is it actually *doing* anything, or is it just blocked waiting for
someone else?  strace or local equivalent would be the most definitive
way to check.

            regards, tom lane

Re: vacuum taking an unusually long time

От
"Mason Hale"
Дата:
Here's some of the strace output:

select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
semop(9895945, 0x7fff1321db70, 1)       = 0
read(72, "\233\7\0\0H\207f2\1\0\1\0`\0\0 \0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\370\277\371\247\1\0\1\0\214\0\330\23\0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0h]\333\217\1\0\1\0X\0\310\v\0 \4 \0\0\0\0\0\0\0\0X\235H\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0(\260\333\217\1\0\1\0\\\0\200\v\0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0p\305\333\217\1\0\1\0\\\0p\v\0 \4 \0\0\0\0\370\235\20\4\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\10\330\371\247\1\0\1\0P\0\240\10\0 \4 \0\0\0\0\0\0\0\0\360\234 \6"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\233\7\0\0\340\212f2\1\0\1\0t\0(\35\0 \4 \0\0\0\0\0\0\0\0(\235\250\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\24\10\0\0\230\333\372\207\1\0\1\0`\0\330\n\0 \4 \0\0\0\0X\220@\4\7\0\1\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
semop(9830407, 0x7fff1321db90, 1)       = 0
read(72, "\362\n\0\0H\316,r\1\0\1\0\204\0P\r\0 \4 _\17L\21x\235\20\5\5\0\1\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\177\5\0\0\300\317A\276\1\0\1\0p\0\210\1\0 \4 \0\0\0\0\26\0\1\0\30\235\310\5"..., 8192) = 8192

All those "select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)" commands sure look suspicious.

Any thoughts?

Mason

On Tue, Jul 15, 2008 at 10:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Mason Hale" <masonhale@gmail.com> writes:
> The longest running vacuum has been running more than 6 days at this point.

Is it actually *doing* anything, or is it just blocked waiting for
someone else?  strace or local equivalent would be the most definitive
way to check.

                       regards, tom lane

Re: vacuum taking an unusually long time

От
"Mason Hale"
Дата:


On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:
Here's some of the strace output:

select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)

If I read the 'select(2)' man page correctly, it appears this process is waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.

Should I kill this autovacuum process via pg_cancel_backend? 
Or is there a better way to "unstick" it?

Any ideas what may have caused it to get into this condition?

- Mason 

Re: vacuum taking an unusually long time

От
Steve Atkins
Дата:
On Jul 16, 2008, at 4:40 PM, Mason Hale wrote:

>
>
> On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com>
> wrote:
> Here's some of the strace output:
>
> select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
>
> If I read the 'select(2)' man page correctly, it appears this
> process is waiting indefinitely for a NULL file descriptor.
> That looks pretty stuck to me.

No, it's just sleeping. For 10 milliseconds or so.

>
> Should I kill this autovacuum process via pg_cancel_backend?
> Or is there a better way to "unstick" it?
>
> Any ideas what may have caused it to get into this condition?
>

It looks like "read a page, sleep for 80 milliseconds, repeat".

I'd look at your settings for autovacuum_vacuum_cost_limit /
autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

Cheers,
   Steve


Re: vacuum taking an unusually long time

От
"Mason Hale"
Дата:

On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:
Here's some of the strace output:

select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)

If I read the 'select(2)' man page correctly, it appears this process is waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.

No, it's just sleeping. For 10 milliseconds or so.


Ah, thanks.

 
Should I kill this autovacuum process via pg_cancel_backend?
Or is there a better way to "unstick" it?

Any ideas what may have caused it to get into this condition?


It looks like "read a page, sleep for 80 milliseconds, repeat".

I'd look at your settings for autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_cost_delay = 20ms

'select * from pg_autovacuum;' returns zero rows

I killed the one autovacuum process already that I thought was hung (via pg_cancel_backend).
I can see one other autovacuum process running. The other vacuum process was started manually.
Should I be concerned that pg_autovacuum is empty when I have an autovacuum process running?

Mason

 

Cheers,
 Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: vacuum taking an unusually long time

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
>> It looks like "read a page, sleep for 80 milliseconds, repeat".

That's what it looks like to me too.

>> I'd look at your settings for autovacuum_vacuum_cost_limit /
>> autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

> autovacuum_vacuum_cost_limit = -1
> autovacuum_vacuum_cost_delay = 20ms

That process is *clearly* not using those vacuum cost parameters ---
it's evidently using a delay of 80ms and some completely over-aggressive
cost settings that're making it sleep for each single page read.

So you need to find out where those whacked-out values are coming from.
pg_autovacuum might be a likely source.  Or maybe you just forgot a
SIGHUP after a recent change to postgresql.conf?

            regards, tom lane