Re: autovacuum on primary blocking queries on replica?

From: Laurenz Albe
Subject: Re: autovacuum on primary blocking queries on replica?
Date:
Msg-id: 0ffa96ab1ec3b22bb738b15a93612e69d1e5c11c.camel@cybertec.at
In reply to: autovacuum on primary blocking queries on replica?  (Don Seiler <don@seiler.us>)
List: pgsql-general
On Fri, 2022-05-27 at 14:00 -0500, Don Seiler wrote:
>  * PostgreSQL 12.9 - PGDG Ubuntu 18.04 image
>  * Streaming physical replication
>  * hot_standby_feedback = on
> We use a read replica to offload a lot of (what should be) quick queries. This morning we had an incident
> where these queries were all blocking on AccessShareLock waits, written to the log as:
> 
> 2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp LOG:  process 8185 still waiting for AccessShareLock on relation 16834 of database 16401 after 1000.228 ms at character 204
> 2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp DETAIL:  Process holding the lock: 10822. Wait queue: 32373, 8185, 13782, [...]
> This went on for 30 seconds (the value of max_standby_streaming_delay) until PG killed the blocking process:
> 2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp FATAL:  terminating connection due to conflict with recovery
> 2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp DETAIL:  User was holding a relation lock for too long.
> 
> I'm trying to find the root cause of why this started. We did see an UPDATE [...]

An UPDATE by itself cannot be the problem: it only takes a ROW EXCLUSIVE lock on the table, and only ACCESS EXCLUSIVE locks are replicated to the standby.

> 15 seconds later we then see an aggressive autovacuum on this table:
> 
> 2022-05-27 15:23:52.507 UTC [30513] LOG:  automatic aggressive vacuum of table "foo_all.public.industry": index scans: 1
>         pages: 252 removed, 323 remain, 0 skipped due to pins, 0 skipped frozen
>         tuples: 8252 removed, 8252 remain, 0 are dead but not yet removable, oldest xmin: 1670999292
>         buffer usage: 12219 hits, 137 misses, 54 dirtied
>         avg read rate: 2.372 MB/s, avg write rate: 0.935 MB/s
>         system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.45 s
> 
> and less than a second after that is when we see the first AccessShareLock message on the replica.
> 
> I've been reading tales of autovacuum taking an AccessExclusiveLock when truncating empty pages at the end of a table.
> I'm imagining that updating every row of a table and then rolling back would leave all of those new row versions dead
> at the end, qualifying the trailing pages for truncation and leading to the scenario I saw this morning.
> 
> I'm still not entirely satisfied, since the table in question was so small (only 8252 rows) that I wouldn't imagine it
> would hold things up as long as it did. Also puzzling: the blocking session on the replica was an application session,
> not any background/recovery process.

I think you are on the right trail.

VACUUM will truncate trailing empty pages if it can get a (very short) ACCESS EXCLUSIVE lock on the table.
Now, that lock and the truncation are replicated, and they can create a replication conflict just like
the one you describe.  Even if the lock is only held for a very short time on the primary, replaying it
will conflict with any query on that table on the standby.
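
If you want to confirm that this is what you hit, the standby keeps per-database counters of queries
canceled by recovery conflicts; a conflict caused by a replayed relation lock is counted in "confl_lock".
A minimal check, using the database name from your log:

   -- run on the standby: queries canceled by recovery conflicts, per database
   SELECT datname, confl_lock, confl_snapshot
   FROM pg_stat_database_conflicts
   WHERE datname = 'foo_all';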

You can disable autovacuum truncation on the table with

   ALTER TABLE public.industry SET (vacuum_truncate = off);

if you know that you can do without autovacuum truncation for that table.
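
As a related sketch (using the same table name as above), you can also skip the truncation step for a
single manual run, and check what is stored on the table afterwards:

   -- one-off vacuum without the truncation step (option available since v12)
   VACUUM (TRUNCATE false) public.industry;

   -- verify the per-table storage option
   SELECT relname, reloptions
   FROM pg_class
   WHERE relname = 'industry';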

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com


