Обсуждение: troubleshooting deadlocks

Поиск
Список
Период
Сортировка

troubleshooting deadlocks

От
"Ed L."
Дата:
I'm wrestling with tracking down a deadlock.  Here's a 7.4.6 deadlock
message:

ERROR:  deadlock detected
DETAIL:  Process 15655 waits for ShareLock on transaction 9381; blocked by
process 15600.
Process 15600 waits for ShareLock on transaction 9388; blocked by process
15655.

I know the original statement is printed right after this, but with complex
triggers doing lots of write queries, I'm finding it difficult to identify
which subsequent query in the trigger is really the one immediately
preceding the deadlock.  It would be helpful in debugging if the error
message included info on which tables are involved, maybe even the
deadlocking query itself, in the "DETAIL" output for future releases.
Maybe something like:

DETAIL:  Process 15655 waits on transaction 9381 for ShareLock on
public.this_table for statement:  UPDATE public.this_table SET foo = 1;
blocked by process 15600.
Process 15600 waits on transaction 9388 for ShareLock on public.that_table
for statement: UPDATE public.that_table SET bar = 1; blocked by process
15655.

Maybe there is a simpler way to troubleshoot this that I'm overlooking?

ps - Here's a query I use to show locks, maybe others might find it useful
(or find bugs in it):

SELECT dbu.usename as locker, l.mode as locktype,
pg_stat_get_backend_pid(S.backendid) as pid,
db.datname||'.'||n.nspname||'.'||r.relname as relation, l.mode,
substring(pg_stat_get_backend_activity(S.backendid), 0, 30) as query
FROM pg_user dbu,
     (SELECT pg_stat_get_backend_idset() AS backendid) AS S,
     pg_database db, pg_locks l, pg_class r, pg_namespace n
WHERE db.oid = pg_stat_get_backend_dbid(S.backendid)
  AND dbu.usesysid = pg_stat_get_backend_userid(S.backendid)
  AND l.pid = pg_stat_get_backend_pid(S.backendid)
  AND l.relation = r.oid
  AND l.database = db.oid
  AND r.relnamespace = n.oid
  AND l.granted
ORDER BY db.datname, n.nspname, r.relname, l.mode;


Re: troubleshooting deadlocks

От
Tom Lane
Дата:
"Ed L." <pgsql@bluepolka.net> writes:
> I know the original statement is printed right after this, but with complex
> triggers doing lots of write queries, I'm finding it difficult to identify
> which subsequent query in the trigger is really the one immediately
> preceding the deadlock.  It would be helpful in debugging if the error
> message included info on which tables are involved, maybe even the
> deadlocking query itself, in the "DETAIL" output for future releases.

I suppose the problem here has to do with conflicting SELECT FOR UPDATEs
from foreign-key references.  This does actually work nicely in CVS tip,
for non-deferred FKs:

regression=# select boo(2);
ERROR:  insert or update on table "zork" violates foreign key constraint "zork_f1_fkey"
DETAIL:  Key (f1)=(2) is not present in table "zork2".
CONTEXT:  SQL statement "insert into zork values( $1 )"
PL/pgSQL function "boo" line 2 at SQL statement
regression=#

For deferred FKs you're pretty much out of luck, because the original
statement is long gone by the time the FK trigger fires (and no, it's
not reasonable to keep a copy around).  The difficulty in making 7.4
do this is that in 7.4 AFTER triggers never fire within a user function,
so you're always dealing with the "deferred" situation.

            regards, tom lane

Re: troubleshooting deadlocks

От
"Ed L."
Дата:
On Tuesday November 9 2004 10:36, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > I know the original statement is printed right after this, but with
> > complex triggers doing lots of write queries, I'm finding it difficult
> > to identify which subsequent query in the trigger is really the one
> > immediately preceding the deadlock.  It would be helpful in debugging
> > if the error message included info on which tables are involved, maybe
> > even the deadlocking query itself, in the "DETAIL" output for future
> > releases.
>
> I suppose the problem here has to do with conflicting SELECT FOR UPDATEs
> from foreign-key references.

That appears to be the issue.  We upgraded to 7.4.6 (thanks to slony,
production downtime was minimal), then used 7.4.6 debug output to track
relations, then guessed on fkey references, dropped them, and viola, no
more deadlock.

Thanks,
Ed