Re: Puzzling full database lock

Поиск
Список
Период
Сортировка
От Christopher Opena
Тема Re: Puzzling full database lock
Дата
Msg-id CAFOrgqf8ipN-bRNURhOWSm33v12mTdzbR0w0z6g626nRjcPJcw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Puzzling full database lock  (Merlin Moncure <mmoncure@gmail.com>)
Ответы Re: Puzzling full database lock  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Puzzling full database lock  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-general
Merlin, thanks for the response.  My comments below, but firstly, does anyone know if autovacuum is affected by setting a statement_timeout?  There was a long thread here from 2007'ish:


But it's unclear to me which way that ended up going.  We're thinking of setting statement_timeout to something fairy high (that will catch these queries we've been seeing) so that we can further troubleshoot over time.  We're worried, however, that autovacuum might be affected.

Random thoughts/suggestions:
*) Look for some correlation between non-idle process count and
locking situation.  You are running a lot of processes and if I was in
your shoes I would be strongly looking at pgbouncer to handle
connection pooling.  You could be binding in the database or (worse)
the kernel

We're definitely looking at our options with pgbouncer right now; issue being that we'd have to have a bouncer per database, and our architecture right now calls for "many databases, many connections" so we're trying to limit that by having our application pooler limit the amount of active connections one can have to the application itself (and thereby to the database, by proxy).  This is still an option, however, so we're doing some research here.
 
*) Try logging checkpoints to see if there is any correlation with your locks.

We've been logging checkpoints for several days now with no hard correlation that we can find.  Thanks for the suggestion though!
 
*) An strace of both the 'high cpu' process and one of the blocked
process might give some clues -- in particular if you are being
blocked on a system call

We have yet to try this; definitely next in line.
 
*) Given enough time, do your high cpu queries ever complete? Are they
writing or reading?

The queries are reading in this case; we haven't allowed them to run their course because of the effect it has on our entire user base.  Right now we've patched our application to catch these cases and handle them outright by notifying the end user that there is a potentially damaging query that is being cancelled.  Short term solution, but for now it's something we need to do until we can replicate and solve the problem on a non-production system.
 
*) What happens to overall system load if you lower shared_buffers to, say, 4gb?

We're going to be trying this as well once we have an appropriate maintenance window.  It seems to be a general consensus that this is something we should at least try.

Thanks,
-Chris.

В списке pgsql-general по дате отправления:

Предыдущее
От: John R Pierce
Дата:
Сообщение: Re: restart server on Lion
Следующее
От: Gary Chambers
Дата:
Сообщение: Warning: you don't own a lock of type ExclusiveLock