Discussion: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`
From: brian@pukkasoft.com
Date:
The following bug has been logged on the website:

Bug reference:      13970
Logged by:          Brian Ghidinelli
Email address:      brian@pukkasoft.com
PostgreSQL version: 9.4.6
Operating system:   Linux (RHEL 5.11)
Description:

Hi Pg team - I've been running a 9.4.1 server for the last year+. In the past few months I've had a couple of instances of the server locking up. I've done more troubleshooting into this last event and uncovered what appears to be a bug. My situation is much like these:

http://comments.gmane.org/gmane.comp.db.postgresql.admin/40587
http://postgresql.nabble.com/VACUUM-hanging-on-PostgreSQL-8-3-1-for-larger-tables-td1898438.html

The former claims lightweight locks had a bug up through 9.4.5, but I'm running the latest 9.4.6 and am still experiencing this issue with one table. Here's the scenario:

* Either autovacuum OR a manual vacuum on a single table hangs.
* There is no CPU or I/O usage; top -p <pid> shows the vacuum process is sleeping.
* strace of the process ID shows rapidly scrolling screenfuls of `select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)`.
* Running a query against pg_locks shows the vacuum has been granted a lock, but it is not fast path.
* There are no other queries running. I can trigger this behavior right after a fresh reboot, with no other users connected, by issuing a simple vacuum.
* I have reindexed the table, and also dropped every index except the primary key in case an index was the problem; vacuum still hung when I tried it.
* Interestingly, when I check the last autovacuum/autoanalyze entries, they are all blank, even though I have autovacuum on.

It scares me a lot that pg_cancel_backend and pg_terminate_backend don't work. It takes a kill -QUIT to break out of this process. What can I investigate to help add more information?

Brian
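The strace line can be read mechanically: select() called with nfds = 0 and three NULL fd sets watches nothing, so it only returns when its struct timeval timeout expires. Here that is {0, 1000}, i.e. 0 s + 1000 µs = 1 ms, so screenfuls of these lines mean the process is sleeping 1 ms at a time in a retry loop (at most about a thousand wakeups per second), which fits the zero CPU and I/O seen in top. The small Python helper below is a hypothetical sketch, not part of any PostgreSQL or strace tooling; the names `sleep_usec` and `summarize` and the sample lines are made up for illustration, and it assumes strace prints the timeout either as `{0, 1000}` or as `{tv_sec=0, tv_usec=1000}`.

```python
import re

# A select() used purely as a timer: nfds == 0, all three fd sets NULL,
# and a struct timeval {tv_sec, tv_usec} timeout. Older strace prints
# "{0, 1000}", newer releases "{tv_sec=0, tv_usec=1000}"; both are accepted.
SLEEP_RE = re.compile(
    r"select\(0, NULL, NULL, NULL, "
    r"\{(?:tv_sec=)?(\d+), (?:tv_usec=)?(\d+)\}\) = 0 \(Timeout\)"
)


def sleep_usec(line):
    """Return the sleep length in microseconds, or None if not a pure select() sleep."""
    match = SLEEP_RE.search(line)
    if match is None:
        return None
    seconds, micros = int(match.group(1)), int(match.group(2))
    return seconds * 1_000_000 + micros


def summarize(lines):
    """Return (count of pure sleeps, upper bound on wakeups per second).

    The rate is None when there are no sleeps or they total zero time.
    """
    sleeps = [us for us in map(sleep_usec, lines) if us is not None]
    total_us = sum(sleeps)
    if total_us == 0:
        return len(sleeps), None
    return len(sleeps), len(sleeps) * 1_000_000 / total_us


if __name__ == "__main__":
    # Made-up sample lines: two pure 1 ms sleeps and one select() that
    # actually waits on file descriptor 3 (so it is not a pure sleep).
    trace = [
        "select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)",
        "select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)",
        "select(4, [3], NULL, NULL, {0, 1000}) = 1 (in [3], left {0, 500})",
    ]
    print(summarize(trace))
```

Run against those sample lines it reports two sleeps and a ceiling of 1000 wakeups per second; the real rate is a bit lower because each iteration also spends time outside select().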
brian@pukkasoft.com wrote:
> Hi Pg team - I've been running a 9.4.1 server for the last year+. In the
> past few months I've had a couple of instances of the server locking up.
> I've done more troubleshooting into this last event and uncovered what
> appears to be a bug. My situation is much like these:
>
> http://comments.gmane.org/gmane.comp.db.postgresql.admin/40587
> http://postgresql.nabble.com/VACUUM-hanging-on-PostgreSQL-8-3-1-for-larger-tables-td1898438.html
>
> The former claims lightweight locks had a bug up thru 9.4.5 but I'm running
> the latest 9.4.6 and still experiencing this issue with one table. Here's
> the scenario:
>
> * Either autovacuum OR manual vacuum on a single table hangs
> * There is no cpu or i/o usage; top -p <pid> shows the vacuum process is
> sleeping
> * strace of the process id shows rapidly scrolling screenfuls of `select(0,
> NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)`

This smells like it's looping, waiting for a multixact to be fully written out ... except that the uninterruptibility part of that was fixed in time for 9.4.0:

Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Branch: master Release: REL9_5_BR [51f9ea25d] 2014-11-14 15:14:01 -0300
Branch: REL9_4_STABLE Release: REL9_4_0 [137e4da6d] 2014-11-14 15:14:02 -0300
Branch: REL9_3_STABLE Release: REL9_3_6 [d45e8dc52] 2014-11-14 15:14:02 -0300

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=137e4da6d

Can you attach to the looping process with gdb when it's doing the select() dance, and obtain a backtrace? You need debug symbols installed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
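Why a sleeping retry loop can shrug off both pg_cancel_backend (SIGINT) and pg_terminate_backend (SIGTERM): in a PostgreSQL backend those signal handlers only record that an interrupt is pending, and the backend acts on it when the running code reaches a CHECK_FOR_INTERRUPTS() call. A loop that sleeps and retries without such a check never notices the request, while kill -QUIT does not wait for a check at all. The Python below is a toy model of that pattern under those assumptions, not PostgreSQL source; `poll_until_written`, its parameters, and the `max_sleeps` cutoff are hypothetical names added only so the demo terminates.

```python
import time


def poll_until_written(is_written, interrupt_pending, check_interrupts,
                       max_sleeps=1000):
    """Toy sleep-and-retry wait (hypothetical model, not PostgreSQL code).

    is_written():        the condition being waited for, e.g. "the multixact
                         has been fully written out".
    interrupt_pending(): whether a cancel/terminate signal has set the flag.
    check_interrupts:    whether the loop ever looks at that flag, as a
                         CHECK_FOR_INTERRUPTS() call would.
    max_sleeps:          demo-only cutoff; the loop being modelled has none.
    """
    sleeps = 0
    while not is_written():
        if check_interrupts and interrupt_pending():
            return "canceled", sleeps   # the pending request is honored
        if sleeps >= max_sleeps:
            return "gave_up", sleeps    # without the cutoff this spins forever
        time.sleep(0.001)               # 1 ms, like select(0, ..., {0, 1000})
        sleeps += 1
    return "written", sleeps


if __name__ == "__main__":
    # The data never appears and a cancel is already pending.
    # Without the check, the cancel is ignored on every iteration:
    print(poll_until_written(lambda: False, lambda: True,
                             check_interrupts=False, max_sleeps=50))
    # With the check, the same request ends the wait immediately:
    print(poll_until_written(lambda: False, lambda: True,
                             check_interrupts=True))
```

The first call only stops because of the artificial cutoff, which is the situation the report describes; the second returns as soon as it looks at the flag, which is the behavior the cited fix was meant to provide for the multixact wait.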