Discussion: BUG #3843: archiver process is restarted after the smart shutdown
The following bug has been logged online:

Bug reference:      3843
Logged by:
Email address:      fujii.masao@oss.ntt.co.jp
PostgreSQL version: 8.3beta4
Operating system:   RHEL5
Description:        archiver process is restarted after the smart shutdown
Details:

Is it a bug that the archiver process is restarted after a smart shutdown?
BTW, the archiver process ends after a few minutes.

[postgresql.conf]
archive_mode = on
archive_command = 'cp %p ../arch/%f'

$ pg_ctl start
...
$ pgrep -fl postgres
22781 /home/postgres/bin/postgres
22783 postgres: writer process
22784 postgres: wal writer process
22785 postgres: autovacuum launcher process
22786 postgres: archiver process
22787 postgres: stats collector process
$ pg_ctl stop (*1)
...
$ pgrep -fl postgres
23579 postgres: archiver process

(*1)
The problem is easy to reproduce by waiting only a few seconds
between pg_ctl start and pg_ctl stop.

On Thu, 2007-12-27 at 09:41 +0000, fujii.masao@oss.ntt.co.jp wrote:
> Bug reference: 3843
> PostgreSQL version: 8.3beta4
> Description: archiver process is restarted after the smart shutdown
> [...]

The code says:

    /*
     * If we have lost the archiver, try to start a new one. We do this
     * even if we are shutting down, to allow archiver to take care of any
     * remaining WAL files.
     */

The previous behaviour was to shut down even when there were WAL files
still needing to be archived, which many considered a problem.

I notice that when we re-enter the archiver in this way, we may end up
waiting a full minute before we eventually shut down, because of the
normal wait in the archive loop.

If there are no objections, I will add an extra condition to the wait,
so that we loop on

    while (!(wakened || got_SIGHUP) && PostmasterIsAlive(true))

rather than just

    while (!(wakened || got_SIGHUP))

This runs getppid() once per second, so it shouldn't add noticeable
overhead.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
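
[Editor's note] The proposed wait condition can be sketched outside PostgreSQL as follows. This is only an illustration under stated assumptions, not the pgarch.c code: parent_is_alive() and wait_for_work() are hypothetical names, the flags are plain variables here rather than being set by signal handlers, and the max_secs cutoff stands in for whatever timeout the real loop applies. The point it shows is that one getppid() call per one-second sleep lets the loop exit promptly once the parent is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-ins for the archiver's flags; names follow the thread. */
static volatile sig_atomic_t wakened = 0;
static volatile sig_atomic_t got_SIGHUP = 0;

/*
 * Hypothetical helper: treat the parent as alive while getppid() still
 * returns the pid that was recorded at startup.
 */
static bool
parent_is_alive(pid_t parent_at_startup)
{
    return getppid() == parent_at_startup;
}

/*
 * Sleep in one-second slices until woken, told to reload, the parent
 * disappears, or max_secs have elapsed. Returns the number of slices slept.
 */
static int
wait_for_work(pid_t parent_at_startup, int max_secs)
{
    int slept = 0;

    assert(max_secs >= 0);
    while (!(wakened || got_SIGHUP) &&
           parent_is_alive(parent_at_startup) &&
           slept < max_secs)
    {
        sleep(1);       /* one getppid() check per second of waiting */
        slept++;
    }
    return slept;
}
```

A caller would record getppid() once at startup and pass it in on every wait. Without the parent check, the loop would keep sleeping until the timeout even though nobody is left to wake it.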

Simon Riggs <simon@2ndquadrant.com> writes:
> If there are no objections, I will add an extra condition to the wait,

That's not the correct fix. The problem is that the archiver is getting
killed in the first place. What the postmaster should do instead is
waken it after the shutdown process terminates.

			regards, tom lane
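
[Editor's note] The alternative of leaving the archiver running and waking it explicitly can be sketched as below. This is a generic POSIX illustration, not the postmaster's code: the function names, the wake_requested flag, and the choice of SIGUSR1 as the wake-up signal are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; a wait loop in the child would poll this flag. */
static volatile sig_atomic_t wake_requested = 0;

/* Signal handler: only sets a flag, which is async-signal-safe. */
static void
on_wake_signal(int signo)
{
    (void) signo;
    wake_requested = 1;
}

/* Child side: install the wake-up handler. Returns 0 on success. */
static int
install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wake_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Supervisor side: once the shutdown step has finished, signal the
 * still-running child so it can deal with any remaining work.
 */
static int
wake_child(pid_t child)
{
    return kill(child, SIGUSR1);
}
```

With this arrangement the child is never killed and restarted during shutdown; it is simply told that there may be new work to pick up.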

On Thu, 2007-12-27 at 10:10 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > If there are no objections, I will add an extra condition to the wait,
>
> That's not the correct fix. The problem is that the archiver is getting
> killed in the first place. What the postmaster should do instead is
> waken it after the shutdown process terminates.

Waking up the archiver in a different place is possible, and I'm happy
to do it, but my suggested change was addressing a different issue.

In 8.3 we added the code to wake up the archiver again even during
shutdown. So the archiver wakes back up during a shutdown and does its
thing. The bug report is therefore not a bug; it's just observing that
the behaviour has changed in 8.3.

The problem I perceived was that the archiver was hanging around too
long after shutdown, which I noticed was related to the way the sleep
occurs in the archiver's wait loop. So wherever we wake up the archiver,
it will still hang around longer than might be convenient in some
circumstances. If we reduce that wait, the OP might not even have
observed that the archiver had restarted at all.

So I'd like to change the wait loop as described, though I am happy to
consider other changes as well if that's your wish. Yes?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com