Обсуждение: pgsql: Get rid of the dedicated latch for signaling the startup process
Get rid of the dedicated latch for signaling the startup process. This commit gets rid of the dedicated latch for signaling the startup process in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Commit 1e53fe0e70 changed background processes so that they use standard SIGHUP handler. Like that, this commit also makes the startup process use standard SIGHUP handler to simplify the code. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/ac22929a2613e122708bd0172508ac863c51c1cc Modified Files -------------- src/backend/access/transam/xlog.c | 29 ++++++++++++++--------------- src/backend/postmaster/startup.c | 24 +++++------------------- 2 files changed, 19 insertions(+), 34 deletions(-)
Re: pgsql: Get rid of the dedicated latch for signaling the startup process
От
Heikki Linnakangas
Дата:
On 04/11/2020 09:44, Fujii Masao wrote: > Get rid of the dedicated latch for signaling the startup process. > > This commit gets rid of the dedicated latch for signaling the startup > process in favor of using its procLatch, since that comports better > with possible generic signal handlers using that latch. > > Commit 1e53fe0e70 changed background processes so that they use standard > SIGHUP handler. Like that, this commit also makes the startup process use > standard SIGHUP handler to simplify the code. This seems to have made buildfarm member 'elver' to segfault. I've got a hunch that setting recoveryWakeupLatch to NULL, when WakeupRecovery() doesn't check for NULL, is not OK. It's surprising that we're only seeing this on 'elver', though. - Heikki
On 2020/11/04 19:27, Heikki Linnakangas wrote: > On 04/11/2020 09:44, Fujii Masao wrote: >> Get rid of the dedicated latch for signaling the startup process. >> >> This commit gets rid of the dedicated latch for signaling the startup >> process in favor of using its procLatch, since that comports better >> with possible generic signal handlers using that latch. >> >> Commit 1e53fe0e70 changed background processes so that they use standard >> SIGHUP handler. Like that, this commit also makes the startup process use >> standard SIGHUP handler to simplify the code. > > This seems to have made buildfarm member 'elver' to segfault. I've got a hunch that setting recoveryWakeupLatch to NULL,when WakeupRecovery() doesn't check for NULL, is not OK. It's surprising that we're only seeing this on 'elver', though. Thanks for the report! The latch is reset to NULL after ShutdownWalRcv(). So I thought that there are no processes settingthe latch (i.e., walreceiver has already exited) after it's reset to NULL. But this is not true. There seem a windowbetween ShutdownWalRcv() and the actual exit of walreceiver. If a signal is sent during that window, the segmentationfault would happen. This would be the reason why segv happened in some platforms, but not in others. I'm thinking to remove the following code to fix this issue. Thought? /* * We don't need the latch anymore. It's not strictly necessary to reset * it to NULL, but let's do it for the sake of tidiness. */ if (ArchiveRecoveryRequested) XLogCtl->recoveryWakeupLatch = NULL; Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/11/04 20:20, Fujii Masao wrote: > > > On 2020/11/04 19:27, Heikki Linnakangas wrote: >> On 04/11/2020 09:44, Fujii Masao wrote: >>> Get rid of the dedicated latch for signaling the startup process. >>> >>> This commit gets rid of the dedicated latch for signaling the startup >>> process in favor of using its procLatch, since that comports better >>> with possible generic signal handlers using that latch. >>> >>> Commit 1e53fe0e70 changed background processes so that they use standard >>> SIGHUP handler. Like that, this commit also makes the startup process use >>> standard SIGHUP handler to simplify the code. >> >> This seems to have made buildfarm member 'elver' to segfault. I've got a hunch that setting recoveryWakeupLatch to NULL,when WakeupRecovery() doesn't check for NULL, is not OK. It's surprising that we're only seeing this on 'elver', though. > > Thanks for the report! The latch is reset to NULL after ShutdownWalRcv(). So I thought that there are no processes settingthe latch (i.e., walreceiver has already exited) after it's reset to NULL. But this is not true. There seem a windowbetween ShutdownWalRcv() and the actual exit of walreceiver. If a signal is sent during that window, the segmentationfault would happen. This would be the reason why segv happened in some platforms, but not in others. > > I'm thinking to remove the following code to fix this issue. Thought? > > /* > * We don't need the latch anymore. It's not strictly necessary to reset > * it to NULL, but let's do it for the sake of tidiness. > */ > if (ArchiveRecoveryRequested) > XLogCtl->recoveryWakeupLatch = NULL; Or ISTM that WakeupRecovery() should set the latch only when the latch has not been reset to NULL yet. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: pgsql: Get rid of the dedicated latch for signaling the startup process
От
Heikki Linnakangas
Дата:
On 04/11/2020 14:03, Fujii Masao wrote: >> I'm thinking to remove the following code to fix this issue. Thought? >> >> /* >> * We don't need the latch anymore. It's not strictly necessary to reset >> * it to NULL, but let's do it for the sake of tidiness. >> */ >> if (ArchiveRecoveryRequested) >> XLogCtl->recoveryWakeupLatch = NULL; That should work, but it seems a bit sloppy to leave it pointing to a latch that belongs to a random process though. > Or ISTM that WakeupRecovery() should set the latch only when the latch > has not been reset to NULL yet. Got to be careful with race conditions, if the latch is set to NULL at the same time that WakeupRecovery() is called. - Heikki
Re: pgsql: Get rid of the dedicated latch for signaling the startup process
От
Heikki Linnakangas
Дата:
On 04/11/2020 15:17, Heikki Linnakangas wrote: > On 04/11/2020 14:03, Fujii Masao wrote: >> Or ISTM that WakeupRecovery() should set the latch only when the latch >> has not been reset to NULL yet. > > Got to be careful with race conditions, if the latch is set to NULL at > the same time that WakeupRecovery() is called. I don't think commit 113d3591b8 got this quite right: > void > WakeupRecovery(void) > { > if (XLogCtl->recoveryWakeupLatch) > SetLatch(XLogCtl->recoveryWakeupLatch); > } If XLogCtl->recoveryWakeupLatch is set to NULL between the if and the SetLatch, you'll still get a segfault. That's highly unlikely to happen in practice because the compiler will optimize that into a single load instruction, but could happen with -O0. I think you'd need to do the access only once, using a volatile pointer, to make that safe. Maybe it's simpler to just not reset it to NULL, after all. - Heikki
On 2020/11/05 5:36, Heikki Linnakangas wrote: > On 04/11/2020 15:17, Heikki Linnakangas wrote: >> On 04/11/2020 14:03, Fujii Masao wrote: >>> Or ISTM that WakeupRecovery() should set the latch only when the latch >>> has not been reset to NULL yet. >> >> Got to be careful with race conditions, if the latch is set to NULL at >> the same time that WakeupRecovery() is called. > > I don't think commit 113d3591b8 got this quite right: > >> void >> WakeupRecovery(void) >> { >> if (XLogCtl->recoveryWakeupLatch) >> SetLatch(XLogCtl->recoveryWakeupLatch); >> } > > If XLogCtl->recoveryWakeupLatch is set to NULL between the if and the SetLatch, you'll still get a segfault. That's highlyunlikely to happen in practice because the compiler will optimize that into a single load instruction, but could happenwith -O0. I think you'd need to do the access only once, using a volatile pointer, to make that safe. Maybe it's simplerto just not reset it to NULL, after all. Yes, you're right. I agree it's simpler to remove the code resetting the latch to NULL. Also as the comment for that code explains, basically it's not necessary to reset it to NULL. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/11/05 6:02, Fujii Masao wrote: > > > On 2020/11/05 5:36, Heikki Linnakangas wrote: >> On 04/11/2020 15:17, Heikki Linnakangas wrote: >>> On 04/11/2020 14:03, Fujii Masao wrote: >>>> Or ISTM that WakeupRecovery() should set the latch only when the latch >>>> has not been reset to NULL yet. >>> >>> Got to be careful with race conditions, if the latch is set to NULL at >>> the same time that WakeupRecovery() is called. >> >> I don't think commit 113d3591b8 got this quite right: >> >>> void >>> WakeupRecovery(void) >>> { >>> if (XLogCtl->recoveryWakeupLatch) >>> SetLatch(XLogCtl->recoveryWakeupLatch); >>> } >> >> If XLogCtl->recoveryWakeupLatch is set to NULL between the if and the SetLatch, you'll still get a segfault. That's highlyunlikely to happen in practice because the compiler will optimize that into a single load instruction, but could happenwith -O0. I think you'd need to do the access only once, using a volatile pointer, to make that safe. On second thought, since fetching the latch pointer might not be atomic, maybe we need to use spinlock like WalRcvForceReply() does. But using spinlock in every calls of WakeupRecovery() might cause performance overhead. So I'm thinking to use spinlock only when it's necessary, i.e., when the latch may be reset to NULL concurrently. Attached is the POC patch implementing that. Thought? > Maybe it's simpler to just not reset it to NULL, after all. > > Yes, you're right. I agree it's simpler to remove the code resetting > the latch to NULL. Also as the comment for that code explains, > basically it's not necessary to reset it to NULL. On second thought, your comment in upthread "it seems a bit sloppy to leave it pointing to a latch that belongs to a random process though." seems right. So I'm inclined to avoid this approach, for now. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
On 2020/11/05 23:32, Fujii Masao wrote: > > > On 2020/11/05 6:02, Fujii Masao wrote: >> >> >> On 2020/11/05 5:36, Heikki Linnakangas wrote: >>> On 04/11/2020 15:17, Heikki Linnakangas wrote: >>>> On 04/11/2020 14:03, Fujii Masao wrote: >>>>> Or ISTM that WakeupRecovery() should set the latch only when the latch >>>>> has not been reset to NULL yet. >>>> >>>> Got to be careful with race conditions, if the latch is set to NULL at >>>> the same time that WakeupRecovery() is called. >>> >>> I don't think commit 113d3591b8 got this quite right: >>> >>>> void >>>> WakeupRecovery(void) >>>> { >>>> if (XLogCtl->recoveryWakeupLatch) >>>> SetLatch(XLogCtl->recoveryWakeupLatch); >>>> } >>> >>> If XLogCtl->recoveryWakeupLatch is set to NULL between the if and the SetLatch, you'll still get a segfault. That's highlyunlikely to happen in practice because the compiler will optimize that into a single load instruction, but could happenwith -O0. I think you'd need to do the access only once, using a volatile pointer, to make that safe. > > On second thought, since fetching the latch pointer might not be atomic, > maybe we need to use spinlock like WalRcvForceReply() does. But using > spinlock in every calls of WakeupRecovery() might cause performance > overhead. So I'm thinking to use spinlock only when it's necessary, i.e., > when the latch may be reset to NULL concurrently. Attached is the POC > patch implementing that. Thought? Previously I added this patch to next CommitFest. But I reverted the commit ac22929a26 and 113d3591b8 because those changes have other issue. So this patch is no longer necessary, and I dropped it from next CommitFest. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION