Обсуждение: Review: support for multiplexing SIGUSR1
Hi, I'm reviewing this patch: http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906b5b@mail.gmail.com This one applies almost cleanly, except for a minor hunk in elog.c and postinit.c Compiles and pass regression tests (i tried both steps in a debian lenny amd turion x2 64bits and in a windows xp sp2) about the patch itself: Tom objects to a previous patch for this here: http://archives.postgresql.org/message-id/14969.1228835521@sss.pgh.pa.us This new patch doesn't use PGPROC struct anymore, instead it uses a ProcSignalSlot struct defined as: typedef struct { pid_t pss_pid; sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS]; } ProcSignalSlot; which, AFAIU, seems to be in sync with Tom's advice here: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00556.php something that make me nervous is this: /* * Note: Since there's no locking, it's possible that the target * process detaches from shared memory and exits right after this * test, before we set the flag and send signal.And the signal slot * might even be recycled by a new process, so it's remotely possible * that we seta flag for a wrong process. That's OK, all the signals * are such that no harm is done if they're mistakenly fired. */ can we signal a wrong process and still "be fine"? besides, seems like SendProcSignal is still attached to SIGUSR1 only, is this fine? the rest of the patch (or better ways of testing it) is beyond my knowledge... i think a reviewer should take a look on it, specially Tom because he rejected the other one... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Thu, Jul 16, 2009 at 2:57 AM, Jaime Casanova<jcasanov@systemguards.com.ec> wrote: > Hi, > > I'm reviewing this patch: > http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906b5b@mail.gmail.com Another thing that took my attention, i don't think this is safe (it assumes only one auxiliary process of any type, don't know if we have various of the same kind but...): + /* + * Assign backend ID to auxiliary processes like backends, in order to + * allow multiplexing signal to auxiliary processes. Since backends use + * ID in the range from 1 to MaxBackends (inclusive), we assign + * auxiliary processes with MaxBackends + AuxProcType + 1 as an unique ID. + */ + MyBackendId = MaxBackends + auxType + 1; + MyProc->backendId = MyBackendId; -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Hi Jaime, On Fri, Jul 17, 2009 at 6:31 AM, Jaime Casanova<jcasanov@systemguards.com.ec> wrote: > I'm reviewing this patch: > http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906b5b@mail.gmail.com Thanks for reviewing the patch! > something that make me nervous is this: > /* > * Note: Since there's no locking, it's possible that the target > * process detaches from shared memory and exits right after this > * test, before we set the flag and send signal. And the signal slot > * might even be recycled by a new process, so it's remotely possible > * that we set a flag for a wrong process. That's OK, all the signals > * are such that no harm is done if they're mistakenly fired. > */ > > can we signal a wrong process and still "be fine"? Umm... the old flag might be set to a new process wongly as follows. 1. The target process exits right after SendProcSignal passed that test. 2. A new process is assigned to the same ProcSignalSlot, and reset it. 3. SendProcSignal sets the old flag to the slot. I think that locking is required here to get around this problem. How about introducing a new slock variable "slock_t pss_lck" into ProcSignalSlot strust, which protects pss_pid and pss_signalFlags? SendProcSignal and ProcSignalInit should use the slock. > besides, seems like SendProcSignal is still attached to SIGUSR1 only, > is this fine? Yes, I think that multiplexing of one signal is enough. Why do you think that SendProcSignal should be attached to also the other signals? > Another thing that took my attention, i don't think this is safe (it > assumes only one auxiliary process of any type, don't know if we have > various of the same kind but...): > > + /* > + * Assign backend ID to auxiliary processes like backends, in order to > + * allow multiplexing signal to auxiliary processes. Since backends use > + * ID in the range from 1 to MaxBackends (inclusive), we assign > + * auxiliary processes with MaxBackends + AuxProcType + 1 as > an unique ID. > + */ > + MyBackendId = MaxBackends + auxType + 1; > + MyProc->backendId = MyBackendId; That assumption is OK for now, but might be unacceptable in the near future. So, I'll use the index of AuxiliaryProcs instead of auxType, which is assigned in InitAuxiliaryProcess. Is this OK? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Fri, Jul 17, 2009 at 5:41 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >> I'm reviewing this patch: >> http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906b5b@mail.gmail.com I updated the patch to solve two problems which you pointed. Here is the changes: * Prevented the obsolete flag to being set to a new process, by using newly-introduced spinlock. * Used the index of AuxiliaryProcs instead of auxType, to assign backend ID to an auxiliary process. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Вложения
On Fri, Jul 17, 2009 at 8:58 AM, Fujii Masao<masao.fujii@gmail.com> wrote: > Hi, > > On Fri, Jul 17, 2009 at 5:41 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >>> I'm reviewing this patch: >>> http://archives.postgresql.org/message-id/3f0b79eb0907022341m1d36a841x19c3e2a5a6906b5b@mail.gmail.com > > I updated the patch to solve two problems which you pointed. > > Here is the changes: > > * Prevented the obsolete flag to being set to a new process, by using > newly-introduced spinlock. > thinking in ways to test the patch i tried this, the test at least try to see if signals are managed correctly: - patch, compile, install, initdb and start the service - open five terminals: on the first: make installcheck on the second: pg_dumpall -p port_to_an_existing_med_size_test_installation | psql on third: psql -f /home/postgres/a_something_small_database.sqlon fourth: explain analyze with q as (select * from generate_series(1, 1000000) select * from q a, q b, q c, q d, q e, q f; on fifth: select procpid from pg_start_activity; and pg_cancel_backend(randomly_choosen_pid); when cancelling backends i got in a situation where i kill the explain analyze in fourth session, execute again the pg_cancel_backend for the same session and if i try to re-execute the same explain analyze it got cancelled immediatly (seems like something don't get cleaned appropiately). once you get in this situation you can repeat that everytime you want; bad enough, i wasn't able to repeat this on a new instalation and of course i can't swear this is your patch fault... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Fri, Jul 17, 2009 at 1:44 PM, Jaime Casanova<jcasanov@systemguards.com.ec> wrote: > > i wasn't able to repeat this on a new instalation and of > course i can't swear this is your patch fault... > this is not your patch fault but an existing bug, i repeat that behaviour in an unpatched source tree... with the steps in the previous mail i canceled 2 or 3 backend before cancell the one with the explain analyze, i execute a second time that pg_cancel_backend and if i try to re-run the query it gets cancelled immediately, next time it runs normally... pg_stat_activity reports that query as running no mather what... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Fujii Masao <masao.fujii@gmail.com> writes: > I updated the patch to solve two problems which you pointed. > Here is the changes: > * Prevented the obsolete flag to being set to a new process, by using > newly-introduced spinlock. > * Used the index of AuxiliaryProcs instead of auxType, to assign > backend ID to an auxiliary process. Neither of these changes seem like a good idea to me. The use of a spinlock renders the mechanism unsafe for use from the postmaster --- we do not wish the postmaster to risk getting stuck if the contents of shared memory have become corrupted, eg, there is a spinlock that looks taken. And you've completely mangled the concept of BackendId. MyBackendId is by definition the same as the index of the process's entry in the sinval ProcState array. This means that (1) storing it in the ProcState entry is silly, and (2) assigning values that don't correspond to an actual ProcState entry is dangerous. If we want to be able to signal processes that don't have a ProcState entry, it would be better to assign an independent index instead of overloading BackendId like this. I'm not sure though whether we care about being able to signal such processes. It's certainly not necessary for catchup interrupts, but it might be for future applications of the mechanism. Do we have a clear idea of what the future applications are? As for the locking issue, I'm inclined to just specify that uses of the mechanism must be such that receiving a signal that wasn't meant for you isn't fatal. In the case of catchup interrupts the worst that can happen is you do a little bit of useless work. Are there any intended uses where it would be seriously bad to get a misdirected signal? I agree with Jaime that the patch would be more credible if it covered more than one signal usage at the start --- these questions make it clear that the design can't happen in a vacuum without intended usages in mind. regards, tom lane
Hi, Thanks for reviewing the patch! On Mon, Jul 27, 2009 at 2:43 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Neither of these changes seem like a good idea to me. The use of a > spinlock renders the mechanism unsafe for use from the postmaster --- > we do not wish the postmaster to risk getting stuck if the contents of > shared memory have become corrupted, eg, there is a spinlock that looks > taken. And you've completely mangled the concept of BackendId. > MyBackendId is by definition the same as the index of the process's > entry in the sinval ProcState array. This means that (1) storing it in > the ProcState entry is silly, and (2) assigning values that don't > correspond to an actual ProcState entry is dangerous. > > If we want to be able to signal processes that don't have a ProcState > entry, it would be better to assign an independent index instead of > overloading BackendId like this. OK, I'll change the mechanism to assign a ProcSignalSlot to a process, independently from ProcState, but similarly to ProcState: a process gets a ProcSignalSlot from newly-introduced freelist at startup, and returns it to freelist at exit. Since this assignment and return require the lock, I'll introduce a new LWLock for them. > I'm not sure though whether we care > about being able to signal such processes. It's certainly not necessary > for catchup interrupts, but it might be for future applications of the > mechanism. Do we have a clear idea of what the future applications are? Yes, I'm planning to use this mechanism for communication between a process which can generate the WAL records and a WAL sender process, in Synch Rep. Since not only a backend but also some auxiliary processes (e.g., bgwriter) can generate the WAL records, processes which don't have a ProcState need to receive the multiplexed signal, too. > As for the locking issue, I'm inclined to just specify that uses of the > mechanism must be such that receiving a signal that wasn't meant for you > isn't fatal. This makes sense. I'll get rid of a spinlock from the patch and add the usage note as above. > In the case of catchup interrupts the worst that can > happen is you do a little bit of useless work. Are there any intended > uses where it would be seriously bad to get a misdirected signal? In, at least, currently-intended use case (i.e., Synch Rep), such wrong signal is not fatal because it's only used as a hint. > I agree with Jaime that the patch would be more credible if it covered > more than one signal usage at the start --- these questions make it > clear that the design can't happen in a vacuum without intended usages > in mind. Umm... the patch should cover a notify interrupt which currently uses SIGUSR2? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes: > On Mon, Jul 27, 2009 at 2:43 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> If we want to be able to signal processes that don't have a ProcState >> entry, it would be better to assign an independent index instead of >> overloading BackendId like this. > OK, I'll change the mechanism to assign a ProcSignalSlot to a process, > independently from ProcState, but similarly to ProcState: a process gets > a ProcSignalSlot from newly-introduced freelist at startup, and returns it > to freelist at exit. Since this assignment and return require the lock, I'll > introduce a new LWLock for them. I think you're making things more complicated when they should be getting simpler. It strikes me that the current API of "pass the BackendId if known or InvalidBackendId if not" still works for processes without a BackendId, as long as you can tolerate a bit of extra search overhead for them. (You could reduce the search overhead by searching the array back to front.) So a new process index may be overkill. >> Do we have a clear idea of what the future applications are? > Yes, I'm planning to use this mechanism for communication between > a process which can generate the WAL records and a WAL sender process, > in Synch Rep. Since not only a backend but also some auxiliary processes > (e.g., bgwriter) can generate the WAL records, processes which don't have > a ProcState need to receive the multiplexed signal, too. Huh? Wouldn't the bgwriter be *sending* the signal not receiving it? > Umm... the patch should cover a notify interrupt which currently uses > SIGUSR2? Getting rid of the separate SIGUSR2 handler would definitely be a good proof of concept that the mechanism works for more than one use. regards, tom lane
Hi, On Tue, Jul 28, 2009 at 12:09 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > I think you're making things more complicated when they should be > getting simpler. > > It strikes me that the current API of "pass the BackendId if known or > InvalidBackendId if not" still works for processes without a BackendId, > as long as you can tolerate a bit of extra search overhead for them. > (You could reduce the search overhead by searching the array back to > front.) So a new process index may be overkill. Yeah, this is very simple. I'll change the patch according to your suggestion. Such extra search wouldn't be problem. Because an auxiliary process doesn't need a catchup and notify interrupt which are intended uses. Also, because there are few opportunities to send a multiplexed signal to an auxiliary process in Synch Rep. >>> Do we have a clear idea of what the future applications are? > >> Yes, I'm planning to use this mechanism for communication between >> a process which can generate the WAL records and a WAL sender process, >> in Synch Rep. Since not only a backend but also some auxiliary processes >> (e.g., bgwriter) can generate the WAL records, processes which don't have >> a ProcState need to receive the multiplexed signal, too. > > Huh? Wouldn't the bgwriter be *sending* the signal not receiving it? The bgwriter sends and receives a signal to/from the walsender process. The walsender is new introduced process in Synch Rep, and there are some free signals which aren't assigned yet. So, the signal sent from a backend or an auxiliary process to the walsender doesn't need to be multiplexed. On the other hand, the signal sent to a backend which has no free signal must be multiplexed. Since I'd like to treat a backend and an auxiliary process as unified destination of a signal, I'd like to multiplex also a signal sent to an auxiliary process. Of course, though we can change the method of signaling according to the destination (e.g., call a native kill instead of newly-introduced SendProcSignal when the destination is the bgwriter), it seems to be awkward. >> Umm... the patch should cover a notify interrupt which currently uses >> SIGUSR2? > > Getting rid of the separate SIGUSR2 handler would definitely be a good > proof of concept that the mechanism works for more than one use. OK. I'll change the patch as above. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center