Discussion: Re: [HACKERS] Function to kill backend
-----Original Message-----
From: pgsql-patches-owner@postgresql.org on behalf of Magnus Hagander
Sent: Sun 7/25/2004 12:07 PM
To: Tom Lane; Bruce Momjian
Cc: Josh Berkus; PostgreSQL-patches
Subject: Re: [PATCHES] [HACKERS] Function to kill backend

> >much further. I recall being voted down though ...
>
> That's not quite the argument I think I had :-) But without being able
> to kill the backends, there's just no way for me to handle the situation
> when I have a hundred clients eating up all available connections and/or
> memory, just sitting idle, because of some freak bug in a client.

The first time I used it was for precisely this reason - some buggy PHP
code opened hundreds of connections to a dev server which then remained
open doing nothing except wasting resources. It was particularly useful
in that case as I didn't have access to the web server at the time.

Shortly afterwards I added support to pgAdmin's server status tool which
has proven quite handy (although I will admit, mainly for canceling
rather than terminating).

I don't know the details of how it works, but is it any worse/better
than 'kill -9' (which iirc is no longer considered an absolute no-no)?

Regards, Dave
> The first time I used it was for precisely this reason - some buggy PHP
> code opened hundreds of connections to a dev server which then remained
> open doing nothing except wasting resources. It was particularly useful
> in that case as I didn't have access to the web server at the time.
>
> Shortly afterwards I added support to pgAdmin's server status tool which
> has proven quite handy (although I will admit, mainly for canceling
> rather than terminating).

Yeah, I've added the kill and cancel commands to phppgadmin. I'm happy
if kill is removed though; I don't want my newbie users panicking their
machines.

phpmyadmin has both kill and cancel since they're SQL commands in MySQL.

Chris
"Dave Page" <dpage@vale-housing.co.uk> writes: > I don't know the details of how it works, but is it any worse/better > than 'kill -9' (which iirc is no longer considered an absolute no-no)? What I've been trying to remind people of is that killing just a single backend with SIGTERM is not the normal code path and can't be considered well-tested. We know it works to shut down an entire cluster with simultaneous SIGTERMs. However, in that situation the only correctness requirement is that the final database state on disk be consistent. We don't really *know* what state is being left behind in the shared memory segment, because shmem just gets thrown away. It could be that sometimes some locks don't get released, or in other ways a SIGTERM'd backend fails to clean up after itself fully. In comparison, the query-cancel code path is nearly indistinguishable from any ordinary elog(ERROR). We can also have confidence that kill -9 on an individual backend is not going to screw things terribly, because that simulates a backend crash, and the recovery path for that has been (ahem) tested pretty frequently over the years. Note also that in the kill -9 case, again only the final database state on disk matters, not the condition of shared memory. Another way to look at this is that elog(FATAL) in general is not a well tested code path, because it just hardly ever happens in the field. The only elog(FATAL)s that get exercised with any regularity are the ones that reject a connection request during authentication, and those all occur *before* the backend has become a full-fledged backend and acquired any resources it might need to release. The only elog(FATAL) calls in an up-and-running backend are for "can't happen" conditions, and by and large indeed those don't happen. So what it comes down to is that we can put this feature out there if we choose, but we'd be fooling ourselves to think we can consider it reliable. 
Moreover, since the kinds of cases where you'd use a session kill don't arise every day, I don't think we could say we'd acquire any confidence in it over time either. It'd always remain a little-used corner of the code, and little-used corners tend to gather bit rot. If you don't mind plastering a "use at your own risk" sign on it, then go for it. regards, tom lane
Tom Lane wrote:
>
> If you don't mind plastering a "use at your own risk" sign on it, then
> go for it.

Killing a backend is obviously much more "at your own risk" than a
decent function.

Taken from your mail, I understand that a killed backend might leave
some loose ends, e.g. open locks, which would degrade the cluster's
performance. Still, it should not corrupt the shared mem, just leave it
as if the backend's still alive and sleeping, right?

You'd kill a backend only if your complete cluster is suffering from it,
and you hope to keep it running by just shooting that process. If the
cluster still has those uncleaned locks or so, you're unlucky and need
to shut down the cluster.

Maybe we should supply a restricted version of pg_terminate_backend
that's callable from admin interfaces only, so we can make sure that the
user was warned what he's doing before the termination is executed,
something like this:

    ticket := select pg_admin_ticket();
    /* calculate well-known stuff on ticket and issue before it times out */
    select pg_terminate_backend(ticket_hash);

Regards, Andreas
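For readers following the thread, the ticket idea in the message above can be sketched roughly as below. This is only an illustration in Python, not PostgreSQL code: pg_admin_ticket is the hypothetical function named in the mail, and the salt, the SHA-256 calculation, the 30-second lifetime, the pid argument, and the ticket_hash helper are all assumptions made up for the example.

```python
import hashlib
import secrets
import time

# Assumption: the "well-known stuff" is a salt that admin tools mix into the ticket.
ADMIN_SALT = b"admin-interface"
# Assumption: tickets time out after 30 seconds.
TICKET_LIFETIME_SECONDS = 30.0

# Expected hash -> expiry time for tickets that are issued but not yet used.
_pending = {}


def ticket_hash(ticket):
    """The calculation an admin interface performs on a ticket (hypothetical)."""
    return hashlib.sha256(ADMIN_SALT + ticket.encode("ascii")).hexdigest()


def pg_admin_ticket(now=None):
    """Issue a random one-time ticket and remember when it expires."""
    now = time.monotonic() if now is None else now
    ticket = secrets.token_hex(16)
    _pending[ticket_hash(ticket)] = now + TICKET_LIFETIME_SECONDS
    return ticket


def pg_terminate_backend(pid, hashed, now=None):
    """Return True if the termination request would be accepted."""
    now = time.monotonic() if now is None else now
    expiry = _pending.pop(hashed, None)  # each ticket can be used once
    if expiry is None or now > expiry:
        return False
    # A real server would signal backend `pid` here; the sketch only
    # reports whether the request passed the ticket check.
    return True
```

An admin tool would call pg_admin_ticket(), show its warning, and then pass ticket_hash(ticket) to pg_terminate_backend before the ticket expires. A plain SQL user who does not know the calculation would be rejected.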
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Taken from your mail, I understand that a killed backend might leave
> some loose ends, eg. open locks, which would degrade the cluster's
> performance. Still, it should not corrupt the shared mem, just leave it
> as if the backend's still alive and sleeping, right?

Well, I was citing that as an example of the sort of trouble that is
foreseeable; I don't say either that it would happen, or that it's the
only bad thing that could happen. But having backends block on locks
that will never be released sure seems like something that would look
like database corruption to the average DBA.

If you want to put in the function and document that it may cause
problems, I won't object; it's not like we don't have other features
that are poorly implemented :-(. But my vote would be to remove it.

regards, tom lane
> If you want to put in the function and document that it may cause
> problems, I won't object; it's not like we don't have other features
> that are poorly implemented :-(. But my vote would be to remove it.

I'm down with removing it - people don't read documentation :/

Chris
Andreas Pflug wrote:
> Tom Lane wrote:
>
>> If you don't mind plastering a "use at your own risk" sign on it, then
>> go for it.
>
> killing a backend is obviously much more "at your own risk" than a
> decent function.
> [...]

What about implementing "kill" as "cancel then exit"? Does that
guarantee a safe exit in all cases?

It wouldn't catch *all* the cases where you want to kill a backend, just
the ones where the backend is in a cancellable state, but it seems to me
that the main use case is killing an otherwise idle backend that the
client doesn't want to let go of. And if the backend isn't cancellable
for an extended period, you probably have other problems anyway.

-O
Oliver Jowett <oliver@opencloud.com> writes:
> What about implementing "kill" as "cancel then exit"? Does that
> guarantee a safe exit in all cases?

That was exactly what Bruce's patch turned it into. That would be
workable if we separated this case from the existing elog(FATAL)
behavior, but doing it that way is quite unsafe for real FATAL errors,
and I do not think we want SIGTERM response to behave that way either.
(When init SIGTERMs us, we do *not* want to lollygag around, we want to
get the heck out of there so we can write a shutdown checkpoint before
we get SIGKILL'd.)

So what you'd basically need is a separate signal to trigger that sort
of exit, which would be easy ... if we had any spare signal numbers.

regards, tom lane
Tom Lane wrote:
> So what you'd basically need is a separate signal to trigger that sort
> of exit, which would be easy ... if we had any spare signal numbers.

What about multiplexing it onto an existing signal? e.g. set a
shared-mem flag saying "exit after cancel" then send SIGINT?

I guess this is getting away from the original patch though..

-O
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> So what you'd basically need is a separate signal to trigger that sort
>> of exit, which would be easy ... if we had any spare signal numbers.

> What about multiplexing it onto an existing signal? e.g. set a
> shared-mem flag saying "exit after cancel" then send SIGINT?

Possible, but then the *only* way to get the behavior is by using the
backend function --- you couldn't use dear old kill(1) to do it
manually. It'd be better if it mapped to a signal.

regards, tom lane
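The multiplexing idea discussed in the two messages above can be illustrated with a small toy. This is a sketch in Python on a POSIX system, not backend code: the Backend class, its exit_after_cancel attribute (standing in for the shared-memory flag), and terminate_via_function are all hypothetical names invented for the example.

```python
import os
import signal


class Backend:
    """Toy stand-in for one backend process; not PostgreSQL code."""

    def __init__(self):
        # Stand-in for a per-backend "exit after cancel" flag in shared memory.
        self.exit_after_cancel = False
        self.log = []

    def on_sigint(self, signum, frame):
        # SIGINT always takes the ordinary query-cancel path first.
        self.log.append("cancel")
        if self.exit_after_cancel:
            # Only the flag turns the cancel into a session exit.
            self.exit_after_cancel = False
            self.log.append("exit")


def terminate_via_function(backend, pid):
    """What the backend function would do: set the flag, then send SIGINT."""
    backend.exit_after_cancel = True
    os.kill(pid, signal.SIGINT)


if __name__ == "__main__":
    backend = Backend()
    signal.signal(signal.SIGINT, backend.on_sigint)
    # Like a manual "kill -INT": no flag is set, so this is a cancel only.
    os.kill(os.getpid(), signal.SIGINT)
    # Flag plus SIGINT: cancel first, then exit.
    terminate_via_function(backend, os.getpid())
    print(backend.log)
```

The toy also shows the objection raised in the reply: a bare SIGINT sent with kill(1) cannot set the flag, so by hand you can only ever trigger the cancel branch.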
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> > Tom Lane wrote:
> >> So what you'd basically need is a separate signal to trigger that sort
> >> of exit, which would be easy ... if we had any spare signal numbers.
>
> > What about multiplexing it onto an existing signal? e.g. set a
> > shared-mem flag saying "exit after cancel" then send SIGINT?
>
> Possible, but then the *only* way to get the behavior is by using the
> backend function --- you couldn't use dear old kill(1) to do it
> manually. It'd be better if it mapped to a signal.

And what happens if a FATAL comes while it is processing a signal meant
for termination? It wouldn't exit fast enough --- bad.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073