Обсуждение: BUG #13518: CancelRequest lacks statement identifier
The following bug has been logged on the website: Bug reference: 13518 Logged by: Niall Ross Email address: niallfr@btinternet.com PostgreSQL version: 9.3.1 Operating system: Mac OSX 10.9.5, Windows 7, Linux RHel 5.6 Description: Postgres lacks a way to cancel a specific identified statement. Sending CancelRequest cancels the current statement running on a connection. But this means the current statement when the cancel reaches the server, _not_ the current statement when the cancel was sent. Thus we are caught between a rock and a hard place when our system has to dismiss an incompletely-processed statement, since the system cannot tell which of the following applies. - If the statement will run on the server for a while yet, we have time to send cancel-on-server and will complete the dismiss quicker by doing so (because we buffer, draining all messages for that statement from the socket, and cancelling on server truncates the flow of messages). - If the statement will soon finish anyway, the asynchronous cancel-on-server can kill the _next_ statement on that connection, not the one our system seeks to dismiss, exactly the opposite of what is wanted. When dismissing a query, the aim is to do so as fast as possible. For a long-running query, sending CancelRequest to the server is exactly what you want, but until Postgres knows which session is meant, we can see no way for an automatic system to do this safely (unless by a post-send delay, defeating the aim, which is performance). If CancelRequest could include a statement name (or other means of identifying a statement) and did nothing if that statement was no longer running by the time the server processed it, the usability of CancelRequest would be significantly enhanced.
"niallfr@btinternet.com" <niallfr@btinternet.com> wrote: > If CancelRequest could include a statement name (or other means > of identifying a statement) and did nothing if that statement was > no longer running by the time the server processed it, the > usability of CancelRequest would be significantly enhanced. Would it be sufficient for your purposes to cancel a *transaction* rather than a *statement*? There is already a virtual transaction ID exposed in pg_locks, and it could probably be added to pg_stat_activity; we could probably create a pg_cancel_transaction() function that took a text representation of that and only canceled the transaction if it was running. There would need to be locking on a heavily contended lock or two to make that happen correctly, but presumably this would not ba a high-volume activity. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > "niallfr@btinternet.com" <niallfr@btinternet.com> wrote: >> If CancelRequest could include a statement name (or other means >> of identifying a statement) and did nothing if that statement was >> no longer running by the time the server processed it, the >> usability of CancelRequest would be significantly enhanced. > Would it be sufficient for your purposes to cancel a *transaction* > rather than a *statement*? There is already a virtual transaction > ID exposed in pg_locks, and it could probably be added to > pg_stat_activity; we could probably create a > pg_cancel_transaction() function that took a text representation of > that and only canceled the transaction if it was running. There > would need to be locking on a heavily contended lock or two to make > that happen correctly, but presumably this would not ba a > high-volume activity. If I'm correctly visualizing what you've got in mind, it seems like that would only narrow the race-condition window not close it entirely. I don't believe there's any guarantee that signals-in-flight are delivered immediately, so the SIGINT might still arrive at the target process after it's finished out the transaction that was meant to be canceled. The bigger picture though is that what you're suggesting would require that a whole new connection be made in order to issue a Cancel. That's kind of an expensive solution ... not that what we're doing now is exactly free, but at least it doesn't fail if you're up against the MaxBackends limit, for instance. regards, tom lane
On 2015-07-28 15:57:26 -0400, Tom Lane wrote: > The bigger picture though is that what you're suggesting would require > that a whole new connection be made in order to issue a Cancel. That's > kind of an expensive solution ... On the other hand that'd allow us to do authentication, ssl, etc for cancel requests. Which would also be kinda nice...
Dear Tom, Kevin et al, > The bigger picture though is that what you're suggesting would require > that a whole new connection be made in order to issue a Cancel. Am I misunderstanding something? My experience with using Postgres, and my reading of the documents, is that a new statement, but not a new connection, is needed to cancel a statement on that connection. The cancel is signalled by a fresh statement using the same connection (and specifically that connection's PID and Key). I do not see why a new connection would enter into it. The point is that in any situation other than explicit user intervention - or a built-in post-send-cancel delay - issuing a cancel seems too risky to do, because the cancel could kill the _next_ statement, not the one intended. The motive for issuing a cancel - at least in my scenario - is to increase the probability that the current statement will end as soon as possible, so the next can run sooner. Any risk of cancelling that next statement that we're keen to run sooner is very counterproductive (as would be making the system wait after issuing a cancel for some guessed time delay after which for sure it will have been processed). Hence the desire to include some identifier of the statement we wish to cancel, so either that intended statement is cancelled or nothing happens. More generally, I'm puzzled what the scenario is for an app driving postgres to use cancel automatically. Why cancel at all except for performance, yet how should a system cancel and move to next statement performantly without risking next-statement cancel? > Would it be sufficient for your purposes to cancel a *transaction* > rather than a *statement*? No, AFAICS; there is no reason why the statement needing cancelling would necessarily be in a transaction, and conversely it could be inside a transaction there was no desire to cancel. This is not an immediate major hold-up for us at the moment, because we've just removed the code that was exploiting cancel, choosing functional certainty over the occasional performance gain of cancelling a long query. However the experience left me puzzled how to use cancel. Thanks for thinking about this Niall Ross >Kevin Grittner <kgrittn@ymail.com> writes: > > >>"niallfr@btinternet.com" <niallfr@btinternet.com> wrote: >> >> >>>If CancelRequest could include a statement name (or other means >>>of identifying a statement) and did nothing if that statement was >>>no longer running by the time the server processed it, the >>>usability of CancelRequest would be significantly enhanced. >>> >>> > > > >>Would it be sufficient for your purposes to cancel a *transaction* >>rather than a *statement*? There is already a virtual transaction >>ID exposed in pg_locks, and it could probably be added to >>pg_stat_activity; we could probably create a >>pg_cancel_transaction() function that took a text representation of >>that and only canceled the transaction if it was running. There >>would need to be locking on a heavily contended lock or two to make >>that happen correctly, but presumably this would not ba a >>high-volume activity. >> >> > >If I'm correctly visualizing what you've got in mind, it seems like >that would only narrow the race-condition window not close it entirely. >I don't believe there's any guarantee that signals-in-flight are delivered >immediately, so the SIGINT might still arrive at the target process after >it's finished out the transaction that was meant to be canceled. > >The bigger picture though is that what you're suggesting would require >that a whole new connection be made in order to issue a Cancel. That's >kind of an expensive solution ... not that what we're doing now is >exactly free, but at least it doesn't fail if you're up against the >MaxBackends limit, for instance. > > regards, tom lane > > > >
Niall Ross <niallfr@btinternet.com> writes: > Am I misunderstanding something? My experience with using Postgres, and > my reading of the documents, is that a new statement, but not a new > connection, is needed to cancel a statement on that connection. The > cancel is signalled by a fresh statement using the same connection (and > specifically that connection's PID and Key). Uh, no, that's not how it works at all. PQcancel() involves opening a fresh connection to the postmaster, which will then send SIGINT to the backend process you're originally connected to. Also, looking again at the PQcancel code, I'm unsure that there is any actual race condition here. The comments therein quoth * Wait for the postmaster to close the connection, which indicates that * it's processed the request. Without this delay, we might issue another * command only to find that our cancel zaps that command instead of the * one we thought we were canceling. Note we don't actually expect this * read to obtain any data, we are just waiting for EOF to be signaled. which means that by the time the PQcancel call returns, the SIGINT has been sent. I did speculate upthread that delivery of that signal might be asynchronous, but that was just speculation. Even if there was some delay, the backend would ignore the signal unless it arrived later than the first byte of the next client request. So it seems a bit hard to credit that you would have a practical problem unless your application were to issue a new command while PQcancel was still executing ... and if it does, I maintain that's an application bug not a protocol problem. regards, tom lane
Dear Tom, thanks for info. >Niall Ross <niallfr@btinternet.com> writes: > > >>Am I misunderstanding something? My experience with using Postgres, and >>my reading of the documents, is that a new statement, but not a new >>connection, is needed to cancel a statement on that connection. The >>cancel is signalled by a fresh statement using the same connection (and >>specifically that connection's PID and Key). >> >> > >Uh, no, that's not how it works at all. PQcancel() involves opening a >fresh connection to the postmaster, which will then send SIGINT to the >backend process you're originally connected to. > > You are right. Our cancelling code is implemented on _our_ system's existing connection instance but it opens a new _socket_ connection to send the cancel request to postgres (we do not reify this temporary socket connection as a separate connection in our system). I apologise for my stupid phrasing - I was thinking in my system's terms, forgetting that you of course were describing it in Postgres' terms. >Also, looking again at the PQcancel code, I'm unsure that there is any >actual race condition here. The comments therein quoth > > * Wait for the postmaster to close the connection, which indicates that > * it's processed the request. Without this delay, we might issue another > * command only to find that our cancel zaps that command instead of the > * one we thought we were canceling. Note we don't actually expect this > * read to obtain any data, we are just waiting for EOF to be signaled. > > That's interesting. Currently, we close the socket at our client end after committing the cancel request, knowing that no message response is expected, and do not attempt to tell when the socket is closed by the server. If the connection's command has completed on the server by the time the cancel request reaches the server, is the behaviour expected that the server would close the socket (EOF) immediately? >which means that by the time the PQcancel call returns, the SIGINT has >been sent. I did speculate upthread that delivery of that signal might >be asynchronous, but that was just speculation. Even if there was some >delay, the backend would ignore the signal unless it arrived later than >the first byte of the next client request. > Should I interpret these remarks to mean that you are qualifying the quoted comment above them, or would that be a misunderstanding? To ask the same question another way, are you saying it is safe to wait till the cancel request's socket is closed by the server, for no less time but also for no longer, before sending another command on the connection's main socket? >So it seems a bit hard to >credit that you would have a practical problem unless your application >were to issue a new command while PQcancel was still executing > That's exactly the problem; in our desire to maximise performance, we want to issue a new command after cancelling a command that may or may not complete before the cancel reaches the server and is processed. - In an ideal world, if the command is still running when the cancel is processed, we want it cancelled, otherwise a no-op. - In a less ideal world, we at least want a response when the cancel request has been processed rather than waiting for a guessed time. IIUC, EOF on the cancel socket is this response, both in the case where the cancel does something and in the case where it arrives after the query completes. >... and if >it does, I maintain that's an application bug not a protocol problem. > We have a multi-threaded system which can use relational database backends, Postgres being one possibility. Threaded presentation of statements to the same connection is not eliminable in certain important cases. One solution is to protect our reification of the connection so it drains its socket of messages for an ongoing query (i.e. buffers them) before sending another. Such a query can be dismissed while buffering in various cases. This is the situation in which we want to end a running query's posting of messages to the socket as soon as possible, and then present a new query to that socket as soon as possible. Fire&forget cancellation would be ideal - _if_ the cancel could only affect the intended query. Waiting for server-closing of another socket is less ideal, but at least it is deterministic. I submitted this as a bug, but now I understand there is a way to know when the server has cancelled, I'd reclassify it as a critique of the cancel request feature. For now, we are managing without it. Hopefully the discussion has been of interest to you in thinking about Postgres evolution. Yours faithfully Niall Ross