Обсуждение: Is FOR UPDATE an optimization fence?
I'm fooling around with pushing FOR UPDATE locking into a new plan node type, and I just noticed a behavior that seems a bit bogus. Historically we have dealt with FOR UPDATE in sub-selects by flattening the sub-select if we could, because the alternative was to fail altogether. For example, consider select * from a join (select * from b for update) ss on a.x = ss.y; The FOR UPDATE effectively got hoisted to the top because that's where we could implement it, making this equivalent to select * from a join b on a.x = b.y for update of b; It seems to me, though, that this is changing the semantics. In the latter case it's clear that we should only lock b rows that have a join partner in a (which indeed is what happens). In the former case, what I think should be expected to happen is that *all* b rows get locked. With FOR UPDATE as a plan node, it's possible to fix this by treating FOR UPDATE in a sub-select as an optimization fence that prevents flattening of the sub-select, much like LIMIT has always done. The FOR UPDATE node will end up at the top of the subplan and it will act as the syntax would suggest. Of course the downside of changing it is that queries that worked fine before might work differently (and much slower) now; first because not flattening the sub-select might lead to a worse plan, and second because locking more rows takes more time. The alternative would be to let it continue to flatten such sub-selects when possible, and to tell anyone who doesn't want that to stick in OFFSET 0 as an optimization fence. It's an entirely trivial code change either way. I'm inclined to think that we should prevent flattening, on the grounds of least astonishment. Comments? regards, tom lane
Hi, Tom Lane wrote: > It's an entirely trivial code change either way. I'm inclined to think > that we should prevent flattening, on the grounds of least astonishment. Yeah, I also tend towards making FOR UPDATE an optimization fence (that's how I understood the non-flattening approach). While it might change behavior compared to previous versions, it doesn't force people into writing kludges like OFFSET 0. BTW: how do other databases deal with this? Anything of relevance in the SQL standards? Regards Markus Wanner
Markus Wanner <markus@bluegap.ch> writes: > BTW: how do other databases deal with this? Anything of relevance in the > SQL standards? SQL99 treats FOR UPDATE as an attribute of DECLARE CURSOR, so there's no way for it to appear in a sub-select per spec. (In general our approach to FOR UPDATE is only loosely related to what the spec thinks it does ...) regards, tom lane
On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm fooling around with pushing FOR UPDATE locking into a new plan node > type, and I just noticed a behavior that seems a bit bogus. > Historically we have dealt with FOR UPDATE in sub-selects by flattening > the sub-select if we could, because the alternative was to fail > altogether. For example, consider > > select * from a join (select * from b for update) ss on a.x = ss.y; > > The FOR UPDATE effectively got hoisted to the top because that's where > we could implement it, making this equivalent to > > select * from a join b on a.x = b.y for update of b; > > It seems to me, though, that this is changing the semantics. In the > latter case it's clear that we should only lock b rows that have a join > partner in a (which indeed is what happens). In the former case, what > I think should be expected to happen is that *all* b rows get locked. > > With FOR UPDATE as a plan node, it's possible to fix this by treating > FOR UPDATE in a sub-select as an optimization fence that prevents > flattening of the sub-select, much like LIMIT has always done. The > FOR UPDATE node will end up at the top of the subplan and it will act > as the syntax would suggest. > > Of course the downside of changing it is that queries that worked fine > before might work differently (and much slower) now; first because not > flattening the sub-select might lead to a worse plan, and second because > locking more rows takes more time. > > The alternative would be to let it continue to flatten such sub-selects > when possible, and to tell anyone who doesn't want that to stick in > OFFSET 0 as an optimization fence. > > It's an entirely trivial code change either way. I'm inclined to think > that we should prevent flattening, on the grounds of least astonishment. It seems like this is somewhat related to the question of embedding an {INSERT|UPDATE|DELETE}...RETURNING in some arbitrary part of a query versus only allowing it in a WITH clause. The argument for only allowing it in a WITH clause is that there is otherwise no guarantee that it is evaluated in its entirety but just once. ISTM we could contrariwise give it the handling you're proposing here: allow it anywhere in the query, but make it act as an optimization fence. For that reason, I think I'd be inclined to make it act as an optimization fence if used as a top-level CTE, but otherwise flatten it, so that the handling is consistent with what we've proposed to do elsewhere. But I'm not really familiar with how these constructs are being treated by the executor, so I might be creating a false parallel here. The other comment I have is that I *expect* subqueries to be pulled up. So my own personal POLA would not be violated by locking only the rows with a join partner; in fact it would be more likely to be violated by the reverse behavior. I might not be typical, though. My experience is that not pulling up subqueries tends to have disastrous effects on performance, so I'm somewhat biased against creating more situations where that will happen. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's an entirely trivial code change either way. �I'm inclined to think >> that we should prevent flattening, on the grounds of least astonishment. > It seems like this is somewhat related to the question of embedding an > {INSERT|UPDATE|DELETE}...RETURNING in some arbitrary part of a query > versus only allowing it in a WITH clause. The argument for only > allowing it in a WITH clause is that there is otherwise no guarantee > that it is evaluated in its entirety but just once. ISTM we could > contrariwise give it the handling you're proposing here: allow it > anywhere in the query, but make it act as an optimization fence. I don't think this analogy to update queries is a very solid one. SELECT FOR UPDATE has a couple of properties that make it much less critical for it to have evaluate-exactly-once semantics: * Locking the same row twice is a no-op. * If you don't run the query to completion and thus don't lock all the rows, so what? Arguably, the semantics of the query are to lock only the rows actually read/returned. > For that reason, I think I'd be inclined to make it act as an > optimization fence if used as a top-level CTE, but otherwise flatten > it, so that the handling is consistent with what we've proposed to do > elsewhere. Well, the point of the WITH RETURNING restriction is that just because a subselect isn't flattened doesn't mean you have evaluate-exactly-once semantics for the subselect. The upper query is still free to execute the subselect multiple times, partially, or not at all. A CTE, on the other hand, is more than just an optimization fence --- we've decided to give it evaluate-exactly-once semantics. So SELECT FOR UPDATE at the top of a CTE is really irrelevant to this discussion; it's going to be treated the same no matter what. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Of course the downside of changing it is that queries that worked fine >> before might work differently (and much slower) now; first because not >> flattening the sub-select might lead to a worse plan, and second because >> locking more rows takes more time. >> >> The alternative would be to let it continue to flatten such sub-selects >> when possible, and to tell anyone who doesn't want that to stick in >> OFFSET 0 as an optimization fence. >> >> It's an entirely trivial code change either way. �I'm inclined to think >> that we should prevent flattening, on the grounds of least astonishment. > The other comment I have is that I *expect* subqueries to be pulled > up. So my own personal POLA would not be violated by locking only the > rows with a join partner; in fact it would be more likely to be > violated by the reverse behavior. I might not be typical, though. My > experience is that not pulling up subqueries tends to have disastrous > effects on performance, so I'm somewhat biased against creating more > situations where that will happen. On further reflection I've decided to stick with the old behavior on this point, at least for the time being. I'm concerned about subtly altering the behavior of existing queries, and I've also realized that changing it isn't as much of a one-liner as I thought. The current behavior of the parser and rewriter really depends on the assumption that there's not much of a semantic difference between FOR UPDATE markings at different syntactic levels, because they will happily push down a FOR UPDATE *into* a sub-select. That is, select * from a join (select * from b) ss on a.x = ss.y for update; gets transformed into select * from a join (select * from b for update of b) ss on a.x = ss.yfor update of a; There isn't any simple way to avoid that with the current RowMarkClause representation, because it only applies to the current query level. Maybe we should think about changing that sometime, but it seems like material for a different patch. regards, tom lane
On Mon, Oct 12, 2009 at 1:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Of course the downside of changing it is that queries that worked fine >>> before might work differently (and much slower) now; first because not >>> flattening the sub-select might lead to a worse plan, and second because >>> locking more rows takes more time. >>> >>> The alternative would be to let it continue to flatten such sub-selects >>> when possible, and to tell anyone who doesn't want that to stick in >>> OFFSET 0 as an optimization fence. >>> >>> It's an entirely trivial code change either way. I'm inclined to think >>> that we should prevent flattening, on the grounds of least astonishment. > >> The other comment I have is that I *expect* subqueries to be pulled >> up. So my own personal POLA would not be violated by locking only the >> rows with a join partner; in fact it would be more likely to be >> violated by the reverse behavior. I might not be typical, though. My >> experience is that not pulling up subqueries tends to have disastrous >> effects on performance, so I'm somewhat biased against creating more >> situations where that will happen. > > On further reflection I've decided to stick with the old behavior on > this point, at least for the time being. I'm concerned about subtly > altering the behavior of existing queries, and I've also realized that > changing it isn't as much of a one-liner as I thought. The current > behavior of the parser and rewriter really depends on the assumption > that there's not much of a semantic difference between FOR UPDATE > markings at different syntactic levels, because they will happily push > down a FOR UPDATE *into* a sub-select. That is, For the record, I wasn't sure if I agreed with your original point that: select * from a join (select * from b for update) ss on a.x = ss.y; should necessarily be expected to lock all rows from b (does the standard insist on it?). The select inside the join clause describes 'how you get' the records, not that they should be all gotten. Along the same vein, does: create view foo_update as select * from foo for update; necessarily lock all the rows from foo for any query against the view?(It doesn't and IMO shouldn't). ISTM that the particularrows being locked in your first example are not really defined very well. merlin