Обсуждение: "Value locking" Wiki page
The current approach to "value locking" remains a controversial aspect of my INSERT ... ON CONFLICT UPDATE patch. I must admit that this is a very complicated area, and it's difficult to keep things straight, particularly with the relevant discussion scattered all over the place. In the hope of unblocking things, I have created this Wiki page, which details the advantages and disadvantages of all 3 approaches that have been discussed, as suggested by myself and others: https://wiki.postgresql.org/wiki/Value_locking I actually think that we're only in disagreement on the extent to which the stated advantages and disadvantages are useful or harmful. So I think that at the moment this description is a reasonably balanced handling of the issues. Please add any points that I missed. This is by no means complete yet. There is pretty limited handling of what I call "#3. "Promise" index tuples", because there was no prototype version of that design, and there are many open questions about how it would be implemented relative to the other 2 approaches. Perhaps Andres or Simon would care to improve it. I think I've done a better job of describing #2, Heikki's design - hardly surprising, since there was a prototype that we discussed at length - but I'd appreciate it if other people would make sure that I have everything right there. This is hopefully the beginning of working the issues out. I have of course also described my own proposed design alongside the others. Please edit and add to my words as you see fit. -- Peter Geoghegan
On 2014-10-01 01:44:04 -0700, Peter Geoghegan wrote: > In the hope of unblocking things, I have created this Wiki page, which > details the advantages and disadvantages of all 3 approaches that have > been discussed, as suggested by myself and others: > > https://wiki.postgresql.org/wiki/Value_locking Good. > I actually think that we're only in disagreement on the extent to > which the stated advantages and disadvantages are useful or harmful. > So I think that at the moment this description is a reasonably > balanced handling of the issues. Please add any points that I missed. > > This is by no means complete yet. There is pretty limited handling of > what I call "#3. "Promise" index tuples", because there was no > prototype version of that design, and there are many open questions > about how it would be implemented relative to the other 2 approaches. > Perhaps Andres or Simon would care to improve it. I think I've done a > better job of describing #2, Heikki's design - hardly surprising, > since there was a prototype that we discussed at length - but I'd > appreciate it if other people would make sure that I have everything > right there. This is hopefully the beginning of working the issues > out. I have of course also described my own proposed design alongside > the others. FWIW, Heikki's approach isn't what I'd have choosen, but it's something I can live with. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/01/2014 11:49 AM, Andres Freund wrote: > On 2014-10-01 01:44:04 -0700, Peter Geoghegan wrote: >> In the hope of unblocking things, I have created this Wiki page, which >> details the advantages and disadvantages of all 3 approaches that have >> been discussed, as suggested by myself and others: >> >> https://wiki.postgresql.org/wiki/Value_locking Thank! That's a good summary. >> This is by no means complete yet. There is pretty limited handling of >> what I call "#3. "Promise" index tuples", because there was no >> prototype version of that design, and there are many open questions >> about how it would be implemented relative to the other 2 approaches. >> Perhaps Andres or Simon would care to improve it. I think I've done a >> better job of describing #2, Heikki's design - hardly surprising, >> since there was a prototype that we discussed at length - but I'd >> appreciate it if other people would make sure that I have everything >> right there. This is hopefully the beginning of working the issues >> out. I have of course also described my own proposed design alongside >> the others. > > FWIW, Heikki's approach isn't what I'd have choosen, but it's something > I can live with. I didn't realize that "promise index tuples" were even seriously discussed. I guess that can be made to work, too, although I don't see the point. It wouldn't work with GiST indexes, for the same reasons as page-level locking won't work (a tuple can legally be inserted anywhere in a GiST index - it just degrades the index making searching more expensive). And lossy GiST opclasses are a problem too. It's actually more similar to approach #1 in that it puts the responsibility of the locking in the indexam. You would probably need the same kind of API changes to the indexam, and you could well imagine one indexam to implement index promise tuples, while another one might use heavy-weight locks. I don't see the advantage of #3. - Heikki
On 2014-10-01 12:44:25 +0300, Heikki Linnakangas wrote: > I didn't realize that "promise index tuples" were even seriously discussed. > I guess that can be made to work, too, although I don't see the point. It > wouldn't work with GiST indexes, for the same reasons as page-level locking > won't work (a tuple can legally be inserted anywhere in a GiST index - it > just degrades the index making searching more expensive). And lossy GiST > opclasses are a problem too. > It's actually more similar to approach #1 in that it puts the responsibility > of the locking in the indexam. You would probably need the same kind of API > changes to the indexam, and you could well imagine one indexam to implement > index promise tuples, while another one might use heavy-weight locks. I > don't see the advantage of #3. I don't think all of that is of necessary consequence. What I was thinking of would actually works similar to #2: Just that, instead of a heap tuple, it inserts a index tuple that a) has a PROMISE_TUPLE flag set and b) the itemid points towards a xid instead of a heap tuple. That's sufficient for detecting uniqueness violations using a similar approach like in your proposal. Which does *not* have to happen inside the AM. Yes, it'd require some indexam API changes, but I don't think they'd be to severe. Most importantly the ability to insert, return promise tuples to outside and API to repoint a promise tuple to a real entry. I haven't thought about lossy indexes, but I actually do think they could be made work with that strategy. The difference to #2 primarily is that it avoids speculatively inserting in both the heap and index and just inserts into the index, thereby detecting conflicts a bit earlier, after less work. But, as I said, I'm ok with pursuing #2 on the basis that it's quite likely to be simpler Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1 October 2014 10:44, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > I didn't realize that "promise index tuples" were even seriously discussed. > I guess that can be made to work, too, although I don't see the point. It > wouldn't work with GiST indexes, for the same reasons as page-level locking > won't work (a tuple can legally be inserted anywhere in a GiST index - it > just degrades the index making searching more expensive). And lossy GiST > opclasses are a problem too. GiST doesn't support unique indexes, so it is not in any way a problem. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/01/2014 01:50 PM, Simon Riggs wrote: > On 1 October 2014 10:44, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > >> I didn't realize that "promise index tuples" were even seriously discussed. >> I guess that can be made to work, too, although I don't see the point. It >> wouldn't work with GiST indexes, for the same reasons as page-level locking >> won't work (a tuple can legally be inserted anywhere in a GiST index - it >> just degrades the index making searching more expensive). And lossy GiST >> opclasses are a problem too. > > GiST doesn't support unique indexes, so it is not in any way a problem. GiST supports exclusion constraints. That is one of the main reasons I want to do promise tuples, instead of locking within the indexam: to support this feature with exclusion constraints. - Heikki
On 1 October 2014 09:44, Peter Geoghegan <pg@heroku.com> wrote: > In the hope of unblocking things, I have created this Wiki page, which > details the advantages and disadvantages of all 3 approaches that have > been discussed, as suggested by myself and others: > > https://wiki.postgresql.org/wiki/Value_locking A better job is needed at genuine balance on those items. You keep saying you don't care which approach you take and then every word you write goes against that, making it difficult to discuss. Quoting general research and other points about value locking are reasonable in the general section, but not in the description for 1. I'm glad you've called the first option "Heavyweight Page Locking". That name sums up what I see as the major objection to it, which is that performance and scalability of the whole server will be damaged signiciantly by using heavyweight locks for each row inserted. Please add that concern as a negative; I'm sure testing has been done, but not comparative testing against other techniques. I am motivated to explain the promise index tuple approach further because of this, which is an optimistic approach to locking. The stated disadvantages for 3 are not accurate, to put it mildly. But that's been useful because what was a small thought experiment is beginning to look like a useful approach. I will post a summary of my understanding. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1 October 2014 11:58, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/01/2014 01:50 PM, Simon Riggs wrote: >> >> On 1 October 2014 10:44, Heikki Linnakangas <hlinnakangas@vmware.com> >> wrote: >> >>> I didn't realize that "promise index tuples" were even seriously >>> discussed. >>> I guess that can be made to work, too, although I don't see the point. It >>> wouldn't work with GiST indexes, for the same reasons as page-level >>> locking >>> won't work (a tuple can legally be inserted anywhere in a GiST index - it >>> just degrades the index making searching more expensive). And lossy GiST >>> opclasses are a problem too. >> >> >> GiST doesn't support unique indexes, so it is not in any way a problem. > > > GiST supports exclusion constraints. That is one of the main reasons I want > to do promise tuples, instead of locking within the indexam: to support this > feature with exclusion constraints. That does sound interesting, but I am concerned the semantics may cause issues. If I go to insert a row for 'UK' and find an existing row for 'Europe', do we really want to update the population of Europe to be the population of the UK, simply because the UK and Europe have an exclusion conflict? Please give some concrete examples of a business request that might be satisified by such a feature. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/01/2014 02:42 PM, Simon Riggs wrote: > On 1 October 2014 11:58, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> On 10/01/2014 01:50 PM, Simon Riggs wrote: >>> >>> On 1 October 2014 10:44, Heikki Linnakangas <hlinnakangas@vmware.com> >>> wrote: >>> >>>> I didn't realize that "promise index tuples" were even seriously >>>> discussed. >>>> I guess that can be made to work, too, although I don't see the point. It >>>> wouldn't work with GiST indexes, for the same reasons as page-level >>>> locking >>>> won't work (a tuple can legally be inserted anywhere in a GiST index - it >>>> just degrades the index making searching more expensive). And lossy GiST >>>> opclasses are a problem too. >>> >>> GiST doesn't support unique indexes, so it is not in any way a problem. >> >> GiST supports exclusion constraints. That is one of the main reasons I want >> to do promise tuples, instead of locking within the indexam: to support this >> feature with exclusion constraints. > > That does sound interesting, but I am concerned the semantics may cause issues. > > If I go to insert a row for 'UK' and find an existing row for > 'Europe', do we really want to update the population of Europe to be > the population of the UK, simply because the UK and Europe have an > exclusion conflict? Clearly not, but you might want to insert the tuple to another table instead, or skip it altogether. Or you might want to UPDATE Europe into Continental Europe, and then insert the row for UK. - Heikki
On 1 October 2014 13:43, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> That does sound interesting, but I am concerned the semantics may cause >> issues. >> >> If I go to insert a row for 'UK' and find an existing row for >> 'Europe', do we really want to update the population of Europe to be >> the population of the UK, simply because the UK and Europe have an >> exclusion conflict? > > Clearly not, but you might want to insert the tuple to another table > instead, or skip it altogether. Or you might want to UPDATE Europe into > Continental Europe, and then insert the row for UK. Not trying to catch you out, just trying to make sure we don't make technical decisions based upon unachievable ideas. I can't see value in having upsert work against exclusion constraint indexes; thus this only needs to work for btrees, or similar exact indexes. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1 October 2014 14:31, Simon Riggs <simon@2ndquadrant.com> wrote: > On 1 October 2014 13:43, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > >>> That does sound interesting, but I am concerned the semantics may cause >>> issues. >>> >>> If I go to insert a row for 'UK' and find an existing row for >>> 'Europe', do we really want to update the population of Europe to be >>> the population of the UK, simply because the UK and Europe have an >>> exclusion conflict? >> >> Clearly not, but you might want to insert the tuple to another table >> instead, or skip it altogether. Or you might want to UPDATE Europe into >> Continental Europe, and then insert the row for UK. Sorry, let me explain more clearly - neither of those two use cases matches the upsert case. If you wish to skip an insert that fails on a uniqueness constraint, that is already possible. Just wrap in a subtransaction and trap the error. The only reason we need UPSERT is if we intend to update. If we just intend to ignore, then we could add such a check to any index AM that supports unique/exclusion constraints, but without pursuing the full locking needed for upsert path. I wasn't aware that you could do both an INSERT and an UPDATE at same time using the proposed feature. There is no feature option to refer to the conflicting row that is already in the table. The only thing we have at present is the ability to refer to the incoming data row using CONFLICTING(). Update triggers allow you to refer to OLD and NEW, but with an exclusion constraint the row values don't match, so using OLD and NEW would not be appropriate - that isn't how an update trigger works now. So AFAICS neither of those use cases matches. > Not trying to catch you out, just trying to make sure we don't make > technical decisions based upon unachievable ideas. > > I can't see value in having upsert work against exclusion constraint > indexes; thus this only needs to work for btrees, or similar exact > indexes. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/01/2014 04:31 PM, Simon Riggs wrote: > On 1 October 2014 13:43, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > >>> That does sound interesting, but I am concerned the semantics may cause >>> issues. >>> >>> If I go to insert a row for 'UK' and find an existing row for >>> 'Europe', do we really want to update the population of Europe to be >>> the population of the UK, simply because the UK and Europe have an >>> exclusion conflict? >> >> Clearly not, but you might want to insert the tuple to another table >> instead, or skip it altogether. Or you might want to UPDATE Europe into >> Continental Europe, and then insert the row for UK. > > Not trying to catch you out, just trying to make sure we don't make > technical decisions based upon unachievable ideas. > > I can't see value in having upsert work against exclusion constraint > indexes; thus this only needs to work for btrees, or similar exact > indexes. Well, if nothing else, it would be nice to fix the concurrency issue we have with exclusion constraints today, which is that if two backends insert the same value at the same time, they might both get an error, even though you'd only need to abort one of them. That's a special case of UPSERT if you squint a little. - Heikki
On 10/01/2014 04:46 PM, Simon Riggs wrote: > On 1 October 2014 14:31, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 1 October 2014 13:43, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> >>>> That does sound interesting, but I am concerned the semantics may cause >>>> issues. >>>> >>>> If I go to insert a row for 'UK' and find an existing row for >>>> 'Europe', do we really want to update the population of Europe to be >>>> the population of the UK, simply because the UK and Europe have an >>>> exclusion conflict? >>> >>> Clearly not, but you might want to insert the tuple to another table >>> instead, or skip it altogether. Or you might want to UPDATE Europe into >>> Continental Europe, and then insert the row for UK. > > Sorry, let me explain more clearly - neither of those two use cases > matches the upsert case. > > If you wish to skip an insert that fails on a uniqueness constraint, > that is already possible. Just wrap in a subtransaction and trap the > error. Uh, if that's an option, couldn't you just use a subtransaction always, and forget about this patch? However, a subtransaction is a lot more expensive; you'll consume an XID for every inserted or updated row, for starters. > The only reason we need UPSERT is if we intend to update. If we > just intend to ignore, then we could add such a check to any index AM > that supports unique/exclusion constraints, but without pursuing the > full locking needed for upsert path. > > I wasn't aware that you could do both an INSERT and an UPDATE at same > time using the proposed feature. I'm not actually sure if the proposed syntax would allow that, I haven't been paying much attention to that part. - Heikki
On Wed, Oct 1, 2014 at 4:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Quoting general research and other points about value locking are > reasonable in the general section, but not in the description for 1. I also made a similar comparison of #3. I don't think that reflects a bias. > I'm glad you've called the first option "Heavyweight Page Locking". > That name sums up what I see as the major objection to it, which is > that performance and scalability of the whole server will be damaged > signiciantly by using heavyweight locks for each row inserted. Please > add that concern as a negative; I'm sure testing has been done, but > not comparative testing against other techniques. Comparative testing against both other techniques (#1, at a time when #1 and #2 were otherwise comparable), and plain inserts and updates has shown the performance to be very good. The evidence suggests that using a heavyweight lock like this is fine. Don't promise tuples use heavyweight locks? The coarser granularity did not appear to be significant once we optimize retries to avoid hwlocking. #1 came out a bit ahead of #2. So maybe the bloat of #2 is almost, but not quite, cancelled out by the hwlocking coarseness. But then, I specifically stated that it seemed that not having bloating was not much of an advantage for #1. We're going to have a benchmark between #1 and #2 when #2 is properly integrated with the rest of the updated patch (just because we can). #1 is the fastest of #1 and #2 last I checked, but there wasn't all that much in it. > I am motivated to > explain the promise index tuple approach further because of this, > which is an optimistic approach to locking. > The stated disadvantages for 3 are not accurate, to put it mildly. Honestly, how could I know what they are? The explanation I heard from you and Andres has been very short on details. The points for #3 are *my* concerns. I had to start somewhere. I'll consider your rebuttal to those points that you make on a new thread separately. > But that's been useful because what was a small thought experiment is > beginning to look like a useful approach. I will post a summary of my > understanding. Thanks. Actually, this wiki page will probably pay for itself just by allowing us to refer to #1, #2, and #3. -- Peter Geoghegan
On Wed, Oct 1, 2014 at 6:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > Well, if nothing else, it would be nice to fix the concurrency issue we have > with exclusion constraints today, which is that if two backends insert the > same value at the same time, they might both get an error, even though you'd > only need to abort one of them. That's a special case of UPSERT if you > squint a little. In my view, it makes sense to fix that, and to make INSERT ... ON CONFLICT IGNORE work with exclusion constraints. However, it does not make sense to have INSERT ... ON CONFLICT UPDATE work with exclusion constraints. The user-visible semantics are very tricky, and I don't think it's useful. -- Peter Geoghegan
On Wed, Oct 1, 2014 at 2:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> GiST supports exclusion constraints. That is one of the main reasons I want >> to do promise tuples, instead of locking within the indexam: to support this >> feature with exclusion constraints. > > That does sound interesting, but I am concerned the semantics may cause issues. > > If I go to insert a row for 'UK' and find an existing row for > 'Europe', do we really want to update the population of Europe to be > the population of the UK, simply because the UK and Europe have an > exclusion conflict? > > Please give some concrete examples of a business request that might be > satisified by such a feature. The ON CONFLICT UPDATE semantics don't seem particularly useful for exclusion constraints. However, a reasonable business request for exclusion constraints would be to have a "boss mode" for the canonical room reservation example - an INSERT that is guaranteed not to fail by either deleting conflicting rows or updating them so the exclusion constraints don't overlap (e.g. truncate the time intervals) or the rows fail the index predicate (e.g. soft delete). AFAICS this is currently not possible to implement correctly without a retry loop. The hypothetical ON CONFLICT REPLACE and ON CONFLICT UPDATE-AND-THEN-INSERT modes would also make sense in the unique index case. Not saying that I view this as necessary for the first cut of the feature, just providing an example where it could be useful. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
On 10/02/2014 03:06 AM, Peter Geoghegan wrote: > In my view, it makes sense to fix that, and to make INSERT ... ON > CONFLICT IGNORE work with exclusion constraints. However, it does not > make sense to have INSERT ... ON CONFLICT UPDATE work with exclusion > constraints. The user-visible semantics are very tricky, and I don't > think it's useful. If you were doing a multi-valued INSERT ... ON CONFLICT IGNORE and wanted to do this with exclusion constraints, what would you do if the insert its self contains values that can't exist in the table together? Right now the whole insert will fail. Would that still be the case? Would you insert one tuple then ignore the other? If so, what guarantee if any would be made about which tuple would get inserted? I think insert...ignore for exclusion constraints could be interesting, but I'm not sure it's really the same as the upsert case. I guess I also just don't really understand the utility of insert ... on conflict ignore for GiST exclusion constraints. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services