Обсуждение: LIMIT for UPDATE and DELETE
Greetings, Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently. ♜
Вложения
Hi Rukh, (2014/08/15 6:18), Rukh Meski wrote: > Based on the feedback on my previous patch, I've separated only the > LIMIT part into its own feature. This version plays nicely with > inheritance. The intended use is splitting up big UPDATEs and DELETEs > into batches more easily and efficiently. Before looking into the patch, I'd like to know the use cases in more details. Thanks, Best regards, Etsuro Fujita
On Mon, Aug 25, 2014 at 12:18 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> (2014/08/15 6:18), Rukh Meski wrote:>>
>> Based on the feedback on my previous patch, I've separated only the
>> LIMIT part into its own feature. This version plays nicely with
>> inheritance. The intended use is splitting up big UPDATEs and DELETEs
>> into batches more easily and efficiently.
>
>
> Before looking into the patch, I'd like to know the use cases in more details.
You can once check the previous commit fest thread [1] for
this feature, in that you can find some use cases and what are
the difficulties to implement this feature, it might aid you in
review of this feature.
[1]
https://commitfest.postgresql.org/action/patch_view?id=1412
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
https://commitfest.postgresql.org/action/patch_view?id=1412
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2014/08/15 6:18), Rukh Meski wrote: >> Based on the feedback on my previous patch, I've separated only the >> LIMIT part into its own feature. This version plays nicely with >> inheritance. The intended use is splitting up big UPDATEs and DELETEs >> into batches more easily and efficiently. > > Before looking into the patch, I'd like to know the use cases in more > details. There have been a few times I wanted something like this, so that it was easier to do an update that affects a very high percentage of rows in a table, while making the old version of the row no longer match the selection criteria for the UPDATE. There are workarounds using cursors or subselects returning ctid, but they are kludgy and error prone. Basically I wanted to alternate UPDATE of a subset of the rows with table VACUUM so that subsequent iterations can re-use space and avoid bloating the table. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Aug 24, 2014 at 11:48 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
Hi Rukh,Before looking into the patch, I'd like to know the use cases in more details.
(2014/08/15 6:18), Rukh Meski wrote:Based on the feedback on my previous patch, I've separated only the
LIMIT part into its own feature. This version plays nicely with
inheritance. The intended use is splitting up big UPDATEs and DELETEs
into batches more easily and efficiently.
There are two common use cases I can think of:
1)
I've just added a column to an existing table, and it is all NULL. I've changed the code to populate that column appropriately for new or updated rows, but I need to back fill the existing rows. I have a (slow) method to compute the new value. (I've not yet changed the code to depend on that column being populated)
The obvious solution is:
update the_table set new_col=populate_new_col(whatever) where new_col is null.
But this will bloat the table because vacuum cannot intervene, and will take a very long time. The first row to be update will remain locked until the last row gets updated, which is not acceptable. And if something goes wrong before the commit, you've lost all the work.
With the limit clause, you can just do this:
update the_table set new_col=populate_new_col(whatever) where new_col is null limit 50000;
In a loop with appropriate vacuuming and throttling.
2)
I've introduced or re-designed partitioning, and need to migrate rows to the appropriate partitions without long lived row locks.
create table pgbench_accounts2 () inherits (pgbench_accounts);
and then in a loop:
with t as (delete from only pgbench_accounts where aid < 500000 limit 5000 returning *)
insert into pgbench_accounts2 select * from t;
Cheers,
Jeff
(2014/08/25 15:48), Etsuro Fujita wrote: > (2014/08/15 6:18), Rukh Meski wrote: >> Based on the feedback on my previous patch, I've separated only the >> LIMIT part into its own feature. This version plays nicely with >> inheritance. The intended use is splitting up big UPDATEs and DELETEs >> into batches more easily and efficiently. > > Before looking into the patch, I'd like to know the use cases in more > details. Thanks for the input, Amit, Kevin and Jeff! I understand that the patch is useful. I've looked at the patch a bit closely. Here is my initial thought about the patch. The patch places limit-counting inside ModifyTable, and works well for inheritance trees, but I'm not sure that that is the right way to go. I think that this feature should be implemented in the way that we can naturally extend it to the ORDER-BY-LIMIT case in future. But honestly the patch doesn't seem to take into account that, I might be missing something, though. What plan do you have for the future extensibility? I think that the approach discussed in [1] would be promissing, so ISTM that it would be better to implement this feature by allowing for plans in the form of eg, ModifyTModifyTable+Limit+Append. Thanks, [1] http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us Best regards, Etsuro Fujita
On 8/29/14 12:20 PM, Etsuro Fujita wrote: > The patch places limit-counting inside ModifyTable, and works well for > inheritance trees, but I'm not sure that that is the right way to go. I > think that this feature should be implemented in the way that we can > naturally extend it to the ORDER-BY-LIMIT case in future. But honestly > the patch doesn't seem to take into account that, I might be missing > something, though. The LIMIT part *has* to happen after the rows have been locked or it will work very surprisingly under concurrency (sort of like how FOR SHARE / FOR UPDATE worked before 9.0). So either it has to be inside ModifyTable or the ModifyTable has to somehow pass something to a Limit node on top of it which would then realize that the tuples from ModifyTable aren't supposed to be sent to the client (unless there's a RETURNING clause). I think it's a lot nicer to do the LIMITing inside ModifyTable, even though that duplicates a small portion of code that already exists in the Limit node. > What plan do you have for the future extensibility? > > I think that the approach discussed in [1] would be promissing, so ISTM > that it would be better to implement this feature by allowing for plans > in the form of eg, ModifyTModifyTable+Limit+Append. I don't see an approach discussed there, just a listing of problems with no solutions. This is just my personal opinion, but what I think should happen is: 1) We put the LIMIT inside ModifyTable like this patch does. This doesn't prevent us from doing ORDER BY in the future, but helps numerous people who today have to 2) We allow ORDER BY on tables with no inheritance children using something similar to Rukh's previous patch. 3) Someone rewrites how UPDATE works based on Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us, which allows us to support ORDER BY on all tables (or perhaps maybe not FDWs, I don't know how those work). The LIMIT functionality in this patch is unaffected. Now, I know some people disagree with me on whether step #2 is worth taking or not, but that's a separate discussion. My point w.r.t. this patch still stands: I don't see any forwards compatibility problems with this approach, nor do I really see any viable alternatives either. .marko
On 8/29/14 1:53 PM, I wrote: > This is just my personal opinion, but what I think should happen is: > > 1) We put the LIMIT inside ModifyTable like this patch does. This > doesn't prevent us from doing ORDER BY in the future, but helps numerous > people who today have to Oops, looks like I didn't finish my thought here. .. but helps numerous people who today have to achieve the same thing via tedious, slow and problematic subqueries (or a choose-any-two combination of these). .marko
Marko Tiikkaja <marko@joh.to> writes: > The LIMIT part *has* to happen after the rows have been locked or it > will work very surprisingly under concurrency (sort of like how FOR > SHARE / FOR UPDATE worked before 9.0). Good point. > So either it has to be inside > ModifyTable or the ModifyTable has to somehow pass something to a Limit > node on top of it ... or we add a LockRows node below the Limit node. Yeah, that would make UPDATE/LIMIT a tad slower, but I think that might be preferable to what you're proposing anyway. Raw speed of what is fundamentally a fringe feature ought not trump every other concern. > This is just my personal opinion, but what I think should happen is: > 1) We put the LIMIT inside ModifyTable like this patch does. This > doesn't prevent us from doing ORDER BY in the future, but helps numerous > people who today have to > 2) We allow ORDER BY on tables with no inheritance children using > something similar to Rukh's previous patch. > 3) Someone rewrites how UPDATE works based on Tom's suggestion here: > http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us, > which allows us to support ORDER BY on all tables (or perhaps maybe not > FDWs, I don't know how those work). The LIMIT functionality in this > patch is unaffected. I still think we should skip #2 and go directly to work on #3. Getting rid of the unholy mess that is inheritance_planner would be a very nice thing. regards, tom lane
On 8/29/14 4:33 PM, Tom Lane wrote: >> So either it has to be inside >> ModifyTable or the ModifyTable has to somehow pass something to a Limit >> node on top of it > > ... or we add a LockRows node below the Limit node. Yeah, that would make > UPDATE/LIMIT a tad slower, but I think that might be preferable to what > you're proposing anyway. Raw speed of what is fundamentally a fringe > feature ought not trump every other concern. I don't consider this a fringe feature, but in any case, the main use case for LIMIT without ORDER BY in UPDATE and DELETE is to split up large transactions into smaller batches. And considering that, I think raw speed should be a concern (though it shouldn't trump every other concern, obviously). More to the point, personally, I think the changes to nodeModifyTable.c are very reasonable so it's not clear to me that the "extra LockRows+Limit nodes" approach would be inherently better (even ignoring performance concerns). >> This is just my personal opinion, but what I think should happen is: > >> 2) We allow ORDER BY on tables with no inheritance children using >> something similar to Rukh's previous patch. >> 3) Someone rewrites how UPDATE works based on Tom's suggestion here: >> http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us, > > I still think we should skip #2 and go directly to work on #3. Getting > rid of the unholy mess that is inheritance_planner would be a very nice > thing. Ideally? Yeah, that would be great. But I don't see anyone volunteering to do that work, and I think holding back a useful feature (ORDER BY with UPDATE/DELETE) in hopes of getting someone to volunteer to do it is insane. Now, you're free to argue that ORDER BY with UPDATE/DELETE isn't that useful, of course, but I'm sure there are lots of people who agree with me. .marko
On Mon, Sep 1, 2014 at 8:06 AM, Marko Tiikkaja <marko@joh.to> wrote: > Ideally? Yeah, that would be great. But I don't see anyone volunteering to > do that work, and I think holding back a useful feature (ORDER BY with > UPDATE/DELETE) in hopes of getting someone to volunteer to do it is insane. > Now, you're free to argue that ORDER BY with UPDATE/DELETE isn't that > useful, of course, but I'm sure there are lots of people who agree with me. I still agree with Tom. Arbitrary restrictions on which features can be used in combination with each other piss off and alienate users. We've put quite a bit of effort into making table inheritance not suck (e.g. statistics on inheritance trees, Merge Append, etc.). Making it suck more because you don't think it's as important as your feature is, in my opinion, not cool. This is not to say that I don't like the feature. I like it a lot. But I like a product where you can be sure that if walking works and chewing gum works you can also walk and chew gum at the same time even more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9/3/14 4:46 PM, Robert Haas wrote: > Making it > suck more because you don't think it's as important as your feature > is, in my opinion, not cool. I really can't see how that would make inheritance suck *more*. You can't do UPDATE .. ORDER BY now, and you wouldn't be able to do it after the patch. Yeah, sure, perhaps people using inheritance might feel left out, but surely that would just motivate them to work on improving it. .marko
On Wed, Sep 3, 2014 at 11:02 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 9/3/14 4:46 PM, Robert Haas wrote: >> Making it >> suck more because you don't think it's as important as your feature >> is, in my opinion, not cool. > > I really can't see how that would make inheritance suck *more*. You can't > do UPDATE .. ORDER BY now, and you wouldn't be able to do it after the > patch. Yeah, sure, perhaps people using inheritance might feel left out, > but surely that would just motivate them to work on improving it. I think it's entirely reasonable for us to require that all new SQL features should be required to work with or without inheritance. If we took the opposition position, and said that the only things that need to work with inheritance are the ones that existed at the time inheritance was introduced, then we'd not need to worry about it not only for this feature but for row-level security and SKIP LOCKED and GROUPING SETS and, going back a bit further, materialized views and security-barrier views and LATERAL and CTEs and on and on. Perhaps not all of those require any special handling for inheritance hierarchies, but some of them surely did, and if even 10% of the SQL features that we have added since table inheritance were allowed to opt out of supporting it, we'd have a broken and unusable feature today. Now some people might argue that we have that anyway, but the fact is that a lot of people are using inheritance today, even with all its flaws, and they wouldn't be if there were a long laundry list of limitations that didn't apply to regular tables. We should be looking to lift the limitations that currently exist, not add more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/03/2014 06:39 PM, Robert Haas wrote: > On Wed, Sep 3, 2014 at 11:02 AM, Marko Tiikkaja <marko@joh.to> wrote: >> On 9/3/14 4:46 PM, Robert Haas wrote: >>> Making it >>> suck more because you don't think it's as important as your feature >>> is, in my opinion, not cool. >> >> I really can't see how that would make inheritance suck *more*. You can't >> do UPDATE .. ORDER BY now, and you wouldn't be able to do it after the >> patch. Yeah, sure, perhaps people using inheritance might feel left out, >> but surely that would just motivate them to work on improving it. > > I think it's entirely reasonable for us to require that all new SQL > features should be required to work with or without inheritance. If > we took the opposition position, and said that the only things that > need to work with inheritance are the ones that existed at the time > inheritance was introduced, then we'd not need to worry about it not > only for this feature but for row-level security and SKIP LOCKED and > GROUPING SETS and, going back a bit further, materialized views and > security-barrier views and LATERAL and CTEs and on and on. Perhaps > not all of those require any special handling for inheritance > hierarchies, but some of them surely did, and if even 10% of the SQL > features that we have added since table inheritance were allowed to > opt out of supporting it, we'd have a broken and unusable feature > today. > > Now some people might argue that we have that anyway, but the fact is > that a lot of people are using inheritance today, even with all its > flaws, and they wouldn't be if there were a long laundry list of > limitations that didn't apply to regular tables. We should be looking > to lift the limitations that currently exist, not add more. I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should work with inheritance. So the path forward is (using Marko's phrasing upthread): 1) We put the LIMIT inside ModifyTable like this patch does. This doesn't prevent us from doing ORDER BY in the future, but helps numerous people who today have to 2) Someone rewrites how UPDATE works based on Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us, which allows us to support ORDER BY on all tables (or perhaps maybe not FDWs, I don't know how those work). The LIMIT functionality in this patch is unaffected. What's not clear to me is whether it make sense to do 1) without 2) ? Is UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we apply this patch now, how much of it needs to be rewritten after 2) ? If the answers are "yes" and "not much", then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should do done first. Etsuro, Kaigei, please take a look at the patch and try to make a guess on how much of this still needs to be rewritten if we do 2). If not much, then please continue to review it, with the aim of getting it into a committable state. - Heikki
On 9/9/14 11:57 AM, Heikki Linnakangas wrote: > What's not clear to me is whether it make sense to do 1) without 2) ? Is > UPDATE .. LIMIT without support for an ORDER BY useful enough? I'd say so; I could use it right now. I have to remove millions of lines from a table, but I don't want the live transaction processing to take a hit, so I have to do it in batches. Granted, some kind of rate limiting would achieve the same thing for DELETE, but with UPDATE you also have to consider row locking, and rate limiting wouldn't help with that at all; it would, in fact, just make it worse. I'll also be running a big UPDATE like that later today, so that's two uses today alone for me. And no, these are not routine things so keep your "use partitions" suggestions to yourselves. > And if we > apply this patch now, how much of it needs to be rewritten after 2) ? If > the answers are "yes" and "not much", then we should review this patch > now, and put 2) on the TODO list. Otherwise 2) should do done first. I'd say "not much, if anything at all". .marko
On 9/9/14 12:37 PM, I wrote: > And no, these are not routine things so keep your "use > partitions" suggestions to yourselves. My apologies. This was not directed at you personally, Heikki, and in any case it was unnecessarily hostile. .marko
On 09/09/2014 01:46 PM, Marko Tiikkaja wrote: > On 9/9/14 12:37 PM, I wrote: >> And no, these are not routine things so keep your "use >> partitions" suggestions to yourselves. > > My apologies. This was not directed at you personally, Heikki, and in > any case it was unnecessarily hostile. No worries, it made me smile :-) - Heikki
On Tue, Sep 9, 2014 at 2:57 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should work with inheritance. So the path forward is (using Marko's phrasing upthread):
1) We put the LIMIT inside ModifyTable like this patch does. This
doesn't prevent us from doing ORDER BY in the future, but helps numerous
people who today have to
2) Someone rewrites how UPDATE works based on Tom's suggestion here:
http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us,
which allows us to support ORDER BY on all tables (or perhaps maybe not
FDWs, I don't know how those work). The LIMIT functionality in this
patch is unaffected.
What's not clear to me is whether it make sense to do 1) without 2) ? Is UPDATE .. LIMIT without support for an ORDER BY useful enough?
I've wanted LIMIT even without ORDER BY many times, so I'd vote yes.
And if we apply this patch now, how much of it needs to be rewritten after 2) ? If the answers are "yes" and "not much", then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should do done first.
On that I can't give any useful feedback.
Cheers,
Jeff
(2014/09/09 18:57), Heikki Linnakangas wrote: > On 09/03/2014 06:39 PM, Robert Haas wrote: >> Now some people might argue that we have that anyway, but the fact is >> that a lot of people are using inheritance today, even with all its >> flaws, and they wouldn't be if there were a long laundry list of >> limitations that didn't apply to regular tables. We should be looking >> to lift the limitations that currently exist, not add more. > I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should > work with inheritance. So the path forward is (using Marko's phrasing > upthread): > > 1) We put the LIMIT inside ModifyTable like this patch does. This > doesn't prevent us from doing ORDER BY in the future, but helps numerous > people who today have to > 2) Someone rewrites how UPDATE works based on Tom's suggestion here: > http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us, > which allows us to support ORDER BY on all tables (or perhaps maybe not > FDWs, I don't know how those work). The LIMIT functionality in this > patch is unaffected. > > What's not clear to me is whether it make sense to do 1) without 2) ? Is > UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we > apply this patch now, how much of it needs to be rewritten after 2) ? If > the answers are "yes" and "not much", then we should review this patch > now, and put 2) on the TODO list. Otherwise 2) should do done first. My answers are "yes" but "completely rewritten". So, I think we should work on 2) first ideally, but 2) seems a large project at least to me. So, I think it would be reasonable to implement UPDATE/DELETE .. LIMIT based on Rukh' patch for now and put 2) and the re-implementation of 1) after 2) on the TODO list. I don't have enough time to review it for a while, so I'd like to work on it (and postgres_fdw's "update pushdown" enhancement related to it) at the next CF (I think I can do it earlier). I must apologize to Rukh for not having enough time for the patch review. Thanks, Best regards, Etsuro Fujita
On 2014-09-10 04:25, Etsuro Fujita wrote: > (2014/09/09 18:57), Heikki Linnakangas wrote: >> What's not clear to me is whether it make sense to do 1) without 2) ? Is >> UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we >> apply this patch now, how much of it needs to be rewritten after 2) ? If >> the answers are "yes" and "not much", then we should review this patch >> now, and put 2) on the TODO list. Otherwise 2) should do done first. > > My answers are "yes" but "completely rewritten". Any particular reason for you to say that? Because an UPDATE might have a RETURNING clause, all the updated tuples have to go through the ModifyTable node one at a time. I don't see why we couldn't LIMIT there after implementing #2. .marko
(2014/09/10 16:57), Marko Tiikkaja wrote: > On 2014-09-10 04:25, Etsuro Fujita wrote: >> (2014/09/09 18:57), Heikki Linnakangas wrote: >>> What's not clear to me is whether it make sense to do 1) without 2) ? Is >>> UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we >>> apply this patch now, how much of it needs to be rewritten after 2) ? If >>> the answers are "yes" and "not much", then we should review this patch >>> now, and put 2) on the TODO list. Otherwise 2) should do done first. >> >> My answers are "yes" but "completely rewritten". > > Any particular reason for you to say that? Because an UPDATE might have > a RETURNING clause, all the updated tuples have to go through the > ModifyTable node one at a time. I don't see why we couldn't LIMIT there > after implementing #2. The reason is because I think that after implementing #2, we should re-implement this feature by extending the planner to produce a plan tree such as ModifyTable+Limit+Append, maybe with LockRows below the Limit node. Such an approach would prevent duplication of the LIMIT code in ModifyTable, making the ModifyTable code more simple, I think. Thanks, Best regards, Etsuro Fujita
On 9/10/14 11:25 AM, Etsuro Fujita wrote: > The reason is because I think that after implementing #2, we should > re-implement this feature by extending the planner to produce a plan > tree such as ModifyTable+Limit+Append, maybe with LockRows below the > Limit node. Such an approach would prevent duplication of the LIMIT > code in ModifyTable, making the ModifyTable code more simple, I think. You can already change *this patch* to do ModifyTable <- Limit <- LockRows. If we think that's what we want, we should rewrite this patch right now. This isn't a reason not to implement LIMIT without ORDER BY. Like I said upthread, I think LockRows is a bad idea, but I'll need to run some performance tests first. But whichever method we decide to implement for this patch shouldn't need to be touched when the changes to UPDATE land, so your reasoning is incorrect. .marko
(2014/09/10 18:33), Marko Tiikkaja wrote: > On 9/10/14 11:25 AM, Etsuro Fujita wrote: >> The reason is because I think that after implementing #2, we should >> re-implement this feature by extending the planner to produce a plan >> tree such as ModifyTable+Limit+Append, maybe with LockRows below the >> Limit node. Such an approach would prevent duplication of the LIMIT >> code in ModifyTable, making the ModifyTable code more simple, I think. > You can already change *this patch* to do ModifyTable <- Limit <- > LockRows. If we think that's what we want, we should rewrite this patch > right now. I think it might be relatively easy to do that for single-table cases, but for inheritance cases, inheritance_planner needs work and I'm not sure how much work it would take ... > Like I said upthread, I think LockRows is a bad idea, but I'll need to > run some performance tests first. But whichever method we decide to > implement for this patch shouldn't need to be touched when the changes > to UPDATE land, so your reasoning is incorrect. Yeah, as you say, we need the performance tests, and I think it would depend on those results whether LockRows is a bad idea or not. Thanks, Best regards, Etsuro Fujita
On 9/10/14 12:05 PM, Etsuro Fujita wrote: > (2014/09/10 18:33), Marko Tiikkaja wrote: >> You can already change *this patch* to do ModifyTable <- Limit <- >> LockRows. If we think that's what we want, we should rewrite this patch >> right now. > > I think it might be relatively easy to do that for single-table cases, > but for inheritance cases, inheritance_planner needs work and I'm not > sure how much work it would take ... Uh. Yeah, I think I'm an idiot and you're right. I'll try and get some benchmarking done and see what comes out. .marko
(2014/08/15 6:18), Rukh Meski wrote: > Based on the feedback on my previous patch, I've separated only the > LIMIT part into its own feature. This version plays nicely with > inheritance. The intended use is splitting up big UPDATEs and DELETEs > into batches more easily and efficiently. IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be better for the patch to support OFFSET at this point. No? Thanks, Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2014/08/15 6:18), Rukh Meski wrote: >> Based on the feedback on my previous patch, I've separated only the >> LIMIT part into its own feature. This version plays nicely with >> inheritance. The intended use is splitting up big UPDATEs and DELETEs >> into batches more easily and efficiently. > > IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is > that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be > allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be > better for the patch to support OFFSET at this point. No? Without ORDER BY you really would have no idea *which* rows the OFFSET would be skipping. Even more dangerously, you might *think* you do, and get a surprise when you see the results (if, for example, a seqscan starts at a point other than the start of the heap, due to a concurrent seqscan from an unrelated query). It might be better not to provide an illusion of a degree of control you don't have, especially for UPDATE and DELETE operations. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Sep 16, 2014 at 11:31 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2014/08/15 6:18), Rukh Meski wrote: >>> Based on the feedback on my previous patch, I've separated only the >>> LIMIT part into its own feature. This version plays nicely with >>> inheritance. The intended use is splitting up big UPDATEs and DELETEs >>> into batches more easily and efficiently. >> >> IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is >> that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be >> allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be >> better for the patch to support OFFSET at this point. No? > > Without ORDER BY you really would have no idea *which* rows the > OFFSET would be skipping. Even more dangerously, you might *think* > you do, and get a surprise when you see the results (if, for > example, a seqscan starts at a point other than the start of the > heap, due to a concurrent seqscan from an unrelated query). It > might be better not to provide an illusion of a degree of control > you don't have, especially for UPDATE and DELETE operations. Fair point, but I'd lean toward including it. I think we all agree the end goal is ORDER BY .. LIMIT, and there OFFSET certainly has meaning. If we don't include it now, we'll just end up adding it later. It makes for fewer patches, and fewer changes for users, if we do it all at once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
(2014/09/17 1:58), Robert Haas wrote: > On Tue, Sep 16, 2014 at 11:31 AM, Kevin Grittner <kgrittn@ymail.com> wrote: >> Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >>> (2014/08/15 6:18), Rukh Meski wrote: >>>> Based on the feedback on my previous patch, I've separated only the >>>> LIMIT part into its own feature. This version plays nicely with >>>> inheritance. The intended use is splitting up big UPDATEs and DELETEs >>>> into batches more easily and efficiently. >>> >>> IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is >>> that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be >>> allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be >>> better for the patch to support OFFSET at this point. No? >> >> Without ORDER BY you really would have no idea *which* rows the >> OFFSET would be skipping. Even more dangerously, you might *think* >> you do, and get a surprise when you see the results (if, for >> example, a seqscan starts at a point other than the start of the >> heap, due to a concurrent seqscan from an unrelated query). It >> might be better not to provide an illusion of a degree of control >> you don't have, especially for UPDATE and DELETE operations. > > Fair point, but I'd lean toward including it. I think we all agree > the end goal is ORDER BY .. LIMIT, and there OFFSET certainly has > meaning. If we don't include it now, we'll just end up adding it > later. It makes for fewer patches, and fewer changes for users, if we > do it all at once. I agree with Robert. Rukh, what do you think as an author? Thanks, Best regards, Etsuro Fujita
On 09/24/2014 05:22 AM, Etsuro Fujita wrote: > (2014/09/17 1:58), Robert Haas wrote: >> On Tue, Sep 16, 2014 at 11:31 AM, Kevin Grittner <kgrittn@ymail.com> wrote: >>> Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> (2014/08/15 6:18), Rukh Meski wrote: >>>>> Based on the feedback on my previous patch, I've separated only the >>>>> LIMIT part into its own feature. This version plays nicely with >>>>> inheritance. The intended use is splitting up big UPDATEs and DELETEs >>>>> into batches more easily and efficiently. >>>> >>>> IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is >>>> that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be >>>> allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be >>>> better for the patch to support OFFSET at this point. No? >>> >>> Without ORDER BY you really would have no idea *which* rows the >>> OFFSET would be skipping. Even more dangerously, you might *think* >>> you do, and get a surprise when you see the results (if, for >>> example, a seqscan starts at a point other than the start of the >>> heap, due to a concurrent seqscan from an unrelated query). It >>> might be better not to provide an illusion of a degree of control >>> you don't have, especially for UPDATE and DELETE operations. >> >> Fair point, but I'd lean toward including it. I think we all agree >> the end goal is ORDER BY .. LIMIT, and there OFFSET certainly has >> meaning. If we don't include it now, we'll just end up adding it >> later. It makes for fewer patches, and fewer changes for users, if we >> do it all at once. > > I agree with Robert. > > Rukh, what do you think as an author? I have marked this as "returned with feedback" in the commitfest app. What I'd like to see happen next is: Rewrite how UPDATEs work, per Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us. Then implement ORDER BY ... LIMIT on top of that. A lot of people would also be willing to settle for just implementing LIMIT without ORDER BY, as a stopgap measure. But the UPDATE rewrite is what would make everyone most happy. - Heikki