Обсуждение: Another way to fix inherited UPDATE/DELETE
While contemplating the wreckage of https://commitfest.postgresql.org/22/1778/ I had the beginnings of an idea of another way to fix that problem. The issue largely arises from the fact that for UPDATE, we expect the plan tree to emit a tuple that's ready to be stored back into the target rel ... well, almost, because it also has a CTID or some other row-identity column, so we have to do some work on it anyway. But the point is this means we potentially need a different targetlist for each child table in an inherited UPDATE. What if we dropped that idea, and instead defined the plan tree as returning only the columns that are updated by SET, plus the row identity? It would then be the ModifyTable node's job to fetch the original tuple using the row identity (which it must do anyway) and form the new tuple by combining the updated columns from the plan output with the non-updated columns from the original tuple. DELETE would be even simpler, since it only needs the row identity and nothing else. Having done that, we could toss inheritance_planner into the oblivion it so richly deserves, and just treat all types of inheritance or partitioning queries as expand-at-the-bottom, as SELECT has always done it. Arguably, this would be more efficient even for non-inheritance join situations, as less data (typically) would need to propagate through the join tree. I'm not sure exactly how it'd shake out for trivial updates; we might be paying for two tuple deconstructions not one, though perhaps there's a way to finesse that. (One easy way would be to stick to the old approach when there is no inheritance going on.) In the case of a standard inheritance or partition tree, this seems to go through really easily, since all the children could share the same returned CTID column (I guess you'd also need a TABLEOID column so you could figure out which table to direct the update back into). It gets a bit harder if the tree contains some foreign tables, because they might have different concepts of row identity, but I'd think in most cases you could still combine those into a small number of output columns. I have no idea how this might play with the pluggable-storage work. Obviously this'd be a major rewrite with no chance of making it into v12, but it doesn't sound too big to get done during v13. Thoughts? regards, tom lane
Hi, On 2019-02-19 16:48:55 -0500, Tom Lane wrote: > I have no idea how this might play with the pluggable-storage work. I don't think it'd have a meaningful impact, except for needing changes to an overlapping set of lines. But given the different timeframes, I'd not expect a problem with that. Greetings, Andres Freund
On Wed, 20 Feb 2019 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: > What if we dropped that idea, and instead defined the plan tree as > returning only the columns that are updated by SET, plus the row > identity? It would then be the ModifyTable node's job to fetch the > original tuple using the row identity (which it must do anyway) and > form the new tuple by combining the updated columns from the plan > output with the non-updated columns from the original tuple. > > DELETE would be even simpler, since it only needs the row identity > and nothing else. While I didn't look at the patch in great detail, I think this is how Pavan must have made MERGE work for partitioned targets. I recall seeing the tableoid being added to the target list and a lookup of the ResultRelInfo by tableoid. Maybe Pavan can provide more useful details than I can. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2019/02/20 6:48, Tom Lane wrote: > While contemplating the wreckage of > https://commitfest.postgresql.org/22/1778/ > I had the beginnings of an idea of another way to fix that problem. > > The issue largely arises from the fact that for UPDATE, we expect > the plan tree to emit a tuple that's ready to be stored back into > the target rel ... well, almost, because it also has a CTID or some > other row-identity column, so we have to do some work on it anyway. > But the point is this means we potentially need a different > targetlist for each child table in an inherited UPDATE. > > What if we dropped that idea, and instead defined the plan tree as > returning only the columns that are updated by SET, plus the row > identity? It would then be the ModifyTable node's job to fetch the > original tuple using the row identity (which it must do anyway) and > form the new tuple by combining the updated columns from the plan > output with the non-updated columns from the original tuple. > > DELETE would be even simpler, since it only needs the row identity > and nothing else. I had bookmarked link to an archived email of yours from about 5 years ago, in which you described a similar attack plan for UPDATE planning: https://www.postgresql.org/message-id/1598.1399826841%40sss.pgh.pa.us It's been kind of in the back of my mind for a while, even considered implementing it based on your sketch back then, but didn't have solutions for some issues surrounding optimization of updates of foreign partitions (see below). Maybe I should've mentioned that on this thread at some point. > Having done that, we could toss inheritance_planner into the oblivion > it so richly deserves, and just treat all types of inheritance or > partitioning queries as expand-at-the-bottom, as SELECT has always > done it. > > Arguably, this would be more efficient even for non-inheritance join > situations, as less data (typically) would need to propagate through the > join tree. I'm not sure exactly how it'd shake out for trivial updates; > we might be paying for two tuple deconstructions not one, though perhaps > there's a way to finesse that. (One easy way would be to stick to the > old approach when there is no inheritance going on.) > > In the case of a standard inheritance or partition tree, this seems to > go through really easily, since all the children could share the same > returned CTID column (I guess you'd also need a TABLEOID column so you > could figure out which table to direct the update back into). It gets > a bit harder if the tree contains some foreign tables, because they might > have different concepts of row identity, but I'd think in most cases you > could still combine those into a small number of output columns. Regarding child target relations that are foreign tables, the expand-target-inheritance-at-the-bottom approach perhaps leaves no way to allow pushing the update (possibly with joins) to remote side? -- no inheritance explain (costs off, verbose) update ffoo f set a = f.a + 1 from fbar b where f.a = b.a; QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────── Update on public.ffoo f -> Foreign Update Remote SQL: UPDATE public.foo r1 SET a = (r1.a + 1) FROM public.bar r2 WHERE ((r1.a = r2.a)) (3 rows) -- inheritance explain (costs off, verbose) update p set aa = aa + 1 from ffoo f where p.aa = f.a; QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────────────────────── Update on public.p Update on public.p1 Update on public.p2 Foreign Update on public.p3 -> Nested Loop Output: (p1.aa + 1), p1.ctid, f.* -> Seq Scan on public.p1 Output: p1.aa, p1.ctid -> Foreign Scan on public.ffoo f Output: f.*, f.a Remote SQL: SELECT a FROM public.foo WHERE (($1::integer = a)) -> Nested Loop Output: (p2.aa + 1), p2.ctid, f.* -> Seq Scan on public.p2 Output: p2.aa, p2.ctid -> Foreign Scan on public.ffoo f Output: f.*, f.a Remote SQL: SELECT a FROM public.foo WHERE (($1::integer = a)) -> Foreign Update Remote SQL: UPDATE public.base3 r5 SET aa = (r5.aa + 1) FROM public.foo r2 WHERE ((r5.aa = r2.a)) (20 rows) Does that seem salvageable? Thanks, Amit
On 2019/02/20 10:55, Amit Langote wrote: > Maybe I should've mentioned that on this thread at some point. I meant the other thread where we're discussing my patches. Thanks, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2019/02/20 6:48, Tom Lane wrote: >> What if we dropped that idea, and instead defined the plan tree as >> returning only the columns that are updated by SET, plus the row >> identity? It would then be the ModifyTable node's job to fetch the >> original tuple using the row identity (which it must do anyway) and >> form the new tuple by combining the updated columns from the plan >> output with the non-updated columns from the original tuple. > Regarding child target relations that are foreign tables, the > expand-target-inheritance-at-the-bottom approach perhaps leaves no way to > allow pushing the update (possibly with joins) to remote side? That's something we'd need to think about. Obviously, anything along this line breaks the existing FDW update APIs, but let's assume that's acceptable. Is it impossible, or even hard, for an FDW to support this definition of UPDATE rather than the existing one? I don't think so --- it seems like it's just different --- but I might well be missing something. regards, tom lane
On Wed, Feb 20, 2019 at 4:23 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Wed, 20 Feb 2019 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What if we dropped that idea, and instead defined the plan tree as
> returning only the columns that are updated by SET, plus the row
> identity? It would then be the ModifyTable node's job to fetch the
> original tuple using the row identity (which it must do anyway) and
> form the new tuple by combining the updated columns from the plan
> output with the non-updated columns from the original tuple.
>
> DELETE would be even simpler, since it only needs the row identity
> and nothing else.
While I didn't look at the patch in great detail, I think this is how
Pavan must have made MERGE work for partitioned targets. I recall
seeing the tableoid being added to the target list and a lookup of the
ResultRelInfo by tableoid.
Maybe Pavan can provide more useful details than I can.
Yes, that's the approach I took in MERGE, primarily because of the hurdles I faced in handling partitioned tables, which take entirely different route for UPDATE/DELETE vs INSERT and in MERGE we had to do all three together. But the approach also showed significant performance improvements. UPDATE/DELETE via MERGE is far quicker as compared to regular UPDATE/DELETE when there are non-trivial number of partitions. That's also a reason why I recommended doing the same for regular UPDATE/DELETE, but that got lost in the MERGE discussions. So +1 for the approach.
We will need to consider how this affects EvalPlanQual which currently doesn't have to do anything special for partitioned tables. I solved that via tracking the expanded-at-the-bottom child in a separate mergeTargetRelation, but that approach has been criticised. May be Tom's idea doesn't have the same problem or most likely he will have a far better approach to address that.
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
PostgreSQL Development, 24x7 Support, Training & Services
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > We will need to consider how this affects EvalPlanQual which currently > doesn't have to do anything special for partitioned tables. I solved that > via tracking the expanded-at-the-bottom child in a separate > mergeTargetRelation, but that approach has been criticised. May be Tom's > idea doesn't have the same problem or most likely he will have a far better > approach to address that. I did spend a few seconds thinking about that, and my gut says that this wouldn't change anything interesting for EPQ. But the devil is in the details as always, so maybe working out the patch would find problems ... regards, tom lane
On 2019/02/20 13:54, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2019/02/20 6:48, Tom Lane wrote: >>> What if we dropped that idea, and instead defined the plan tree as >>> returning only the columns that are updated by SET, plus the row >>> identity? It would then be the ModifyTable node's job to fetch the >>> original tuple using the row identity (which it must do anyway) and >>> form the new tuple by combining the updated columns from the plan >>> output with the non-updated columns from the original tuple. > >> Regarding child target relations that are foreign tables, the >> expand-target-inheritance-at-the-bottom approach perhaps leaves no way to >> allow pushing the update (possibly with joins) to remote side? > > That's something we'd need to think about. Obviously, anything > along this line breaks the existing FDW update APIs, but let's assume > that's acceptable. Is it impossible, or even hard, for an FDW to > support this definition of UPDATE rather than the existing one? > I don't think so --- it seems like it's just different --- but > I might well be missing something. IIUC, in the new approach, only the root of the inheritance tree (target table specified in the query) will appear in the query's join tree, not the child target tables, so pushing updates with joins to the remote side seems a bit hard, because we're not going to consider child joins. Maybe I'm missing something though. Thanks, Amit
(2019/02/20 6:48), Tom Lane wrote: > In the case of a standard inheritance or partition tree, this seems to > go through really easily, since all the children could share the same > returned CTID column (I guess you'd also need a TABLEOID column so you > could figure out which table to direct the update back into). It gets > a bit harder if the tree contains some foreign tables, because they might > have different concepts of row identity, but I'd think in most cases you > could still combine those into a small number of output columns. If this is the direction we go in, we should work on the row ID issue [1] before this? Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/1590.1542393315%40sss.pgh.pa.us
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2019/02/20 13:54, Tom Lane wrote: >> That's something we'd need to think about. Obviously, anything >> along this line breaks the existing FDW update APIs, but let's assume >> that's acceptable. Is it impossible, or even hard, for an FDW to >> support this definition of UPDATE rather than the existing one? >> I don't think so --- it seems like it's just different --- but >> I might well be missing something. > IIUC, in the new approach, only the root of the inheritance tree (target > table specified in the query) will appear in the query's join tree, not > the child target tables, so pushing updates with joins to the remote side > seems a bit hard, because we're not going to consider child joins. Maybe > I'm missing something though. Hm. Even if that's true (I'm not convinced), I don't think it's such a significant use-case as to be considered a blocker. regards, tom lane
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > (2019/02/20 6:48), Tom Lane wrote: >> In the case of a standard inheritance or partition tree, this seems to >> go through really easily, since all the children could share the same >> returned CTID column (I guess you'd also need a TABLEOID column so you >> could figure out which table to direct the update back into). It gets >> a bit harder if the tree contains some foreign tables, because they might >> have different concepts of row identity, but I'd think in most cases you >> could still combine those into a small number of output columns. > If this is the direction we go in, we should work on the row ID issue > [1] before this? Certainly, the more foreign tables can use a standardized concept of row identity, the better this would work. What I'm loosely envisioning is that we have one junk row-identity column for each distinct row-identity datatype needed --- so, for instance, all ordinary tables could share a TID column. Different FDWs might need different things, though one would hope for no more than one datatype per FDW-type involved in the inheritance tree. Where things could break down is if we have a lot of tables that need a whole-row-variable for row identity, and they're all different rowtypes --- eventually we'd run out of available columns. So we'd definitely wish to encourage FDWs to have some more efficient row-identity scheme than that one. I don't see that as being something that constrains those two projects to be done in a particular order; it'd just be a nice-to-have improvement. regards, tom lane
(2019/02/21 0:14), Tom Lane wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> writes: >> (2019/02/20 6:48), Tom Lane wrote: >>> In the case of a standard inheritance or partition tree, this seems to >>> go through really easily, since all the children could share the same >>> returned CTID column (I guess you'd also need a TABLEOID column so you >>> could figure out which table to direct the update back into). It gets >>> a bit harder if the tree contains some foreign tables, because they might >>> have different concepts of row identity, but I'd think in most cases you >>> could still combine those into a small number of output columns. > >> If this is the direction we go in, we should work on the row ID issue >> [1] before this? > > Certainly, the more foreign tables can use a standardized concept of row > identity, the better this would work. What I'm loosely envisioning is > that we have one junk row-identity column for each distinct row-identity > datatype needed --- so, for instance, all ordinary tables could share > a TID column. Different FDWs might need different things, though one > would hope for no more than one datatype per FDW-type involved in the > inheritance tree. Where things could break down is if we have a lot > of tables that need a whole-row-variable for row identity, and they're > all different rowtypes --- eventually we'd run out of available columns. > So we'd definitely wish to encourage FDWs to have some more efficient > row-identity scheme than that one. Seems reasonable. I guess that that can address not only the issue [1] but our restriction on FDW row locking that APIs for that currently only allow TID for row identity, in a uniform way. > I don't see that as being something that constrains those two projects > to be done in a particular order; it'd just be a nice-to-have improvement. OK, thanks for the explanation! Best regards, Etsuro Fujita
Hi Tom, On Wed, Feb 20, 2019 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Obviously this'd be a major rewrite with no chance of making it into v12, > but it doesn't sound too big to get done during v13. Are you planning to work on this? Thanks, Amit
Amit Langote <amitlangote09@gmail.com> writes: > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Obviously this'd be a major rewrite with no chance of making it into v12, >> but it doesn't sound too big to get done during v13. > Are you planning to work on this? It's on my list, but so are a lot of other things. If you'd like to work on it, feel free. regards, tom lane
On Wed, Jul 3, 2019 at 10:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Langote <amitlangote09@gmail.com> writes: > > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Obviously this'd be a major rewrite with no chance of making it into v12, > >> but it doesn't sound too big to get done during v13. > > > Are you planning to work on this? > > It's on my list, but so are a lot of other things. If you'd like to > work on it, feel free. Thanks for the reply. Let me see if I can get something done for the September CF. Regards, Amit