Обсуждение: MERGE ... WHEN NOT MATCHED BY SOURCE

Поиск
Список
Период
Сортировка

MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
This allows MERGE to UPDATE or DELETE target rows where there is no
matching source row. In addition, it allows the existing "WHEN NOT
MATCHED" syntax to include an optional "BY TARGET" to make its meaning
more explicit. E.g.,

MERGE INTO tgt USING src ON ...
  WHEN NOT MATCHED BY SOURCE THEN UPDATE/DELETE ...
  WHEN NOT MATCHED BY TARGET THEN INSERT ...

AFAIK, this is not part of the standard (though I only have a very old
draft copy). It is supported by at least 2 other major DB vendors
though, and I think it usefully rounds off the set of possible MERGE
actions.

Attached is a WIP patch. I haven't updated the docs yet, and there are
probably a few other things to tidy up and test, but the basic
functionality is there.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Fri, 30 Dec 2022 at 16:56, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Attached is a WIP patch.
>

Updated patch attached, now with updated docs and some other minor tidying up.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Alvaro Herrera
Дата:
I haven't read this patch other than superficially; I suppose the
feature it's introducing is an OK one to have as an extension to the
standard.  (I hope the community members that are committee members 
will propose this extension to become part of the standard.)

On 2023-Jan-02, Dean Rasheed wrote:

> --- a/src/backend/optimizer/prep/preptlist.c
> +++ b/src/backend/optimizer/prep/preptlist.c
> @@ -157,15 +157,14 @@ preprocess_targetlist(PlannerInfo *root)
>              /*
>               * Add resjunk entries for any Vars used in each action's
>               * targetlist and WHEN condition that belong to relations other
> -             * than target.  Note that aggregates, window functions and
> -             * placeholder vars are not possible anywhere in MERGE's WHEN
> -             * clauses.  (PHVs may be added later, but they don't concern us
> -             * here.)
> +             * than target.  Note that aggregates and window functions are not
> +             * possible anywhere in MERGE's WHEN clauses, but PlaceHolderVars
> +             * may have been added by subquery pullup.
>               */
>              vars = pull_var_clause((Node *)
>                                     list_concat_copy((List *) action->qual,
>                                                      action->targetList),
> -                                   0);
> +                                   PVC_INCLUDE_PLACEHOLDERS);

Hmm, is this new because of NOT MATCHED BY SOURCE, or is it something
that can already be hit by existing features of MERGE?  In other words
-- is this a bug fix that should be backpatched ahead of introducing NOT
MATCHED BY SOURCE?

> @@ -127,10 +143,12 @@ transformMergeStmt(ParseState *pstate, M
>       */
>      is_terminal[0] = false;
>      is_terminal[1] = false;
> +    is_terminal[2] = false;

I think these 0/1/2 should be replaced by the values of MergeMatchKind.

> +    /* Join type required */
> +    if (left_join && right_join)
> +        qry->mergeJoinType = JOIN_FULL;
> +    else if (left_join)
> +        qry->mergeJoinType = JOIN_LEFT;
> +    else if (right_join)
> +        qry->mergeJoinType = JOIN_RIGHT;
> +    else
> +        qry->mergeJoinType = JOIN_INNER;

One of the review comments that MERGE got initially was that parse
analysis was not a place to "do query optimization", in the sense that
the original code was making a decision whether to make an outer or
inner join based on the set of WHEN clauses that appear in the command.
That's how we ended up with transform_MERGE_to_join and
mergeUseOuterJoin instead.  This new code is certainly not the same, but
it makes me a bit unconfortable.  Maybe it's OK, though.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Thu, 5 Jan 2023 at 11:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I haven't read this patch other than superficially; I suppose the
> feature it's introducing is an OK one to have as an extension to the
> standard.  (I hope the community members that are committee members
> will propose this extension to become part of the standard.)
>

Thanks for looking!

> > --- a/src/backend/optimizer/prep/preptlist.c
> > +++ b/src/backend/optimizer/prep/preptlist.c
> > @@ -157,15 +157,14 @@ preprocess_targetlist(PlannerInfo *root)
> >                       /*
> >                        * Add resjunk entries for any Vars used in each action's
> >                        * targetlist and WHEN condition that belong to relations other
> > -                      * than target.  Note that aggregates, window functions and
> > -                      * placeholder vars are not possible anywhere in MERGE's WHEN
> > -                      * clauses.  (PHVs may be added later, but they don't concern us
> > -                      * here.)
> > +                      * than target.  Note that aggregates and window functions are not
> > +                      * possible anywhere in MERGE's WHEN clauses, but PlaceHolderVars
> > +                      * may have been added by subquery pullup.
> >                        */
> >                       vars = pull_var_clause((Node *)
> >                                                                  list_concat_copy((List *) action->qual,
> >
action->targetList),
> > -                                                                0);
> > +                                                                PVC_INCLUDE_PLACEHOLDERS);
>
> Hmm, is this new because of NOT MATCHED BY SOURCE, or is it something
> that can already be hit by existing features of MERGE?  In other words
> -- is this a bug fix that should be backpatched ahead of introducing NOT
> MATCHED BY SOURCE?
>

It's new because of NOT MATCHED BY SOURCE, and I also found that I had
to make the same change in the MERGE INTO view patch, in the case
where the target view is simple enough to allow subquery pullup, but
also had INSTEAD OF triggers causing the pullup to happen in the
planner rather than the rewriter.

I couldn't think of a way that it could happen with the existing MERGE
code though, so I don't think it's a bug that needs fixing and
back-patching.

> > @@ -127,10 +143,12 @@ transformMergeStmt(ParseState *pstate, M
> >        */
> >       is_terminal[0] = false;
> >       is_terminal[1] = false;
> > +     is_terminal[2] = false;
>
> I think these 0/1/2 should be replaced by the values of MergeMatchKind.
>

Agreed.

> > +     /* Join type required */
> > +     if (left_join && right_join)
> > +             qry->mergeJoinType = JOIN_FULL;
> > +     else if (left_join)
> > +             qry->mergeJoinType = JOIN_LEFT;
> > +     else if (right_join)
> > +             qry->mergeJoinType = JOIN_RIGHT;
> > +     else
> > +             qry->mergeJoinType = JOIN_INNER;
>
> One of the review comments that MERGE got initially was that parse
> analysis was not a place to "do query optimization", in the sense that
> the original code was making a decision whether to make an outer or
> inner join based on the set of WHEN clauses that appear in the command.
> That's how we ended up with transform_MERGE_to_join and
> mergeUseOuterJoin instead.  This new code is certainly not the same, but
> it makes me a bit unconfortable.  Maybe it's OK, though.
>

Yeah I agree, it's a bit ugly. Perhaps a better solution would be to
do away with that field entirely and just make the decision in
transform_MERGE_to_join() by examining the action list again. That
would require making MergeAction's "matched" field a MergeMatchKind
rather than a bool, but maybe that's not so bad, since retaining that
information might prove useful one day.

Regards,
Dean



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Thu, 5 Jan 2023 at 13:21, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 5 Jan 2023 at 11:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > +     /* Join type required */
> > > +     if (left_join && right_join)
> > > +             qry->mergeJoinType = JOIN_FULL;
> > > +     else if (left_join)
> > > +             qry->mergeJoinType = JOIN_LEFT;
> > > +     else if (right_join)
> > > +             qry->mergeJoinType = JOIN_RIGHT;
> > > +     else
> > > +             qry->mergeJoinType = JOIN_INNER;
> >
> > One of the review comments that MERGE got initially was that parse
> > analysis was not a place to "do query optimization", in the sense that
> > the original code was making a decision whether to make an outer or
> > inner join based on the set of WHEN clauses that appear in the command.
> > That's how we ended up with transform_MERGE_to_join and
> > mergeUseOuterJoin instead.  This new code is certainly not the same, but
> > it makes me a bit unconfortable.  Maybe it's OK, though.
> >
>
> Yeah I agree, it's a bit ugly. Perhaps a better solution would be to
> do away with that field entirely and just make the decision in
> transform_MERGE_to_join() by examining the action list again.
>

Attached is an updated patch taking that approach, allowing
mergeUseOuterJoin to be removed from the Query node, which I think is
probably a good thing.

Aside from that, it includes a few additional comment updates in the
executor that I'd missed, and psql tab completion support.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
Rebased version attached.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Tue, 10 Jan 2023 at 14:43, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached.
>

Rebased version, following 8eba3e3f02 and 5d29d525ff.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Ted Yu
Дата:


On Sat, Jan 21, 2023 at 3:05 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Tue, 10 Jan 2023 at 14:43, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached.
>

Rebased version, following 8eba3e3f02 and 5d29d525ff.

Regards,
Dean
Hi,
In transform_MERGE_to_join :

+                       if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_SOURCE)
+                               tgt_only_tuples = true;
+                       if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_TARGET)

There should be an `else` in front of the second `if`.
When tgt_only_tuples and src_only_tuples are both true, we can come out of the loop.

Cheers

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Sat, 21 Jan 2023 at 14:18, Ted Yu <yuzhihong@gmail.com> wrote:
>
> On Sat, Jan 21, 2023 at 3:05 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>
>> Rebased version, following 8eba3e3f02 and 5d29d525ff.
>>

Another rebased version attached.

> In transform_MERGE_to_join :
>
> +                       if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_SOURCE)
> +                               tgt_only_tuples = true;
> +                       if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_TARGET)
>
> There should be an `else` in front of the second `if`.
> When tgt_only_tuples and src_only_tuples are both true, we can come out of the loop.
>

I decided not to do that. Adding an "else" doesn't change the code
that the compiler generates, and IMO it's slightly more readable
without it, since it keeps the line length shorter, and the test
conditions aligned, but that's a matter of opinion / personal
preference.

I think adding extra logic to exit the loop early if both
tgt_only_tuples and src_only_tuples are true would be a premature
optimisation, increasing the code size for no real benefit. In
practice, there are unlikely to be more than a few merge actions in
the list.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Vik Fearing
Дата:
On 1/4/23 12:57, Alvaro Herrera wrote:
> I haven't read this patch other than superficially; I suppose the
> feature it's introducing is an OK one to have as an extension to the
> standard.  (I hope the community members that are committee members
> will propose this extension to become part of the standard.)


I have been doing some research on this, reading the original papers 
that introduced the feature and its improvements.

I don't see anything that ever considered what this patch proposes, even 
though SQL Server has it. (The initial MERGE didn't even have DELETE!)

SOURCE and TARGET are not currently keywords, but the only things that 
can come after MATCHED are THEN and AND, so I don't foresee any issues 
with us implementing this before the committee accepts such a change 
proposal.  I also don't see how the committee could possibly change the 
semantics of this, and two implementations having it is a good argument 
for getting it in.

We should be cautious in doing something differently from SQL Server 
here, and I would appreciate any differences being brought to my 
attention so I can incorporate them into a specification, even if that 
means resorting to the hated "implementation-defined".
-- 
Vik Fearing




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
I see the PlaceHolderVar issue turned out to be a pre-existing bug after all.
Rebased version attached.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Alvaro Herrera
Дата:
On 2023-Mar-19, Dean Rasheed wrote:

> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> new file mode 100644
> index efe88cc..e1ebc8d
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y

> +merge_when_tgt_matched:
> +            WHEN MATCHED                    { $$ = MERGE_WHEN_MATCHED; }
> +            | WHEN NOT MATCHED BY SOURCE    { $$ = MERGE_WHEN_NOT_MATCHED_BY_SOURCE; }
> +        ;

I think a one-line comment on why this "matched" production matches "NOT
MATCHED BY" would be useful.  I think you have a big one in
transformMergeStmt already.


> +            /* Combine it with the action's WHEN condition */
> +            if (action->qual == NULL)
> +                action->qual = (Node *) ntest;
> +            else
> +                action->qual =
> +                    (Node *) makeBoolExpr(AND_EXPR,
> +                                          list_make2(ntest, action->qual),
> +                                          -1);

Hmm, I think ->qual is already in implicit-and form, so do you really
need to makeBoolExpr, or would it be sufficient to append this new
condition to the list?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Tue, 21 Mar 2023 at 10:28, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > +                     /* Combine it with the action's WHEN condition */
> > +                     if (action->qual == NULL)
> > +                             action->qual = (Node *) ntest;
> > +                     else
> > +                             action->qual =
> > +                                     (Node *) makeBoolExpr(AND_EXPR,
> > +                                                                               list_make2(ntest, action->qual),
> > +                                                                               -1);
>
> Hmm, I think ->qual is already in implicit-and form, so do you really
> need to makeBoolExpr, or would it be sufficient to append this new
> condition to the list?
>

No, this has come directly from transformWhereClause() in the parser,
so it's an expression tree, not a list. Transforming to implicit-and
form doesn't happen until later.

Looking at it with fresh eyes though, I realise that I could have just written

    action->qual = make_and_qual((Node *) ntest, action->qual);

which is equivalent, but more concise.

Regards,
Dean



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Alvaro Herrera
Дата:
On 2023-Mar-21, Dean Rasheed wrote:

> Looking at it with fresh eyes though, I realise that I could have just written
> 
>     action->qual = make_and_qual((Node *) ntest, action->qual);
> 
> which is equivalent, but more concise.

Nice.

I have no further observations about this patch.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Tue, 21 Mar 2023 at 12:26, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Mar-21, Dean Rasheed wrote:
>
> > Looking at it with fresh eyes though, I realise that I could have just written
> >
> >     action->qual = make_and_qual((Node *) ntest, action->qual);
> >
> > which is equivalent, but more concise.
>
> Nice.
>
> I have no further observations about this patch.
>

Looking at this one afresh, it seems that the change to make Vars
outer-join aware broke it -- the Var in the qual to test whether the
source row is null needs to be marked as nullable by the join added by
transform_MERGE_to_join(). That's something that needs to be done in
transform_MERGE_to_join(), so it makes more sense to add the new qual
there rather than in transformMergeStmt().

Also, now that MERGE has ruleutils support, it's clear that adding the
qual in transformMergeStmt() isn't right anyway, since it would then
appear in the deparsed output.

So attached is an updated patch doing that, which seems neater all
round, since adding the qual is closely related to the join-type
choice, which is now a decision taken entirely in
transform_MERGE_to_join(). This requires a new "mergeSourceRelation"
field on the Query structure, but as before, it does away with the
"mergeUseOuterJoin" field.

I've also updated the ruleutils support. In the absence of any WHEN
NOT MATCHED BY SOURCE actions, this will output not-matched actions
simply as "WHEN NOT MATCHED" for backwards compatibility, and to be
SQL-standard-compliant. If there are any WHEN NOT MATCHED BY SOURCE
actions though, I think it's preferable to output explicit "BY SOURCE"
and "BY TARGET" qualifiers for all not-matched actions, to make the
meaning clearer.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Peter Smith
Дата:
Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 6+ months.

Is anything else planned? Can you post something to elicit more
interest in the latest patch? Otherwise, if nothing happens then the
CF entry will be closed ("Returned with feedback") at the end of this
CF.

======
[1]  https://commitfest.postgresql.org/46/4092/

Kind Regards,
Peter Smith.



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
vignesh C
Дата:
On Sat, 1 Jul 2023 at 18:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Tue, 21 Mar 2023 at 12:26, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2023-Mar-21, Dean Rasheed wrote:
> >
> > > Looking at it with fresh eyes though, I realise that I could have just written
> > >
> > >     action->qual = make_and_qual((Node *) ntest, action->qual);
> > >
> > > which is equivalent, but more concise.
> >
> > Nice.
> >
> > I have no further observations about this patch.
> >
>
> Looking at this one afresh, it seems that the change to make Vars
> outer-join aware broke it -- the Var in the qual to test whether the
> source row is null needs to be marked as nullable by the join added by
> transform_MERGE_to_join(). That's something that needs to be done in
> transform_MERGE_to_join(), so it makes more sense to add the new qual
> there rather than in transformMergeStmt().
>
> Also, now that MERGE has ruleutils support, it's clear that adding the
> qual in transformMergeStmt() isn't right anyway, since it would then
> appear in the deparsed output.
>
> So attached is an updated patch doing that, which seems neater all
> round, since adding the qual is closely related to the join-type
> choice, which is now a decision taken entirely in
> transform_MERGE_to_join(). This requires a new "mergeSourceRelation"
> field on the Query structure, but as before, it does away with the
> "mergeUseOuterJoin" field.
>
> I've also updated the ruleutils support. In the absence of any WHEN
> NOT MATCHED BY SOURCE actions, this will output not-matched actions
> simply as "WHEN NOT MATCHED" for backwards compatibility, and to be
> SQL-standard-compliant. If there are any WHEN NOT MATCHED BY SOURCE
> actions though, I think it's preferable to output explicit "BY SOURCE"
> and "BY TARGET" qualifiers for all not-matched actions, to make the
> meaning clearer.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
f2bf8fb04886e3ea82e7f7f86696ac78e06b7e60 ===
=== applying patch ./support-merge-when-not-matched-by-source-v8.patch
...
patching file doc/src/sgml/ref/merge.sgml
Hunk #5 FAILED at 409.
Hunk #9 FAILED at 673.
2 out of 9 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/merge.sgml.rej
..
patching file src/include/nodes/parsenodes.h
Hunk #1 succeeded at 175 (offset -8 lines).
Hunk #2 succeeded at 1657 (offset -6 lines).
Hunk #3 succeeded at 1674 (offset -6 lines).
Hunk #4 FAILED at 1696.
1 out of 4 hunks FAILED -- saving rejects to file
src/include/nodes/parsenodes.h.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4092.log

Regards,
Vignesh



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Mon, 22 Jan 2024 at 02:10, Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, this patch was marked in CF as "Needs Review" [1], but there has
> been no activity on this thread for 6+ months.
>
> Is anything else planned? Can you post something to elicit more
> interest in the latest patch? Otherwise, if nothing happens then the
> CF entry will be closed ("Returned with feedback") at the end of this
> CF.
>

I think it has had a decent amount of review and all the review
comments have been addressed. I'm not quite sure from Alvaro's last
comment whether he was implying that he thought it was ready for
commit.

Looking back through the thread, the general sentiment seems to be in
favour of adding this feature, and I still think it's worth doing, but
I haven't managed to find much time to progress it recently.

Regards,
Dean



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
n Fri, 26 Jan 2024 at 14:59, vignesh C <vignesh21@gmail.com> wrote:
>
> CFBot shows that the patch does not apply anymore as in [1]:
>

Rebased version attached.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Alvaro Herrera
Дата:
On 2024-Jan-26, Dean Rasheed wrote:

> I think it has had a decent amount of review and all the review
> comments have been addressed. I'm not quite sure from Alvaro's last
> comment whether he was implying that he thought it was ready for
> commit.

Well, firstly this is clearly a feature we want to have, even though
it's non-standard, because people use it and other implementations have
it.  (Eh, so maybe somebody should be talking to the SQL standard
committee about it).  As for code quality, I didn't do a comprehensive
review, but I think it is quite reasonable.  Therefore, my inclination
would be to get it committed soonish, and celebrate it widely so that
people can test it soon and complain if they see something they don't
like.

I have to say that I find the idea of booting patches as Returned with
Feedback just because of inactivity (as opposed to unresponsive authors)
rather wrong-headed, and I wish we didn't do it.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Fri, 26 Jan 2024 at 15:57, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Well, firstly this is clearly a feature we want to have, even though
> it's non-standard, because people use it and other implementations have
> it.  (Eh, so maybe somebody should be talking to the SQL standard
> committee about it).  As for code quality, I didn't do a comprehensive
> review, but I think it is quite reasonable.  Therefore, my inclination
> would be to get it committed soonish, and celebrate it widely so that
> people can test it soon and complain if they see something they don't
> like.
>

Thanks. I have been going over this patch again, and for the most
part, I'm pretty happy with it.

One thing that's bothering me though is what happens if a row being
merged is concurrently updated. Specifically, if a concurrent update
causes a formerly matching row to no longer match the join condition,
and there are both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET
actions, so that it's doing in full join between the source and target
relations. In this case, when the EPQ mechanism rescans the subplan
node, there will be 2 possible output tuples (one with source null,
and one with target null), and EvalPlanQual() will just return the
first one, which is a more-or-less arbitrary choice, depending on the
type of join (hash/merge), and (for a mergejoin) the values of the
inner and outer join keys. Thus, it may execute a NOT MATCHED BY
SOURCE action, or a NOT MATCHED BY TARGET action, and it's difficult
to predict which.

Arguably it's not worth worrying too much about what happens in a
corner-case concurrent update like this, when MERGE is already
inconsistent under other concurrent update scenarios, but I don't like
having unpredictable results like this, which can depend on the plan
chosen.

I think the best (and probably simplest) solution is to always opt for
a NOT MATCHED BY TARGET action in this case, so then the result is
predictable, and we can document what is expected to happen.

Regards,
Dean



Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Mon, 29 Jan 2024 at 10:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> One thing that's bothering me though is what happens if a row being
> merged is concurrently updated. Specifically, if a concurrent update
> causes a formerly matching row to no longer match the join condition,
> and there are both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET
> actions, so that it's doing in full join between the source and target
> relations. In this case, when the EPQ mechanism rescans the subplan
> node, there will be 2 possible output tuples (one with source null,
> and one with target null), and EvalPlanQual() will just return the
> first one, which is a more-or-less arbitrary choice, depending on the
> type of join (hash/merge), and (for a mergejoin) the values of the
> inner and outer join keys. Thus, it may execute a NOT MATCHED BY
> SOURCE action, or a NOT MATCHED BY TARGET action, and it's difficult
> to predict which.
>

I set out to rebase this on top of 5f2e179bd3 (support for MERGE into
views), and ended up hacking on it quite a bit. Aside from some
cosmetic stuff, I made 3 bigger changes:

1). It turned out that simply rebasing this didn't work for NOT
MATCHED BY SOURCE actions on an auto-updatable view. This was due to
the fact that transformMergeStmt() puts the quals from a MERGE's join
condition temporarily into query->jointree->quals, as if they were
normal WHERE quals. That's a problem, because when the rewriter
expands a target auto-updatable view with its own WHERE quals, they
end up getting added to the same overall set of WHERE quals, which
transform_MERGE_to_join() then attaches to the JoinExpr that it
constructs. That's not a problem for the INNER/RIGHT joins used
without this patch, but for the LEFT/FULL joins produced when there
are NOT MATCHED BY SOURCE actions, it produces incorrect results,
because the view's WHERE quals on the target relation need to be
underneath the JoinExpr, not on it, to work correctly when the source
row is null.

To fix that, I added a new Query->mergeJoinCondition field to keep the
MERGE join quals separate from the query's WHERE quals during query
rewriting. That seems like a good separation to have on general
grounds anyway, but it's crucial to make this patch work properly. I
added a few more tests and this now seems to work well.


2). Having added Query->mergeJoinCondition, it then made more sense to
use that in the executor to distinguish MATCHED candidate rows from
NOT MATCHED BY SOURCE ones, rather than hacking each individual
action's quals. This avoids an additional qual check for every action.
The executor now builds 3 lists of actions (one per match kind), and
ExecMergeMatched() decides at the start which list it needs to scan,
depending on whether or not the candidate row matches the join quals.
That seems somewhat neater, and helped with the next point.

I'm not entirely happy with this though, since it means that the join
quals get checked a second time when there are NOT MATCHED BY SOURCE
actions. It would be better if it could somehow get that information
out of the underlying join node, but I'm not sure how to do that.


3). Thinking more about what to do if a concurrent update turns a
matched candidate row into a not matched one, and there are both NOT
MATCHED BY SOURCE and NOT MATCHED BY TARGET actions, I think the right
thing to do is to execute one action of each kind, as would happen if
the source and target rows had started out not matching. That's much
better than arbitrarily preferring one kind of NOT MATCHED action over
the other.

That turned out to be relatively easy to achieve -- if
ExecMergeMatched() detects a concurrent update that causes the join
quals to no longer pass when they used to, it switches from the
MATCHED list of actions to the NOT MATCHED BY SOURCE list, before
rescanning and executing the first qualifying action. Then it returns
false instead of true, to cause ExecMerge() to call
ExecMergeNotMatched(), so that it also executes a NOT MATCHED BY
TARGET action. I extended the isolation tests to test that, and the
results look quite good.

That'll need a little tweaking if MERGE gets RETURNING support, since
it won't then be able to execute two actions in a single call to the
ModifyTable node. I think that should be fairly easy to deal with
though, just by setting a flag on the node to indicate that there is a
pending NOT MATCHED BY TARGET action to execute the next time it gets
called.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
Rebased version attached.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Wed, 13 Mar 2024 at 14:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached.
>

Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).

Aside from some cosmetic stuff, I've updated several tests to test
this together with RETURNING.

The updated isolation test tests the new interesting case where a
concurrent update causes a matched case to become not matched, and
there are both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET actions
to execute, and RETURNING is specified so that it is forced to defer
the NOT MATCHED BY TARGET action until the next invocation of
ExecModifyTable(), in order to return the rows from both not matched
actions.

I also tried to tidy up ExecMergeMatched() a little --- since we know
that it's only ever called with matched = true, it's simpler to just
Assert that at the top, and then only touch it in the few cases where
it needs to be changed to false.

A lot of the updates are comment updates, to try to make it clearer
how concurrent updates are handled, since that's a little more complex
with this patch.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Mon, 18 Mar 2024 at 08:59, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).
>

Trivial rebase forced by 6185c9737c.

Regards,
Dean

Вложения

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

От
Dean Rasheed
Дата:
On Thu, 21 Mar 2024 at 09:35, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Trivial rebase forced by 6185c9737c.
>

I think it would be good to get this committed.

It has had a decent amount of review, at least up to v9, but a number
of things have changed since then:

1). Concurrent update behaviour -- now if a concurrent update causes a
matched candidate row to no longer match the join condition, it will
execute the first qualifying NOT MATCHED BY SOURCE action, and then
the first qualifying NOT MATCHED [BY TARGET] action. I.e., it may
execute 2 actions, which makes sense because if the rows had started
out not matching, the full join would have output 2 rows.

2). ResultRelInfo now has 3 lists of actions, one per match kind.
Previously I was putting the NOT MATCHED BY SOURCE actions in the same
list as the MATCHED actions, since they are both handled by
ExecMergeMatched(). However, to achieve (1) above, it turned out to be
easier to have 3 separate lists, and this makes some other code a
little neater.

3). I've added a new field Query.mergeJoinCondition so that
transformMergeStmt() no longer puts the join conditions in
qry->jointree->quals. That's necessary to make it work correctly on an
auto-updatable view which might have its own quals, but it also seems
neater anyway.

4). To distinguish the MATCHED case from NOT MATCHED BY SOURCE case in
the executor, it now uses the join condition (previously it added a
"source IS [NOT] NULL" clause to each merge action). This has the
advantage that it involves just one qual check per candidate row,
rather than one for each action. On the downside, it's checking the
join condition twice (since the source subplan's join node already
checked it), but I couldn't see an easy way round that. (It only does
this if there are both MATCHED and NOT MATCHED BY SOURCE actions, so
it's not making any existing queries worse.)

5). To support (4), I added new fields
ModifyTablePath.mergeJoinConditions, ModifyTable.mergeJoinConditions
and ResultRelInfo.ri_MergeJoinCondition, since the attribute numbers
in the join condition might vary by partition.

6). I got rid of Query.mergeSourceRelation, which is no longer needed,
because of (4). (And as before, it also gets rid of
Query.mergeUseOuterJoin, since the parser is no longer making the
decision about what kind of join to build.)

7). To support (1), I added a new field
ModifyTableState.mt_merge_pending_not_matched, because if it has to
execute 2 actions following a concurrent update, and there is a
RETURNING clause, it has to defer the second action until the next
call to ExecModifyTable().

8). I've added isolation tests to test (1).

9). I've added a lot more regression tests.

10). I've made a lot of comment changes in nodeModifyTable.c,
especially relating to the discussion around concurrent updates.

Overall, I feel like this is in pretty good shape, and it manages to
make a few code simplifications that look quite nice.

Regards,
Dean