Обсуждение: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

Поиск
Список
Период
Сортировка

Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Richard Guo
Дата:
When we try to generate qual variants with different nullingrels in
deconstruct_distribute_oj_quals, we traverse all the JoinTreeItems and
adjust qual nulling bits as we crawl up the join tree.  For a
SpecialJoinInfo which commutes with current sjinfo from below left, in
the next level up it would null all the relids in its righthand.  So we
adjust qual nulling bits as below.

     /*
      * Adjust qual nulling bits for next level up, if needed.  We
      * don't want to put sjinfo's own bit in at all, and if we're
      * above sjinfo then we did it already.
      */
     if (below_sjinfo)
         quals = (List *)
             add_nulling_relids((Node *) quals,
                                othersj->min_righthand,
                                bms_make_singleton(othersj->ojrelid));

It seems to me there is oversight here.  Actually in next level up this
othersj would null all the relids in its syn_righthand, not only the
relids in its min_righthand.   If the quals happen to contain references
to relids which are in othersj->syn_righthand but not in
othersj->min_righthand, these relids would not get updated with
othersj->ojrelid added.  And this would cause qual nulling bits not
consistent.

I've managed to devise a query that can show this problem.

create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
create table t4(a int, b int);

insert into t1 select i, i from generate_series(1,10)i;
insert into t2 select i, i from generate_series(1,10)i;
insert into t3 select i, i from generate_series(1,1000)i;
insert into t4 select i, i from generate_series(1,1000)i;
analyze;

select * from t1 left join (t2 left join t3 on t2.a > t3.a) on t1.b = t2.b left join t4 on t2.b = t3.b;

This query would trigger the Assert() in search_indexed_tlist_for_var.
So I wonder that we should use othersj->syn_righthand here.

--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2046,7 +2046,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root,
        if (below_sjinfo)
            quals = (List *)
                add_nulling_relids((Node *) quals,
-                                  othersj->min_righthand,
+                                  othersj->syn_righthand,
                                   bms_make_singleton(othersj->ojrelid));

Thanks
Richard

Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> It seems to me there is oversight here.  Actually in next level up this
> othersj would null all the relids in its syn_righthand, not only the
> relids in its min_righthand.

Good point.  I think this code originated before it was clear to me
that nullingrels would need to follow the syntactic structure.

> This query would trigger the Assert() in search_indexed_tlist_for_var.
> So I wonder that we should use othersj->syn_righthand here.

There are two such calls in deconstruct_distribute_oj_quals ...
don't they both need this change?

            regards, tom lane



Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Tom Lane
Дата:
I wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> It seems to me there is oversight here.  Actually in next level up this
>> othersj would null all the relids in its syn_righthand, not only the
>> relids in its min_righthand.

> Good point.  I think this code originated before it was clear to me
> that nullingrels would need to follow the syntactic structure.

Although ... the entire point here is that we're trying to build quals
that don't match the original syntactic structure.  I'm worried that
distribute_qual_to_rels will do the wrong thing (put the qual at the
wrong level) if we add more nullingrel bits than we meant to.  This
might be less trivial than it appears.

            regards, tom lane



Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Richard Guo
Дата:

On Thu, Feb 9, 2023 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> This query would trigger the Assert() in search_indexed_tlist_for_var.
> So I wonder that we should use othersj->syn_righthand here.

There are two such calls in deconstruct_distribute_oj_quals ...
don't they both need this change?
 
Yeah, I wondered about that too, but didn't manage to devise a query
that can show the problem caused by the call for 'above_sjinfo' case.
After a night of sleep I came up with one this morning. :-)

create table t (a int, b int);

insert into t select i, i from generate_series(1,10)i;
analyze t;

select * from t t1 left join t t2 left join t t3 on t2.b = t3.b left join t t4 on t2.a > t3.a on t2.a > t1.a;

In this query, for the qual 't2.a > t3.a', when we try to push t3/t4
join to above t1/t2 join, we fail to add t1/t2 ojrelid to
nullingrels of t3.a, because t3 is not in t1/t2 join's min_righthand
(but in its syn_righthand).  We really should have done that because
after the commutation t1/t2 join can null not only t2 but also t3 in
this case.

Thanks
Richard

Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Richard Guo
Дата:

On Fri, Feb 10, 2023 at 11:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Feb 9, 2023 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> This query would trigger the Assert() in search_indexed_tlist_for_var.
> So I wonder that we should use othersj->syn_righthand here.

There are two such calls in deconstruct_distribute_oj_quals ...
don't they both need this change?
 
Yeah, I wondered about that too, but didn't manage to devise a query
that can show the problem caused by the call for 'above_sjinfo' case.
After a night of sleep I came up with one this morning. :-)

create table t (a int, b int);

insert into t select i, i from generate_series(1,10)i;
analyze t;

select * from t t1 left join t t2 left join t t3 on t2.b = t3.b left join t t4 on t2.a > t3.a on t2.a > t1.a;

In this query, for the qual 't2.a > t3.a', when we try to push t3/t4
join to above t1/t2 join, we fail to add t1/t2 ojrelid to
nullingrels of t3.a, because t3 is not in t1/t2 join's min_righthand
(but in its syn_righthand).  We really should have done that because
after the commutation t1/t2 join can null not only t2 but also t3 in
this case.
 
However, for 'above_sjinfo' case, we should not use
othersj->syn_righthand,  because othersj->syn_righthand contains relids
in sjinfo's righthand which should not be nulled by othersj after the
commutation.  It seems what we should use here is sjinfo->syn_lefthand.

--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -1990,7 +1990,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root,
        if (above_sjinfo)
            quals = (List *)
                add_nulling_relids((Node *) quals,
-                                  othersj->min_righthand,
+                                  sjinfo->syn_lefthand,
                                   bms_make_singleton(othersj->ojrelid));

Thanks
Richard

Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> However, for 'above_sjinfo' case, we should not use
> othersj->syn_righthand,  because othersj->syn_righthand contains relids
> in sjinfo's righthand which should not be nulled by othersj after the
> commutation.  It seems what we should use here is sjinfo->syn_lefthand.

I had a hard time wrapping my brain around that to start with, but
now I think you're right.  othersj is syntactically above the current
join, so its syn_righthand will cover all of the current join, but
we only want to add nulling bits to Vars of the current join's LHS.
(That is, we need to transform Pbc to Pb*c, not Pb*c*.)

I also realized that there was a fairly critical nearby bug:
make_outerjoininfo was failing to check whether the upper join's qual
is actually of the form "Pbc", without any references to the lower join's
LHS.  So that led us to setting commute bits in some cases where we
shouldn't, further confusing deconstruct_distribute_oj_quals.  (I think
this snuck in because its other code path doesn't need to make such a
check, it being syntactically impossible to have such a reference if
we start from the other form of the identity.)

Fix pushed.  This seems to take care of Robins' latest example in
the bug #17781 thread [1], too.

            regards, tom lane

[1] https://postgr.es/m/CAEP4nAx9C5gXNBfEA0JBfz7B+5f1Bawt-RWQWyhev-wdps8BZA@mail.gmail.com