Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Дата
Msg-id 3139165.1687203444@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> So it seems that we need to do nullingrel adjustments in a more common
>> place.

> I agree: this suggests that we fixed it in the wrong place.

So pursuant to that, 0001 attached reverts the code changes from bfd332b3f
and 63e4f13d2 (keeping the test cases and some unrelated comment fixes).
Then the question is what to do instead.  I've not come up with a better
idea than to hack it in identify_current_nestloop_params (per 0002), as
you proposed upthread.  I don't like this too much, as it's on the hairy
edge of making setrefs.c's nullingrel cross-checks completely useless for
NestLoopParams; but the alternatives aren't attractive either.

>> I think it exposes a new issue.  It seems that we extract a problematic
>> lateral_relids from lateral references within PlaceHolderVars in
>> create_lateral_join_info.  I doubt that we should use ph_lateral
>> directly.  It seems more reasonable to me that we strip outer-join
>> relids from ph_lateral and then use that for lateral_relids.

I experimented with that (0003) and it fixes your example query.
I think it is functionally okay, because the lateral_relids just need
to be a sufficient subset of the lateral references' requirements to
ensure we can evaluate them where needed; other mechanisms should ensure
that the right sorts of joins happen.  It seems a bit unsatisfying
though, especially given that we just largely lobotomized setrefs.c's
cross-checks for these same references.  I don't have a better idea
however, and beta2 is fast approaching.

            regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5ba266fdb6..c2f211a60d 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -430,24 +430,6 @@ have_unsafe_outer_join_ref(PlannerInfo *root,
  * These are returned in parallel lists in *param_exprs and *operators.
  * We also set *binary_mode to indicate whether strict binary matching is
  * required.
- *
- * A complication is that innerrel's lateral_vars may contain nullingrel
- * markers that need adjustment.  This occurs if we have applied outer join
- * identity 3,
- *        (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
- *        = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
- * and C contains lateral references to B.  It's still safe to apply the
- * identity, but the parser will have created those references in the form
- * "b*" (i.e., with varnullingrels listing the A/B join), while what we will
- * have available from the nestloop's outer side is just "b".  We deal with
- * that here by stripping the nullingrels down to what is available from the
- * outer side according to outerrel->relids.
- * That fixes matters for the case of forward application of identity 3.
- * If the identity was applied in the reverse direction, we will have
- * innerrel's lateral_vars containing too few nullingrel bits rather than
- * too many.  Currently, that causes no problems because setrefs.c applies
- * only a subset check to nullingrels in NestLoopParams, but we'd have to
- * work harder if we ever want to tighten that check.
  */
 static bool
 paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
@@ -551,25 +533,6 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
             return false;
         }

-        /* OK, but adjust its nullingrels before adding it to result */
-        expr = copyObject(expr);
-        if (IsA(expr, Var))
-        {
-            Var           *var = (Var *) expr;
-
-            var->varnullingrels = bms_intersect(var->varnullingrels,
-                                                outerrel->relids);
-        }
-        else if (IsA(expr, PlaceHolderVar))
-        {
-            PlaceHolderVar *phv = (PlaceHolderVar *) expr;
-
-            phv->phnullingrels = bms_intersect(phv->phnullingrels,
-                                               outerrel->relids);
-        }
-        else
-            Assert(false);
-
         *operators = lappend_oid(*operators, typentry->eq_opr);
         *param_exprs = lappend(*param_exprs, expr);

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec5552327f..1ca26baa25 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2289,11 +2289,9 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
              * the outer-join level at which they are used, Vars seen in the
              * NestLoopParam expression may have nullingrels that are just a
              * subset of those in the Vars actually available from the outer
-             * side.  Lateral references can create the same situation, as
-             * explained in the comments for process_subquery_nestloop_params
-             * and paraminfo_get_equal_hashops.  Not checking this exactly is
-             * a bit grotty, but the work needed to make things match up
-             * perfectly seems well out of proportion to the value.
+             * side.  Not checking this exactly is a bit grotty, but the work
+             * needed to make things match up perfectly seems well out of
+             * proportion to the value.
              */
             nlp->paramval = (Var *) fix_upper_expr(root,
                                                    (Node *) nlp->paramval,
diff --git a/src/backend/optimizer/util/paramassign.c b/src/backend/optimizer/util/paramassign.c
index e94f7e7563..66534c0a78 100644
--- a/src/backend/optimizer/util/paramassign.c
+++ b/src/backend/optimizer/util/paramassign.c
@@ -421,26 +421,8 @@ replace_nestloop_param_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv)
  * provide these values.  This differs from replace_nestloop_param_var in
  * that the PARAM_EXEC slots to use have already been determined.
  *
- * An additional complication is that the subplan_params may contain
- * nullingrel markers that need adjustment.  This occurs if we have applied
- * outer join identity 3,
- *        (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
- *        = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
- * and C is a subquery containing lateral references to B.  It's still safe
- * to apply the identity, but the parser will have created those references
- * in the form "b*" (i.e., with varnullingrels listing the A/B join), while
- * what we will have available from the nestloop's outer side is just "b".
- * We deal with that here by stripping the nullingrels down to what is
- * available from the outer side according to root->curOuterRels.
- * That fixes matters for the case of forward application of identity 3.
- * If the identity was applied in the reverse direction, we will have
- * subplan_params containing too few nullingrel bits rather than too many.
- * Currently, that causes no problems because setrefs.c applies only a
- * subset check to nullingrels in NestLoopParams, but we'd have to work
- * harder if we ever want to tighten that check.
- *
  * Note that we also use root->curOuterRels as an implicit parameter for
- * sanity checks and nullingrel adjustments.
+ * sanity checks.
  */
 void
 process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
@@ -467,19 +449,17 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
                 nlp = (NestLoopParam *) lfirst(lc2);
                 if (nlp->paramno == pitem->paramId)
                 {
+                    Assert(equal(var, nlp->paramval));
                     /* Present, so nothing to do */
                     break;
                 }
             }
             if (lc2 == NULL)
             {
-                /* No, so add it after adjusting its nullingrels */
-                var = copyObject(var);
-                var->varnullingrels = bms_intersect(var->varnullingrels,
-                                                    root->curOuterRels);
+                /* No, so add it */
                 nlp = makeNode(NestLoopParam);
                 nlp->paramno = pitem->paramId;
-                nlp->paramval = var;
+                nlp->paramval = copyObject(var);
                 root->curOuterParams = lappend(root->curOuterParams, nlp);
             }
         }
@@ -500,19 +480,17 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
                 nlp = (NestLoopParam *) lfirst(lc2);
                 if (nlp->paramno == pitem->paramId)
                 {
+                    Assert(equal(phv, nlp->paramval));
                     /* Present, so nothing to do */
                     break;
                 }
             }
             if (lc2 == NULL)
             {
-                /* No, so add it after adjusting its nullingrels */
-                phv = copyObject(phv);
-                phv->phnullingrels = bms_intersect(phv->phnullingrels,
-                                                   root->curOuterRels);
+                /* No, so add it */
                 nlp = makeNode(NestLoopParam);
                 nlp->paramno = pitem->paramId;
-                nlp->paramval = (Var *) phv;
+                nlp->paramval = (Var *) copyObject(phv);
                 root->curOuterParams = lappend(root->curOuterParams, nlp);
             }
         }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1ca26baa25..c63758cb2b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2289,9 +2289,11 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
              * the outer-join level at which they are used, Vars seen in the
              * NestLoopParam expression may have nullingrels that are just a
              * subset of those in the Vars actually available from the outer
-             * side.  Not checking this exactly is a bit grotty, but the work
-             * needed to make things match up perfectly seems well out of
-             * proportion to the value.
+             * side.  (Lateral references can also cause this, as explained in
+             * the comments for identify_current_nestloop_params.)  Not
+             * checking this exactly is a bit grotty, but the work needed to
+             * make things match up perfectly seems well out of proportion to
+             * the value.
              */
             nlp->paramval = (Var *) fix_upper_expr(root,
                                                    (Node *) nlp->paramval,
diff --git a/src/backend/optimizer/util/paramassign.c b/src/backend/optimizer/util/paramassign.c
index 66534c0a78..d6a923b0b6 100644
--- a/src/backend/optimizer/util/paramassign.c
+++ b/src/backend/optimizer/util/paramassign.c
@@ -503,6 +503,28 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
  * Identify any NestLoopParams that should be supplied by a NestLoop plan
  * node with the specified lefthand rels.  Remove them from the active
  * root->curOuterParams list and return them as the result list.
+ *
+ * XXX Here we also hack up the returned Vars and PHVs so that they do not
+ * contain nullingrel sets exceeding what is available from the outer side.
+ * This is needed if we have applied outer join identity 3,
+ *        (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
+ *        = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+ * and C contains lateral references to B.  It's still safe to apply the
+ * identity, but the parser will have created those references in the form
+ * "b*" (i.e., with varnullingrels listing the A/B join), while what we will
+ * have available from the nestloop's outer side is just "b".  We deal with
+ * that here by stripping the nullingrels down to what is available from the
+ * outer side according to leftrelids.
+ *
+ * That fixes matters for the case of forward application of identity 3.
+ * If the identity was applied in the reverse direction, we will have
+ * parameter Vars containing too few nullingrel bits rather than too many.
+ * Currently, that causes no problems because setrefs.c applies only a
+ * subset check to nullingrels in NestLoopParams, but we'd have to work
+ * harder if we ever want to tighten that check.  This is all pretty annoying
+ * because it greatly weakens setrefs.c's cross-check, but the alternative
+ * seems to be to generate multiple versions of each laterally-parameterized
+ * subquery, which'd be unduly expensive.
  */
 List *
 identify_current_nestloop_params(PlannerInfo *root, Relids leftrelids)
@@ -517,13 +539,19 @@ identify_current_nestloop_params(PlannerInfo *root, Relids leftrelids)

         /*
          * We are looking for Vars and PHVs that can be supplied by the
-         * lefthand rels.
+         * lefthand rels.  When we find one, it's okay to modify it in-place
+         * because all the routines above make a fresh copy to put into
+         * curOuterParams.
          */
         if (IsA(nlp->paramval, Var) &&
             bms_is_member(nlp->paramval->varno, leftrelids))
         {
+            Var           *var = (Var *) nlp->paramval;
+
             root->curOuterParams = foreach_delete_current(root->curOuterParams,
                                                           cell);
+            var->varnullingrels = bms_intersect(var->varnullingrels,
+                                                leftrelids);
             result = lappend(result, nlp);
         }
         else if (IsA(nlp->paramval, PlaceHolderVar) &&
@@ -531,8 +559,12 @@ identify_current_nestloop_params(PlannerInfo *root, Relids leftrelids)
                                                      (PlaceHolderVar *) nlp->paramval)->ph_eval_at,
                                leftrelids))
         {
+            PlaceHolderVar *phv = (PlaceHolderVar *) nlp->paramval;
+
             root->curOuterParams = foreach_delete_current(root->curOuterParams,
                                                           cell);
+            phv->phnullingrels = bms_intersect(phv->phnullingrels,
+                                               leftrelids);
             result = lappend(result, nlp);
         }
     }
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cc4c122fdd..af3e6bba6d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2607,6 +2607,23 @@ select * from int8_tbl t1
                      Filter: (q1 = t2.q1)
 (8 rows)

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select * from generate_series(t2.q1, 100)) s
+      on t2.q1 = 1;
+                     QUERY PLAN
+----------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on int8_tbl t1
+   ->  Materialize
+         ->  Nested Loop Left Join
+               Join Filter: (t2.q1 = 1)
+               ->  Seq Scan on int8_tbl t2
+               ->  Function Scan on generate_series
+(7 rows)
+
 explain (costs off)
 select * from onek t1
     left join onek t2 on true
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index e77e469570..ee19af838f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -521,6 +521,13 @@ select * from int8_tbl t1
       (select * from int8_tbl t3 where t3.q1 = t2.q1 offset 0) s
       on t2.q1 = 1;

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select * from generate_series(t2.q1, 100)) s
+      on t2.q1 = 1;
+
 explain (costs off)
 select * from onek t1
     left join onek t2 on true
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 69ef483d28..b31d892121 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -580,6 +580,7 @@ create_lateral_join_info(PlannerInfo *root)
     {
         PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
         Relids        eval_at = phinfo->ph_eval_at;
+        Relids        lateral_refs;
         int            varno;

         if (phinfo->ph_lateral == NULL)
@@ -587,6 +588,15 @@ create_lateral_join_info(PlannerInfo *root)

         found_laterals = true;

+        /*
+         * Include only baserels not outer joins in the evaluation sites'
+         * lateral relids.  This avoids problems when outer join order gets
+         * rearranged, and it should still ensure that the lateral values are
+         * available when needed.
+         */
+        lateral_refs = bms_intersect(phinfo->ph_lateral, root->all_baserels);
+        Assert(!bms_is_empty(lateral_refs));
+
         if (bms_get_singleton_member(eval_at, &varno))
         {
             /* Evaluation site is a baserel */
@@ -594,10 +604,10 @@ create_lateral_join_info(PlannerInfo *root)

             brel->direct_lateral_relids =
                 bms_add_members(brel->direct_lateral_relids,
-                                phinfo->ph_lateral);
+                                lateral_refs);
             brel->lateral_relids =
                 bms_add_members(brel->lateral_relids,
-                                phinfo->ph_lateral);
+                                lateral_refs);
         }
         else
         {
@@ -610,7 +620,7 @@ create_lateral_join_info(PlannerInfo *root)
                 if (brel == NULL)
                     continue;    /* ignore outer joins in eval_at */
                 brel->lateral_relids = bms_add_members(brel->lateral_relids,
-                                                       phinfo->ph_lateral);
+                                                       lateral_refs);
             }
         }
     }
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index af3e6bba6d..372147292a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2624,6 +2624,23 @@ select * from int8_tbl t1
                ->  Function Scan on generate_series
 (7 rows)

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select t2.q1 from int8_tbl t3) s
+      on t2.q1 = 1;
+                QUERY PLAN
+-------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on int8_tbl t1
+   ->  Materialize
+         ->  Nested Loop Left Join
+               Join Filter: (t2.q1 = 1)
+               ->  Seq Scan on int8_tbl t2
+               ->  Seq Scan on int8_tbl t3
+(7 rows)
+
 explain (costs off)
 select * from onek t1
     left join onek t2 on true
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index ee19af838f..84a1d69956 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -528,6 +528,13 @@ select * from int8_tbl t1
       (select * from generate_series(t2.q1, 100)) s
       on t2.q1 = 1;

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select t2.q1 from int8_tbl t3) s
+      on t2.q1 = 1;
+
 explain (costs off)
 select * from onek t1
     left join onek t2 on true

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: index prefetching
Следующее
От: Tom Lane
Дата:
Сообщение: Re: run pgindent on a regular basis / scripted manner