Обсуждение: Moving SS_finalize_plan processing to the end of planning

Поиск
Список
Период
Сортировка

Moving SS_finalize_plan processing to the end of planning

От
Tom Lane
Дата:
I've started to work on path-ification of the upper planner (finally),
and since that's going to be a large patch in any case, I've been looking
for pieces that could be bitten off and done separately.  One such piece
is the fact that SS_finalize_plan (computation of extParams/allParams)
currently gets run at the end of subquery_planner; as long as that's true,
we cannot have subquery_planner returning paths rather than concrete
plans.  The attached patch moves that processing to occur just before
set_plan_references is run.

Ideally I'd like to get rid of SS_finalize_plan altogether and merge its
work into set_plan_references, so as to save one traversal of the finished
plan tree.  However, that's a bit difficult because of the fact that the
traversal order is different: in SS_finalize_plan, we must visit subplans
before main plan, whereas set_plan_references wants to do the main plan
first.  (I experimented with changing that, but then the flat rangetable
comes out in a different order, with unpleasant side effects on the way
EXPLAIN prints things.)  Since that would be purely a minor performance
improvement, I set that goal aside for now.

Basically what this patch does is to divide what had been done in
SS_finalize_plan into three steps:

* SS_identify_outer_params determines which outer-query-level Params will
be available for the current query level to use, and saves that aside in
a new PlannerInfo field root->outer_params.  This processing turns out
to be the only reason that SS_finalize_plan had to be called in
subquery_planner: we have to capture this data before exiting
subquery_planner because the upper levels' plan_params lists may change
as they plan other subplans.

* SS_attach_initplans does the work of attaching initplans to the parent
plan node and adjusting the parent's cost estimate accordingly.

* SS_finalize_plan now *only* does extParam/allParam calculation.

I had to change things around a bit in planagg.c (which was already a
hack, and the law of conservation of cruft applies).  Otherwise it's
pretty straightforward and removes some existing hacks.  One notable
point is that there's no longer an assumption within SS_finalize_plan
that initPlans can only appear in the top plan node.

Any objections?

            regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7609183..a878498 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outPlannerInfo(StringInfo str, const Pl
*** 1799,1804 ****
--- 1799,1805 ----
      WRITE_NODE_FIELD(glob);
      WRITE_UINT_FIELD(query_level);
      WRITE_NODE_FIELD(plan_params);
+     WRITE_BITMAPSET_FIELD(outer_params);
      WRITE_BITMAPSET_FIELD(all_baserels);
      WRITE_BITMAPSET_FIELD(nullable_baserels);
      WRITE_NODE_FIELD(join_rel_list);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f461586..404c6f5 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** make_material(Plan *lefttree)
*** 4473,4483 ****
   * materialize_finished_plan: stick a Material node atop a completed plan
   *
   * There are a couple of places where we want to attach a Material node
!  * after completion of subquery_planner().  This currently requires hackery.
!  * Since subquery_planner has already run SS_finalize_plan on the subplan
!  * tree, we have to kluge up parameter lists for the Material node.
!  * Possibly this could be fixed by postponing SS_finalize_plan processing
!  * until setrefs.c is run?
   */
  Plan *
  materialize_finished_plan(Plan *subplan)
--- 4473,4479 ----
   * materialize_finished_plan: stick a Material node atop a completed plan
   *
   * There are a couple of places where we want to attach a Material node
!  * after completion of subquery_planner(), without any MaterialPath path.
   */
  Plan *
  materialize_finished_plan(Plan *subplan)
*************** materialize_finished_plan(Plan *subplan)
*** 4498,4507 ****
      matplan->plan_rows = subplan->plan_rows;
      matplan->plan_width = subplan->plan_width;

-     /* parameter kluge --- see comments above */
-     matplan->extParam = bms_copy(subplan->extParam);
-     matplan->allParam = bms_copy(subplan->allParam);
-
      return matplan;
  }

--- 4494,4499 ----
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index f0e9c05..a761cfd 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** build_minmax_path(PlannerInfo *root, Min
*** 416,428 ****
       *         WHERE col IS NOT NULL AND existing-quals
       *         ORDER BY col ASC/DESC
       *         LIMIT 1)
       *----------
       */
      subroot = (PlannerInfo *) palloc(sizeof(PlannerInfo));
      memcpy(subroot, root, sizeof(PlannerInfo));
      subroot->parse = parse = (Query *) copyObject(root->parse);
!     /* make sure subroot planning won't change root->init_plans contents */
!     subroot->init_plans = list_copy(root->init_plans);
      /* There shouldn't be any OJ or LATERAL info to translate, as yet */
      Assert(subroot->join_info_list == NIL);
      Assert(subroot->lateral_info_list == NIL);
--- 416,438 ----
       *         WHERE col IS NOT NULL AND existing-quals
       *         ORDER BY col ASC/DESC
       *         LIMIT 1)
+      *
+      * We cheat a bit here by building what is effectively a subplan query
+      * level without taking the trouble to increment varlevelsup of outer
+      * references.  Therefore we don't increment the subroot's query_level nor
+      * repoint its parent_root to the parent level.  We can get away with that
+      * because the plan will be an initplan and therefore cannot need any
+      * parameters from the parent level.  But see hackery in make_agg_subplan;
+      * we might someday need to do this the hard way.
       *----------
       */
      subroot = (PlannerInfo *) palloc(sizeof(PlannerInfo));
      memcpy(subroot, root, sizeof(PlannerInfo));
      subroot->parse = parse = (Query *) copyObject(root->parse);
!     /* reset subplan-related stuff */
!     subroot->plan_params = NIL;
!     subroot->outer_params = NULL;
!     subroot->init_plans = NIL;
      /* There shouldn't be any OJ or LATERAL info to translate, as yet */
      Assert(subroot->join_info_list == NIL);
      Assert(subroot->lateral_info_list == NIL);
*************** make_agg_subplan(PlannerInfo *root, MinM
*** 578,600 ****
                                 0, 1);

      /*
       * Convert the plan into an InitPlan, and make a Param for its result.
       */
      mminfo->param =
!         SS_make_initplan_from_plan(subroot, plan,
                                     exprType((Node *) mminfo->target),
                                     -1,
                                     exprCollation((Node *) mminfo->target));
-
-     /*
-      * Make sure the initplan gets into the outer PlannerInfo, along with any
-      * other initplans generated by the sub-planning run.  We had to include
-      * the outer PlannerInfo's pre-existing initplans into the inner one's
-      * init_plans list earlier, so make sure we don't put back any duplicate
-      * entries.
-      */
-     root->init_plans = list_concat_unique_ptr(root->init_plans,
-                                               subroot->init_plans);
  }

  /*
--- 588,617 ----
                                 0, 1);

      /*
+      * We have to do some of the same cleanup that subquery_planner() would
+      * do, namely cope with params and initplans used within this plan tree.
+      *
+      * This is a little bit messy because although we initially created the
+      * subroot by cloning the outer root, it really is a subplan and needs to
+      * consider initplans belonging to the outer root as providing available
+      * parameters.  So temporarily change its parent_root pointer.
+      * (Fortunately, SS_identify_outer_params doesn't care whether the depth
+      * of parent_root nesting matches query_level.)
+      */
+     subroot->parent_root = root;
+     SS_identify_outer_params(subroot);
+     subroot->parent_root = root->parent_root;
+
+     SS_attach_initplans(subroot, plan);
+
+     /*
       * Convert the plan into an InitPlan, and make a Param for its result.
       */
      mminfo->param =
!         SS_make_initplan_from_plan(root, subroot, plan,
                                     exprType((Node *) mminfo->target),
                                     -1,
                                     exprCollation((Node *) mminfo->target));
  }

  /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 09d4ea1..56b614a 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** standard_planner(Query *parse, int curso
*** 239,244 ****
--- 239,263 ----
              top_plan = materialize_finished_plan(top_plan);
      }

+     /*
+      * If any Params were generated, run through the plan tree and compute
+      * each plan node's extParams/allParams sets.  Ideally we'd merge this
+      * into set_plan_references' tree traversal, but for now it has to be
+      * separate because we need to visit subplans before not after main plan.
+      */
+     if (glob->nParamExec > 0)
+     {
+         Assert(list_length(glob->subplans) == list_length(glob->subroots));
+         forboth(lp, glob->subplans, lr, glob->subroots)
+         {
+             Plan       *subplan = (Plan *) lfirst(lp);
+             PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);
+
+             SS_finalize_plan(subroot, subplan);
+         }
+         SS_finalize_plan(root, top_plan);
+     }
+
      /* final cleanup of the plan */
      Assert(glob->finalrtable == NIL);
      Assert(glob->finalrowmarks == NIL);
*************** subquery_planner(PlannerGlobal *glob, Qu
*** 312,318 ****
                   bool hasRecursion, double tuple_fraction,
                   PlannerInfo **subroot)
  {
-     int            num_old_subplans = list_length(glob->subplans);
      PlannerInfo *root;
      Plan       *plan;
      List       *newWithCheckOptions;
--- 331,336 ----
*************** subquery_planner(PlannerGlobal *glob, Qu
*** 327,332 ****
--- 345,351 ----
      root->query_level = parent_root ? parent_root->query_level + 1 : 1;
      root->parent_root = parent_root;
      root->plan_params = NIL;
+     root->outer_params = NULL;
      root->planner_cxt = CurrentMemoryContext;
      root->init_plans = NIL;
      root->cte_plan_ids = NIL;
*************** subquery_planner(PlannerGlobal *glob, Qu
*** 656,668 ****
      }

      /*
!      * If any subplans were generated, or if there are any parameters to worry
!      * about, build initPlan list and extParam/allParam sets for plan nodes,
!      * and attach the initPlans to the top plan node.
       */
!     if (list_length(glob->subplans) != num_old_subplans ||
!         root->glob->nParamExec > 0)
!         SS_finalize_plan(root, plan, true);

      /* Return internal info if caller wants it */
      if (subroot)
--- 675,691 ----
      }

      /*
!      * Capture the set of outer-level param IDs we have access to, for use
!      * during setrefs.c processing.
       */
!     SS_identify_outer_params(root);
!
!     /*
!      * If any initPlans were created in this query level, attach them to the
!      * topmost plan node for the level, and increment that node's cost to
!      * account for them.
!      */
!     SS_attach_initplans(root, plan);

      /* Return internal info if caller wants it */
      if (subroot)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3038cd..444128f 100644
*** a/src/backend/optimizer/plan/subselect.c
--- b/src/backend/optimizer/plan/subselect.c
***************
*** 22,27 ****
--- 22,28 ----
  #include "nodes/nodeFuncs.h"
  #include "optimizer/clauses.h"
  #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "optimizer/planmain.h"
  #include "optimizer/planner.h"
  #include "optimizer/prep.h"
*************** process_sublinks_mutator(Node *node, pro
*** 2048,2107 ****
  }

  /*
!  * SS_finalize_plan - do final sublink and parameter processing for a
!  * completed Plan.
   *
!  * This recursively computes the extParam and allParam sets for every Plan
!  * node in the given plan tree.  It also optionally attaches any previously
!  * generated InitPlans to the top plan node.  (Any InitPlans should already
!  * have been put through SS_finalize_plan.)
   */
  void
! SS_finalize_plan(PlannerInfo *root, Plan *plan, bool attach_initplans)
  {
!     Bitmapset  *valid_params,
!                *initExtParam,
!                *initSetParam;
!     Cost        initplan_cost;
      PlannerInfo *proot;
      ListCell   *l;

      /*
!      * Examine any initPlans to determine the set of external params they
!      * reference, the set of output params they supply, and their total cost.
!      * We'll use at least some of this info below.  (Note we are assuming that
!      * finalize_plan doesn't touch the initPlans.)
!      *
!      * In the case where attach_initplans is false, we are assuming that the
!      * existing initPlans are siblings that might supply params needed by the
!      * current plan.
       */
!     initExtParam = initSetParam = NULL;
!     initplan_cost = 0;
!     foreach(l, root->init_plans)
!     {
!         SubPlan    *initsubplan = (SubPlan *) lfirst(l);
!         Plan       *initplan = planner_subplan_get_plan(root, initsubplan);
!         ListCell   *l2;
!
!         initExtParam = bms_add_members(initExtParam, initplan->extParam);
!         foreach(l2, initsubplan->setParam)
!         {
!             initSetParam = bms_add_member(initSetParam, lfirst_int(l2));
!         }
!         initplan_cost += initsubplan->startup_cost + initsubplan->per_call_cost;
!     }

      /*
!      * Now determine the set of params that are validly referenceable in this
!      * query level; to wit, those available from outer query levels plus the
!      * output parameters of any local initPlans.  (We do not include output
!      * parameters of regular subplans.  Those should only appear within the
!      * testexpr of SubPlan nodes, and are taken care of locally within
!      * finalize_primnode.  Likewise, special parameters that are generated by
!      * nodes such as ModifyTable are handled within finalize_plan.)
       */
!     valid_params = bms_copy(initSetParam);
      for (proot = root->parent_root; proot != NULL; proot = proot->parent_root)
      {
          /* Include ordinary Var/PHV/Aggref params */
--- 2049,2086 ----
  }

  /*
!  * SS_identify_outer_params - identify the Params available from outer levels
   *
!  * This must be run after SS_replace_correlation_vars and SS_process_sublinks
!  * processing is complete in a given query level as well as all of its
!  * descendant levels (which means it's most practical to do it at the end of
!  * processing the query level).  We compute the set of paramIds that outer
!  * levels will make available to this level+descendants, and record it in
!  * root->outer_params for use during setrefs.c processing.  (We can't just
!  * compute it at setrefs.c time, because the upper levels' plan_params lists
!  * are transient and will be gone by then.)
   */
  void
! SS_identify_outer_params(PlannerInfo *root)
  {
!     Bitmapset  *outer_params;
      PlannerInfo *proot;
      ListCell   *l;

      /*
!      * If no parameters have been assigned anywhere in the tree, we certainly
!      * don't need to do anything here.
       */
!     if (root->glob->nParamExec == 0)
!         return;

      /*
!      * Scan all query levels above this one to see which parameters are due to
!      * be available from them, either because lower query levels have
!      * requested them (via plan_params) or because they will be available from
!      * initPlans of those levels.
       */
!     outer_params = NULL;
      for (proot = root->parent_root; proot != NULL; proot = proot->parent_root)
      {
          /* Include ordinary Var/PHV/Aggref params */
*************** SS_finalize_plan(PlannerInfo *root, Plan
*** 2109,2115 ****
          {
              PlannerParamItem *pitem = (PlannerParamItem *) lfirst(l);

!             valid_params = bms_add_member(valid_params, pitem->paramId);
          }
          /* Include any outputs of outer-level initPlans */
          foreach(l, proot->init_plans)
--- 2088,2094 ----
          {
              PlannerParamItem *pitem = (PlannerParamItem *) lfirst(l);

!             outer_params = bms_add_member(outer_params, pitem->paramId);
          }
          /* Include any outputs of outer-level initPlans */
          foreach(l, proot->init_plans)
*************** SS_finalize_plan(PlannerInfo *root, Plan
*** 2119,2166 ****

              foreach(l2, initsubplan->setParam)
              {
!                 valid_params = bms_add_member(valid_params, lfirst_int(l2));
              }
          }
          /* Include worktable ID, if a recursive query is being planned */
          if (proot->wt_param_id >= 0)
!             valid_params = bms_add_member(valid_params, proot->wt_param_id);
      }

!     /*
!      * Now recurse through plan tree.
!      */
!     (void) finalize_plan(root, plan, valid_params, NULL);
!
!     bms_free(valid_params);

!     /*
!      * Finally, attach any initPlans to the topmost plan node, and add their
!      * extParams to the topmost node's, too.  However, any setParams of the
!      * initPlans should not be present in the topmost node's extParams, only
!      * in its allParams.  (As of PG 8.1, it's possible that some initPlans
!      * have extParams that are setParams of other initPlans, so we have to
!      * take care of this situation explicitly.)
!      *
!      * We also add the eval cost of each initPlan to the startup cost of the
!      * top node.  This is a conservative overestimate, since in fact each
!      * initPlan might be executed later than plan startup, or even not at all.
!      */
!     if (attach_initplans)
      {
!         plan->initPlan = root->init_plans;
!         root->init_plans = NIL; /* make sure they're not attached twice */

!         /* allParam must include all these params */
!         plan->allParam = bms_add_members(plan->allParam, initExtParam);
!         plan->allParam = bms_add_members(plan->allParam, initSetParam);
!         /* extParam must include any child extParam */
!         plan->extParam = bms_add_members(plan->extParam, initExtParam);
!         /* but extParam shouldn't include any setParams */
!         plan->extParam = bms_del_members(plan->extParam, initSetParam);
!         /* ensure extParam is exactly NULL if it's empty */
!         if (bms_is_empty(plan->extParam))
!             plan->extParam = NULL;

          plan->startup_cost += initplan_cost;
          plan->total_cost += initplan_cost;
--- 2098,2139 ----

              foreach(l2, initsubplan->setParam)
              {
!                 outer_params = bms_add_member(outer_params, lfirst_int(l2));
              }
          }
          /* Include worktable ID, if a recursive query is being planned */
          if (proot->wt_param_id >= 0)
!             outer_params = bms_add_member(outer_params, proot->wt_param_id);
      }
+     root->outer_params = outer_params;
+ }

! /*
!  * SS_attach_initplans - attach initplans to topmost plan node
!  *
!  * Attach any initplans created in the current query level to the topmost plan
!  * node for the query level, and increment that node's cost to account for
!  * them.  (The initPlans could actually go in any node at or above where
!  * they're referenced, but there seems no reason to put them any lower than
!  * the topmost node for the query level.)
!  */
! void
! SS_attach_initplans(PlannerInfo *root, Plan *plan)
! {
!     ListCell   *lc;

!     plan->initPlan = root->init_plans;
!     foreach(lc, plan->initPlan)
      {
!         SubPlan    *initsubplan = (SubPlan *) lfirst(lc);
!         Cost        initplan_cost;

!         /*
!          * Assume each initPlan gets run once during top plan startup.  This
!          * is a conservative overestimate, since in fact an initPlan might be
!          * executed later than plan startup, or even not at all.
!          */
!         initplan_cost = initsubplan->startup_cost + initsubplan->per_call_cost;

          plan->startup_cost += initplan_cost;
          plan->total_cost += initplan_cost;
*************** SS_finalize_plan(PlannerInfo *root, Plan
*** 2168,2183 ****
  }

  /*
   * Recursive processing of all nodes in the plan tree
   *
!  * valid_params is the set of param IDs considered valid to reference in
!  * this plan node or its children.
   * scan_params is a set of param IDs to force scan plan nodes to reference.
   * This is for EvalPlanQual support, and is always NULL at the top of the
   * recursion.
   *
   * The return value is the computed allParam set for the given Plan node.
   * This is just an internal notational convenience.
   */
  static Bitmapset *
  finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
--- 2141,2175 ----
  }

  /*
+  * SS_finalize_plan - do final parameter processing for a completed Plan.
+  *
+  * This recursively computes the extParam and allParam sets for every Plan
+  * node in the given plan tree.  (Oh, and RangeTblFunction.funcparams too.)
+  *
+  * We assume that SS_finalize_plan has already been run on any initplans or
+  * subplans the plan tree could reference.
+  */
+ void
+ SS_finalize_plan(PlannerInfo *root, Plan *plan)
+ {
+     /* No setup needed, just recurse through plan tree. */
+     (void) finalize_plan(root, plan, root->outer_params, NULL);
+ }
+
+ /*
   * Recursive processing of all nodes in the plan tree
   *
!  * valid_params is the set of param IDs supplied by outer plan levels
!  * that are valid to reference in this plan node or its children.
!  *
   * scan_params is a set of param IDs to force scan plan nodes to reference.
   * This is for EvalPlanQual support, and is always NULL at the top of the
   * recursion.
   *
   * The return value is the computed allParam set for the given Plan node.
   * This is just an internal notational convenience.
+  *
+  * Do not scribble on caller's values of valid_params or scan_params!
   */
  static Bitmapset *
  finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2186,2192 ****
--- 2178,2187 ----
      finalize_primnode_context context;
      int            locally_added_param;
      Bitmapset  *nestloop_params;
+     Bitmapset  *initExtParam;
+     Bitmapset  *initSetParam;
      Bitmapset  *child_params;
+     ListCell   *l;

      if (plan == NULL)
          return NULL;
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2197,2202 ****
--- 2192,2220 ----
      nestloop_params = NULL;        /* there aren't any */

      /*
+      * Examine any initPlans to determine the set of external params they
+      * reference and the set of output params they supply.  (We assume
+      * SS_finalize_plan was run on them already.)
+      */
+     initExtParam = initSetParam = NULL;
+     foreach(l, plan->initPlan)
+     {
+         SubPlan    *initsubplan = (SubPlan *) lfirst(l);
+         Plan       *initplan = planner_subplan_get_plan(root, initsubplan);
+         ListCell   *l2;
+
+         initExtParam = bms_add_members(initExtParam, initplan->extParam);
+         foreach(l2, initsubplan->setParam)
+         {
+             initSetParam = bms_add_member(initSetParam, lfirst_int(l2));
+         }
+     }
+
+     /* Any setParams are validly referenceable in this node and children */
+     if (initSetParam)
+         valid_params = bms_union(valid_params, initSetParam);
+
+     /*
       * When we call finalize_primnode, context.paramids sets are automatically
       * merged together.  But when recursing to self, we have to do it the hard
       * way.  We want the paramids set to include params in subplans as well as
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2274,2291 ****
              break;

          case T_SubqueryScan:

!             /*
!              * In a SubqueryScan, SS_finalize_plan has already been run on the
!              * subplan by the inner invocation of subquery_planner, so there's
!              * no need to do it again.  Instead, just pull out the subplan's
!              * extParams list, which represents the params it needs from my
!              * level and higher levels.
!              */
!             context.paramids = bms_add_members(context.paramids,
!                                  ((SubqueryScan *) plan)->subplan->extParam);
!             /* We need scan_params too, though */
!             context.paramids = bms_add_members(context.paramids, scan_params);
              break;

          case T_FunctionScan:
--- 2292,2313 ----
              break;

          case T_SubqueryScan:
+             {
+                 SubqueryScan *sscan = (SubqueryScan *) plan;
+                 RelOptInfo *rel;

!                 /* We must run SS_finalize_plan on the subquery */
!                 rel = find_base_rel(root, sscan->scan.scanrelid);
!                 Assert(rel->subplan == sscan->subplan);
!                 SS_finalize_plan(rel->subroot, sscan->subplan);
!
!                 /* Now we can add its extParams to the parent's params */
!                 context.paramids = bms_add_members(context.paramids,
!                                                    sscan->subplan->extParam);
!                 /* We need scan_params too, though */
!                 context.paramids = bms_add_members(context.paramids,
!                                                    scan_params);
!             }
              break;

          case T_FunctionScan:
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2338,2344 ****
                   * have to do instead is to find the referenced CTE plan and
                   * incorporate its external paramids, so that the correct
                   * things will happen if the CTE references outer-level
!                  * variables.  See test cases for bug #4902.
                   */
                  int            plan_id = ((CteScan *) plan)->ctePlanId;
                  Plan       *cteplan;
--- 2360,2367 ----
                   * have to do instead is to find the referenced CTE plan and
                   * incorporate its external paramids, so that the correct
                   * things will happen if the CTE references outer-level
!                  * variables.  See test cases for bug #4902.  (We assume
!                  * SS_finalize_plan was run on the CTE plan already.)
                   */
                  int            plan_id = ((CteScan *) plan)->ctePlanId;
                  Plan       *cteplan;
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2610,2639 ****
                                            locally_added_param);
      }

!     /* Now we have all the paramids */

      if (!bms_is_subset(context.paramids, valid_params))
          elog(ERROR, "plan should not reference subplan's variable");

      /*
!      * Note: by definition, extParam and allParam should have the same value
!      * in any plan node that doesn't have child initPlans.  We set them equal
!      * here, and later SS_finalize_plan will update them properly in node(s)
!      * that it attaches initPlans to.
!      *
       * For speed at execution time, make sure extParam/allParam are actually
       * NULL if they are empty sets.
       */
!     if (bms_is_empty(context.paramids))
!     {
          plan->extParam = NULL;
          plan->allParam = NULL;
-     }
-     else
-     {
-         plan->extParam = context.paramids;
-         plan->allParam = bms_copy(context.paramids);
-     }

      return plan->allParam;
  }
--- 2633,2667 ----
                                            locally_added_param);
      }

!     /* Now we have all the paramids referenced in this node and children */

      if (!bms_is_subset(context.paramids, valid_params))
          elog(ERROR, "plan should not reference subplan's variable");

      /*
!      * The plan node's allParams and extParams fields should include all its
!      * referenced paramids, plus contributions from any child initPlans.
!      * However, any setParams of the initPlans should not be present in the
!      * parent node's extParams, only in its allParams.  (It's possible that
!      * some initPlans have extParams that are setParams of other initPlans.)
!      */
!
!     /* allParam must include initplans' extParams and setParams */
!     plan->allParam = bms_union(context.paramids, initExtParam);
!     plan->allParam = bms_add_members(plan->allParam, initSetParam);
!     /* extParam must include any initplan extParams */
!     plan->extParam = bms_union(context.paramids, initExtParam);
!     /* but not any initplan setParams */
!     plan->extParam = bms_del_members(plan->extParam, initSetParam);
!
!     /*
       * For speed at execution time, make sure extParam/allParam are actually
       * NULL if they are empty sets.
       */
!     if (bms_is_empty(plan->extParam))
          plan->extParam = NULL;
+     if (bms_is_empty(plan->allParam))
          plan->allParam = NULL;

      return plan->allParam;
  }
*************** finalize_primnode(Node *node, finalize_p
*** 2686,2692 ****

          /*
           * Add params needed by the subplan to paramids, but excluding those
!          * we will pass down to it.
           */
          subparamids = bms_copy(plan->extParam);
          foreach(lc, subplan->parParam)
--- 2714,2721 ----

          /*
           * Add params needed by the subplan to paramids, but excluding those
!          * we will pass down to it.  (We assume SS_finalize_plan was run on
!          * the subplan already.)
           */
          subparamids = bms_copy(plan->extParam);
          foreach(lc, subplan->parParam)
*************** finalize_primnode(Node *node, finalize_p
*** 2706,2718 ****
   *
   * The plan is expected to return a scalar value of the given type/collation.
   * We build an EXPR_SUBLINK SubPlan node and put it into the initplan
!  * list for the current query level.  A Param that represents the initplan's
   * output is returned.
-  *
-  * We assume the plan hasn't been put through SS_finalize_plan.
   */
  Param *
! SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan,
                             Oid resulttype, int32 resulttypmod,
                             Oid resultcollation)
  {
--- 2735,2746 ----
   *
   * The plan is expected to return a scalar value of the given type/collation.
   * We build an EXPR_SUBLINK SubPlan node and put it into the initplan
!  * list for the outer query level.  A Param that represents the initplan's
   * output is returned.
   */
  Param *
! SS_make_initplan_from_plan(PlannerInfo *root,
!                            PlannerInfo *subroot, Plan *plan,
                             Oid resulttype, int32 resulttypmod,
                             Oid resultcollation)
  {
*************** SS_make_initplan_from_plan(PlannerInfo *
*** 2720,2743 ****
      Param       *prm;

      /*
-      * We must run SS_finalize_plan(), since that's normally done before a
-      * subplan gets put into the initplan list.  Tell it not to attach any
-      * pre-existing initplans to this one, since they are siblings not
-      * children of this initplan.  (This is something else that could perhaps
-      * be cleaner if we did extParam/allParam processing in setrefs.c instead
-      * of here?  See notes for materialize_finished_plan.)
-      */
-
-     /*
-      * Build extParam/allParam sets for plan nodes.
-      */
-     SS_finalize_plan(root, plan, false);
-
-     /*
       * Add the subplan and its PlannerInfo to the global lists.
       */
      root->glob->subplans = lappend(root->glob->subplans, plan);
!     root->glob->subroots = lappend(root->glob->subroots, root);

      /*
       * Create a SubPlan node and add it to the outer list of InitPlans. Note
--- 2748,2757 ----
      Param       *prm;

      /*
       * Add the subplan and its PlannerInfo to the global lists.
       */
      root->glob->subplans = lappend(root->glob->subplans, plan);
!     root->glob->subroots = lappend(root->glob->subroots, subroot);

      /*
       * Create a SubPlan node and add it to the outer list of InitPlans. Note
*************** SS_make_initplan_from_plan(PlannerInfo *
*** 2757,2763 ****
       * parParam and args lists remain empty.
       */

!     cost_subplan(root, node, plan);

      /*
       * Make a Param that will be the subplan's output.
--- 2771,2777 ----
       * parParam and args lists remain empty.
       */

!     cost_subplan(subroot, node, plan);

      /*
       * Make a Param that will be the subplan's output.
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 9bf1c66..401ba5b 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 899,904 ****
--- 899,905 ----
      subroot->query_level = root->query_level;
      subroot->parent_root = root->parent_root;
      subroot->plan_params = NIL;
+     subroot->outer_params = NULL;
      subroot->planner_cxt = CurrentMemoryContext;
      subroot->init_plans = NIL;
      subroot->cte_plan_ids = NIL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index cb916ea..5dc23d9 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct PlannerInfo
*** 131,137 ****
--- 131,144 ----

      struct PlannerInfo *parent_root;    /* NULL at outermost Query */

+     /*
+      * plan_params contains the expressions that this query level needs to
+      * make available to a lower query level that is currently being planned.
+      * outer_params contains the paramIds of PARAM_EXEC Params that outer
+      * query levels will make available to this query level.
+      */
      List       *plan_params;    /* list of PlannerParamItems, see below */
+     Bitmapset  *outer_params;

      /*
       * simple_rel_array holds pointers to "base rels" and "other rels" (see
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index c609ac3..c3b1c79 100644
*** a/src/include/optimizer/subselect.h
--- b/src/include/optimizer/subselect.h
*************** extern JoinExpr *convert_EXISTS_sublink_
*** 25,33 ****
                                 Relids available_rels);
  extern Node *SS_replace_correlation_vars(PlannerInfo *root, Node *expr);
  extern Node *SS_process_sublinks(PlannerInfo *root, Node *expr, bool isQual);
! extern void SS_finalize_plan(PlannerInfo *root, Plan *plan,
!                  bool attach_initplans);
! extern Param *SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan,
                      Oid resulttype, int32 resulttypmod, Oid resultcollation);
  extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var);
  extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root,
--- 25,35 ----
                                 Relids available_rels);
  extern Node *SS_replace_correlation_vars(PlannerInfo *root, Node *expr);
  extern Node *SS_process_sublinks(PlannerInfo *root, Node *expr, bool isQual);
! extern void SS_identify_outer_params(PlannerInfo *root);
! extern void SS_attach_initplans(PlannerInfo *root, Plan *plan);
! extern void SS_finalize_plan(PlannerInfo *root, Plan *plan);
! extern Param *SS_make_initplan_from_plan(PlannerInfo *root,
!                            PlannerInfo *subroot, Plan *plan,
                      Oid resulttype, int32 resulttypmod, Oid resultcollation);
  extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var);
  extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index da407ef..68f0bac 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** select * from
*** 4869,4874 ****
--- 4869,4931 ----
    0 | 9998 |  0
  (1 row)

+ -- check proper extParam/allParam handling (this isn't exactly a LATERAL issue,
+ -- but we can make the test case much more compact with LATERAL)
+ explain (verbose, costs off)
+ select * from (values (0), (1)) v(id),
+ lateral (select * from int8_tbl t1,
+          lateral (select * from
+                     (select * from int8_tbl t2
+                      where q1 = any (select q2 from int8_tbl t3
+                                      where q2 = (select greatest(t1.q1,t2.q2))
+                                        and (select v.id=0)) offset 0) ss2) ss
+          where t1.q1 = ss.q2) ss0;
+                            QUERY PLAN
+ -----------------------------------------------------------------
+  Nested Loop
+    Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2
+    ->  Seq Scan on public.int8_tbl t1
+          Output: t1.q1, t1.q2
+    ->  Nested Loop
+          Output: "*VALUES*".column1, ss2.q1, ss2.q2
+          ->  Values Scan on "*VALUES*"
+                Output: "*VALUES*".column1
+          ->  Subquery Scan on ss2
+                Output: ss2.q1, ss2.q2
+                Filter: (t1.q1 = ss2.q2)
+                ->  Seq Scan on public.int8_tbl t2
+                      Output: t2.q1, t2.q2
+                      Filter: (SubPlan 3)
+                      SubPlan 3
+                        ->  Result
+                              Output: t3.q2
+                              One-Time Filter: $4
+                              InitPlan 1 (returns $2)
+                                ->  Result
+                                      Output: GREATEST($0, t2.q2)
+                              InitPlan 2 (returns $4)
+                                ->  Result
+                                      Output: ($3 = 0)
+                              ->  Seq Scan on public.int8_tbl t3
+                                    Output: t3.q1, t3.q2
+                                    Filter: (t3.q2 = $2)
+ (27 rows)
+
+ select * from (values (0), (1)) v(id),
+ lateral (select * from int8_tbl t1,
+          lateral (select * from
+                     (select * from int8_tbl t2
+                      where q1 = any (select q2 from int8_tbl t3
+                                      where q2 = (select greatest(t1.q1,t2.q2))
+                                        and (select v.id=0)) offset 0) ss2) ss
+          where t1.q1 = ss.q2) ss0;
+  id |        q1        |        q2         |        q1        |        q2
+ ----+------------------+-------------------+------------------+------------------
+   0 | 4567890123456789 |               123 | 4567890123456789 | 4567890123456789
+   0 | 4567890123456789 |  4567890123456789 | 4567890123456789 | 4567890123456789
+   0 | 4567890123456789 | -4567890123456789 | 4567890123456789 | 4567890123456789
+ (3 rows)
+
  -- test some error cases where LATERAL should have been used but wasn't
  select f1,g from int4_tbl a, (select f1 as g) ss;
  ERROR:  column "f1" does not exist
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index f924532..6599702 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** select * from
*** 1524,1529 ****
--- 1524,1550 ----
             where f1 = any (select unique1 from tenk1
                             where unique2 = v.x offset 0)) ss;

+ -- check proper extParam/allParam handling (this isn't exactly a LATERAL issue,
+ -- but we can make the test case much more compact with LATERAL)
+ explain (verbose, costs off)
+ select * from (values (0), (1)) v(id),
+ lateral (select * from int8_tbl t1,
+          lateral (select * from
+                     (select * from int8_tbl t2
+                      where q1 = any (select q2 from int8_tbl t3
+                                      where q2 = (select greatest(t1.q1,t2.q2))
+                                        and (select v.id=0)) offset 0) ss2) ss
+          where t1.q1 = ss.q2) ss0;
+
+ select * from (values (0), (1)) v(id),
+ lateral (select * from int8_tbl t1,
+          lateral (select * from
+                     (select * from int8_tbl t2
+                      where q1 = any (select q2 from int8_tbl t3
+                                      where q2 = (select greatest(t1.q1,t2.q2))
+                                        and (select v.id=0)) offset 0) ss2) ss
+          where t1.q1 = ss.q2) ss0;
+
  -- test some error cases where LATERAL should have been used but wasn't
  select f1,g from int4_tbl a, (select f1 as g) ss;
  select f1,g from int4_tbl a, (select a.f1 as g) ss;

Re: Moving SS_finalize_plan processing to the end of planning

От
David Rowley
Дата:
On 10 August 2015 at 07:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've started to work on path-ification of the upper planner (finally),
and since that's going to be a large patch in any case, I've been looking
for pieces that could be bitten off and done separately.  One such piece
is the fact that SS_finalize_plan (computation of extParams/allParams)
currently gets run at the end of subquery_planner; as long as that's true,
we cannot have subquery_planner returning paths rather than concrete
plans.  The attached patch moves that processing to occur just before
set_plan_references is run.

Basically what this patch does is to divide what had been done in
SS_finalize_plan into three steps:

* SS_identify_outer_params determines which outer-query-level Params will
be available for the current query level to use, and saves that aside in
a new PlannerInfo field root->outer_params.  This processing turns out
to be the only reason that SS_finalize_plan had to be called in
subquery_planner: we have to capture this data before exiting
subquery_planner because the upper levels' plan_params lists may change
as they plan other subplans.

* SS_attach_initplans does the work of attaching initplans to the parent
plan node and adjusting the parent's cost estimate accordingly.

* SS_finalize_plan now *only* does extParam/allParam calculation.

I had to change things around a bit in planagg.c (which was already a
hack, and the law of conservation of cruft applies).  Otherwise it's
pretty straightforward and removes some existing hacks.  One notable
point is that there's no longer an assumption within SS_finalize_plan
that initPlans can only appear in the top plan node.

Any objections?

Great! I'm very interested in this work, specifically around the grouping_planner() changes too.

I've looked over the patch and from what I understand it seems like a good solid step in the right direction.

I was digging around the grouping_planner() last week with the intentions of making some changes there to allow GROUP BY before join, but in the end decided to stay well clear of this area until this pathification is done. So far I've managed to keep my changes away from the upper planner and I've added "GroupingPath" types, which from what I can predict of what you'll be making changes to, I think you'll also need to have grouping_planner() return a few variations of "GroupingPath" to allow the paths list to be passed up to subquery_planner() and on up to functions like recurse_set_operations() so that they have the option of choosing GroupAggregate / MergeAppend to implement UNION.

If I'm right on this, then maybe there's a few things you can copy and paste from the patch I posted here: http://www.postgresql.org/message-id/CAKJS1f-sEcm=gTfS-DqjsOcsZ-vLhrP_hSRNtJjq-S7Egn8Rqw@mail.gmail.com  specifically around create_grouping_path()?

Kind Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

Re: Moving SS_finalize_plan processing to the end of planning

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 10 August 2015 at 07:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've started to work on path-ification of the upper planner (finally),

> I was digging around the grouping_planner() last week with the intentions
> of making some changes there to allow GROUP BY before join, but in the end
> decided to stay well clear of this area until this pathification is done.
> So far I've managed to keep my changes away from the upper planner and I've
> added "GroupingPath" types, which from what I can predict of what you'll be
> making changes to, I think you'll also need to have grouping_planner()
> return a few variations of "GroupingPath" to allow the paths list to be
> passed up to subquery_planner() and on up to functions
> like recurse_set_operations() so that they have the option of choosing
> GroupAggregate / MergeAppend to implement UNION.

> If I'm right on this, then maybe there's a few things you can copy and
> paste from the patch I posted here:
> http://www.postgresql.org/message-id/CAKJS1f-sEcm=gTfS-DqjsOcsZ-vLhrP_hSRNtJjq-S7Egn8Rqw@mail.gmail.com
>  specifically around create_grouping_path()?

Yeah, I saw your patch, but have not yet had time to think about what
parts of it I could borrow.
        regards, tom lane



Re: Moving SS_finalize_plan processing to the end of planning

От
Robert Haas
Дата:
On Sun, Aug 9, 2015 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've started to work on path-ification of the upper planner (finally),

I don't have any specific technical comments on what you've proposed
here, but I'm excited to hear that this is moving forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving SS_finalize_plan processing to the end of planning

От
Robert Haas
Дата:
On Sun, Aug 9, 2015 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've started to work on path-ification of the upper planner (finally),
> and since that's going to be a large patch in any case, I've been looking
> for pieces that could be bitten off and done separately.  One such piece
> is the fact that SS_finalize_plan (computation of extParams/allParams)
> currently gets run at the end of subquery_planner; as long as that's true,
> we cannot have subquery_planner returning paths rather than concrete
> plans.  The attached patch moves that processing to occur just before
> set_plan_references is run.

Tom,

Do you expect to do more work on the upper planner path-ification stuff soon?

Just checking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving SS_finalize_plan processing to the end of planning

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Do you expect to do more work on the upper planner path-ification stuff soon?

Yeah, I do actually have some work in progress.  Not sure how soon I'll be
ready to show it --- there's a lot of code to break and reassemble :-(
        regards, tom lane



Re: Moving SS_finalize_plan processing to the end of planning

От
Robert Haas
Дата:
On Thu, Sep 10, 2015 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Do you expect to do more work on the upper planner path-ification stuff soon?
>
> Yeah, I do actually have some work in progress.  Not sure how soon I'll be
> ready to show it --- there's a lot of code to break and reassemble :-(

Yeah, I remember looking at it a bit and finding it quite an
intimidating piece of code.  I'm just asking because it would be nice
if we got the infrastructure into this release in time to do something
with it - specifically, I'd like to find a way for FDWs to push down
aggregates, a capability for which EDB has had multiple customer
requests (and I'm sure we're not the only ones).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving SS_finalize_plan processing to the end of planning

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 10, 2015 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Do you expect to do more work on the upper planner path-ification stuff soon?

>> Yeah, I do actually have some work in progress.  Not sure how soon I'll be
>> ready to show it --- there's a lot of code to break and reassemble :-(

> Yeah, I remember looking at it a bit and finding it quite an
> intimidating piece of code.  I'm just asking because it would be nice
> if we got the infrastructure into this release in time to do something
> with it - specifically, I'd like to find a way for FDWs to push down
> aggregates, a capability for which EDB has had multiple customer
> requests (and I'm sure we're not the only ones).

Yeah, I'm well aware that this is blocking progress in several areas.
I do intend to make it happen, but it's not the only demand on my time.
        regards, tom lane



Re: Moving SS_finalize_plan processing to the end of planning

От
Robert Haas
Дата:
On Fri, Sep 11, 2015 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 10, 2015 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> Do you expect to do more work on the upper planner path-ification stuff soon?
>
>>> Yeah, I do actually have some work in progress.  Not sure how soon I'll be
>>> ready to show it --- there's a lot of code to break and reassemble :-(
>
>> Yeah, I remember looking at it a bit and finding it quite an
>> intimidating piece of code.  I'm just asking because it would be nice
>> if we got the infrastructure into this release in time to do something
>> with it - specifically, I'd like to find a way for FDWs to push down
>> aggregates, a capability for which EDB has had multiple customer
>> requests (and I'm sure we're not the only ones).
>
> Yeah, I'm well aware that this is blocking progress in several areas.
> I do intend to make it happen, but it's not the only demand on my time.

Sure, I understand that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company