Обсуждение: Safe outer-join orderings

Поиск
Список
Период
Сортировка

Safe outer-join orderings

От
Tom Lane
Дата:
I looked into Ian Harding's problem reported here:
http://archives.postgresql.org/pgsql-general/2007-02/msg00530.php

It's possible to reproduce the problem in a simplified form in the
regression database, using this query:

explain
select * from tenk1 a left join   (tenk1 b inner join     (tenk1 c left join tenk1 d using (unique1))    using
(unique1))using (unique1);
 

It works OK with default settings, but after setting join_collapse_limit
= 2 it fails with "failed to build any 2-way joins".  The reason is that
8.2's new logic for rearranging outer joins is getting confused.

In this scenario we compute min_lefthand = (A) and min_righthand = (B C)
for the upper outer join, while the lower outer join has min_lefthand
= (C) and min_righthand = (D).  Normally the upper's min_righthand would
be (B C D) which would force joining in the syntactic order, but here we
have been able to prove we can commute the two outer joins, so we
exclude D from the upper min_righthand.  This situation should allow
several different join orders, including the syntactic ordering of
(A (B (C D))).

The problem is that make_join_rel() is being too strict.  As explained
in the planner README:

: So the restriction enforced by make_join_rel is that a proposed join
: can't join across a RHS boundary (ie, join anything inside the RHS
: to anything else) unless the join validly implements some outer join.

So we can form (C D) because it's legal according to the lower OJ, but
when we consider forming (B (C D)) we conclude that it crosses the upper
one's RHS and yet is not an outer join, and reject it.  Normally this
mistake is masked because the planner also considers ((B C) D) and
decides that's legal; but if join_collapse_limit is too small then that
path isn't explored and we end up not finding any legal join path.
Ian's original problem query is complicated enough to expose the problem
even at the default join_collapse_limit of 8.

Quite aside from the possibility of failure, this is bad because we
might be excluding the most efficient join order.

As best I can see, it would be allowable for make_join_rel to accept an
inner join that overlaps but isn't contained in an OJ RHS so long as
(1) it doesn't overlap the LHS (otherwise it's a premature attempt to
perform the outer join; and (2) *both* of the input relations overlap
the OJ's RHS.  Since each input relation must have been previously found
valid, that means that whatever is "sticking out" of the RHS must have
been legally joined to something inside the RHS by a previous outer
join.  I haven't quite convinced myself of it, but I think that we might
not even need to test (1) separately, as legal inputs overlapping the
RHS couldn't overlap the LHS.

Comments?  Is there still a hole here?
        regards, tom lane


Re: Safe outer-join orderings

От
Tom Lane
Дата:
I wrote:
> I looked into Ian Harding's problem reported here:
> http://archives.postgresql.org/pgsql-general/2007-02/msg00530.php

I've applied the attached patch, but it could do with some more
testing.  Please try it out on your database and see if you get sane
answers ...

            regards, tom lane

Index: src/backend/optimizer/README
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/README,v
retrieving revision 1.35
diff -c -r1.35 README
*** src/backend/optimizer/README    1 Jul 2006 18:38:32 -0000    1.35
--- src/backend/optimizer/README    13 Feb 2007 02:21:36 -0000
***************
*** 225,232 ****
  non-FULL joins can be freely associated into the lefthand side of an
  OJ, but in general they can't be associated into the righthand side.
  So the restriction enforced by make_join_rel is that a proposed join
! can't join across a RHS boundary (ie, join anything inside the RHS
! to anything else) unless the join validly implements some outer join.
  (To support use of identity 3, we have to allow cases where an apparent
  violation of a lower OJ's RHS is committed while forming an upper OJ.
  If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
--- 225,232 ----
  non-FULL joins can be freely associated into the lefthand side of an
  OJ, but in general they can't be associated into the righthand side.
  So the restriction enforced by make_join_rel is that a proposed join
! can't join a rel within or partly within an RHS boundary to one outside
! the boundary, unless the join validly implements some outer join.
  (To support use of identity 3, we have to allow cases where an apparent
  violation of a lower OJ's RHS is committed while forming an upper OJ.
  If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
Index: src/backend/optimizer/geqo/geqo_eval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/geqo/geqo_eval.c,v
retrieving revision 1.81.2.1
diff -c -r1.81.2.1 geqo_eval.c
*** src/backend/optimizer/geqo/geqo_eval.c    12 Dec 2006 21:31:09 -0000    1.81.2.1
--- src/backend/optimizer/geqo/geqo_eval.c    13 Feb 2007 02:21:36 -0000
***************
*** 182,188 ****
       * tour other than the one given.  To the extent that the heuristics are
       * helpful, however, this will be a better plan than the raw tour.
       *
!      * Also, when a join attempt fails (because of IN-clause constraints), we
       * may be able to recover and produce a workable plan, where the old code
       * just had to give up.  This case acts the same as a false result from
       * desirable_join().
--- 182,188 ----
       * tour other than the one given.  To the extent that the heuristics are
       * helpful, however, this will be a better plan than the raw tour.
       *
!      * Also, when a join attempt fails (because of OJ or IN constraints), we
       * may be able to recover and produce a workable plan, where the old code
       * just had to give up.  This case acts the same as a false result from
       * desirable_join().
***************
*** 262,270 ****
          return true;

      /*
!      * Join if the rels are members of the same outer-join side. This is
!      * needed to ensure that we can find a valid solution in a case where
!      * an OJ contains a clauseless join.
       */
      foreach(l, root->oj_info_list)
      {
--- 262,270 ----
          return true;

      /*
!      * Join if the rels overlap the same outer-join side and don't already
!      * implement the outer join. This is needed to ensure that we can find a
!      * valid solution in a case where an OJ contains a clauseless join.
       */
      foreach(l, root->oj_info_list)
      {
***************
*** 273,283 ****
          /* ignore full joins --- other mechanisms preserve their ordering */
          if (ojinfo->is_full_join)
              continue;
!         if (bms_is_subset(outer_rel->relids, ojinfo->min_righthand) &&
!             bms_is_subset(inner_rel->relids, ojinfo->min_righthand))
              return true;
!         if (bms_is_subset(outer_rel->relids, ojinfo->min_lefthand) &&
!             bms_is_subset(inner_rel->relids, ojinfo->min_lefthand))
              return true;
      }

--- 273,287 ----
          /* ignore full joins --- other mechanisms preserve their ordering */
          if (ojinfo->is_full_join)
              continue;
!         if (bms_overlap(outer_rel->relids, ojinfo->min_righthand) &&
!             bms_overlap(inner_rel->relids, ojinfo->min_righthand) &&
!             !bms_overlap(outer_rel->relids, ojinfo->min_lefthand) &&
!             !bms_overlap(inner_rel->relids, ojinfo->min_lefthand))
              return true;
!         if (bms_overlap(outer_rel->relids, ojinfo->min_lefthand) &&
!             bms_overlap(inner_rel->relids, ojinfo->min_lefthand) &&
!             !bms_overlap(outer_rel->relids, ojinfo->min_righthand) &&
!             !bms_overlap(inner_rel->relids, ojinfo->min_righthand))
              return true;
      }

Index: src/backend/optimizer/path/joinrels.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/joinrels.c,v
retrieving revision 1.81.2.1
diff -c -r1.81.2.1 joinrels.c
*** src/backend/optimizer/path/joinrels.c    12 Dec 2006 21:31:09 -0000    1.81.2.1
--- src/backend/optimizer/path/joinrels.c    13 Feb 2007 02:21:36 -0000
***************
*** 350,357 ****
          /* ignore full joins --- other mechanisms preserve their ordering */
          if (ojinfo->is_full_join)
              continue;
!         /* anything inside the RHS is definitely restricted */
!         if (bms_is_subset(rel->relids, ojinfo->min_righthand))
              return true;
          /* if it's a proper subset of the LHS, it's also restricted */
          if (bms_is_subset(rel->relids, ojinfo->min_lefthand) &&
--- 350,358 ----
          /* ignore full joins --- other mechanisms preserve their ordering */
          if (ojinfo->is_full_join)
              continue;
!         /* if it overlaps RHS and isn't yet joined to LHS, it's restricted */
!         if (bms_overlap(rel->relids, ojinfo->min_righthand) &&
!             !bms_overlap(rel->relids, ojinfo->min_lefthand))
              return true;
          /* if it's a proper subset of the LHS, it's also restricted */
          if (bms_is_subset(rel->relids, ojinfo->min_lefthand) &&
***************
*** 468,483 ****
              /*----------
               * Otherwise, the proposed join overlaps the RHS but isn't
               * a valid implementation of this OJ.  It might still be
!              * a valid implementation of some other OJ, however.  We have
!              * to allow this to support the associative identity
!              *    (a LJ b on Pab) LJ c ON Pbc = a LJ (b LJ c ON Pbc) on Pab
               * since joining B directly to C violates the lower OJ's RHS.
               * We assume that make_outerjoininfo() set things up correctly
!              * so that we'll only match to the upper OJ if the transformation
!              * is valid.  Set flag here to check at bottom of loop.
               *----------
               */
!             is_valid_inner = false;
          }
      }

--- 469,504 ----
              /*----------
               * Otherwise, the proposed join overlaps the RHS but isn't
               * a valid implementation of this OJ.  It might still be
!              * a legal join, however.  If both inputs overlap the RHS,
!              * assume that it's OK.  Since the inputs presumably got past
!              * this function's checks previously, they can't overlap the
!              * LHS and their violations of the RHS boundary must represent
!              * OJs that have been determined to commute with this one.
!              * We have to allow this to work correctly in cases like
!              *        (a LEFT JOIN (b JOIN (c LEFT JOIN d)))
!              * when the c/d join has been determined to commute with the join
!              * to a, and hence d is not part of min_righthand for the upper
!              * join.  It should be legal to join b to c/d but this will appear
!              * as a violation of the upper join's RHS.
!              * Furthermore, if one input overlaps the RHS and the other does
!              * not, we should still allow the join if it is a valid
!              * implementation of some other OJ.  We have to allow this to
!              * support the associative identity
!              *        (a LJ b on Pab) LJ c ON Pbc = a LJ (b LJ c ON Pbc) on Pab
               * since joining B directly to C violates the lower OJ's RHS.
               * We assume that make_outerjoininfo() set things up correctly
!              * so that we'll only match to some OJ if the join is valid.
!              * Set flag here to check at bottom of loop.
               *----------
               */
!             if (bms_overlap(rel1->relids, ojinfo->min_righthand) &&
!                 bms_overlap(rel2->relids, ojinfo->min_righthand))
!             {
!                 /* seems OK */
!                 Assert(!bms_overlap(joinrelids, ojinfo->min_lefthand));
!             }
!             else
!                 is_valid_inner = false;
          }
      }

Index: src/backend/optimizer/plan/initsplan.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v
retrieving revision 1.123.2.2
diff -c -r1.123.2.2 initsplan.c
*** src/backend/optimizer/plan/initsplan.c    8 Jan 2007 16:47:35 -0000    1.123.2.2
--- src/backend/optimizer/plan/initsplan.c    13 Feb 2007 02:21:36 -0000
***************
*** 373,379 ****

          /*
           * For an OJ, form the OuterJoinInfo now, because we need the OJ's
!          * semantic scope (ojscope) to pass to distribute_qual_to_rels.
           */
          if (j->jointype != JOIN_INNER)
          {
--- 373,381 ----

          /*
           * For an OJ, form the OuterJoinInfo now, because we need the OJ's
!          * semantic scope (ojscope) to pass to distribute_qual_to_rels.  But
!          * we mustn't add it to oj_info_list just yet, because we don't want
!          * distribute_qual_to_rels to think it is an outer join below us.
           */
          if (j->jointype != JOIN_INNER)
          {
***************
*** 454,461 ****
   * the caller, so that left_rels is always the nonnullable side.  Hence
   * we need only distinguish the LEFT and FULL cases.
   *
!  * The node should eventually be put into root->oj_info_list, but we
   * do not do that here.
   */
  static OuterJoinInfo *
  make_outerjoininfo(PlannerInfo *root,
--- 456,468 ----
   * the caller, so that left_rels is always the nonnullable side.  Hence
   * we need only distinguish the LEFT and FULL cases.
   *
!  * The node should eventually be appended to root->oj_info_list, but we
   * do not do that here.
+  *
+  * Note: we assume that this function is invoked bottom-up, so that
+  * root->oj_info_list already contains entries for all outer joins that are
+  * syntactically below this one; and indeed that oj_info_list is ordered
+  * with syntactically lower joins listed first.
   */
  static OuterJoinInfo *
  make_outerjoininfo(PlannerInfo *root,