Обсуждение: Costing bug in hash join logic for semi joins

Поиск
Список
Период
Сортировка

Costing bug in hash join logic for semi joins

От
RK
Дата:
There is a costing bug in hash join logic seems to have been introduced by the patch related to inner_unique enhancements(commit: 9c7f5229ad68d7e0e4dd149e3f80257893e404d4). Specifically, "hashjointuples" which tracks the number of matches for hash clauses is computed wrong for inner unique scenario. This leads to lot of semi-joins  incorrectly turn to inner joins with unique on inner side. Function "final_cost_hashjoin" has special handling to cost semi/anti joins to account for early stop after the first match. This is enhanced by the above said commit to be used for inner_unique scenario also. However, "hashjointuples" computation is not fixed to handle this new scenario which is incorrectly stepping into the anti join logic and assuming unmatched rows. Fix is to handle inner_unique case when computing "hashjointuples".  Here is the outline of the code that shows the bug.

void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
                              JoinCostWorkspace *workspace,
                            JoinPathExtraData *extra)
{
         .....
        /* CPU costs */
        if (path->jpath.jointype == JOIN_SEMI ||
            path->jpath.jointype == JOIN_ANTI ||
            extra->inner_unique)

       {
                 ......
                                           
                 /* Get # of tuples that will pass the basic join */
               if (path->jpath.jointype == JOIN_SEMI)
                  hashjointuples = outer_matched_rows;
              else
                 hashjointuples = outer_path_rows - outer_matched_rows; 

        }
       else
       {
         .....
       }
}

Thanks, RK (Salesforce)

Re: Costing bug in hash join logic for semi joins

От
David Rowley
Дата:
On 10 July 2018 at 11:44, RK <korlapati@gmail.com> wrote:
> There is a costing bug in hash join logic seems to have been introduced by
> the patch related to inner_unique enhancements(commit:
> 9c7f5229ad68d7e0e4dd149e3f80257893e404d4). Specifically, "hashjointuples"
> which tracks the number of matches for hash clauses is computed wrong for
> inner unique scenario. This leads to lot of semi-joins  incorrectly turn to
> inner joins with unique on inner side. Function "final_cost_hashjoin" has
> special handling to cost semi/anti joins to account for early stop after the
> first match. This is enhanced by the above said commit to be used for
> inner_unique scenario also. However, "hashjointuples" computation is not
> fixed to handle this new scenario which is incorrectly stepping into the
> anti join logic and assuming unmatched rows. Fix is to handle inner_unique
> case when computing "hashjointuples".  Here is the outline of the code that
> shows the bug.

Thanks for the analysis and the report. I agree the code is wrong.
Looks simple enough just to handle the semi and unique joins in the
else clause and make the if handle the ANTI join.

I've done that in the attached.  Also on reading the comment above, it
looks slightly incorrect. To me, it looks like it's applying a
twentieth of the cost and not a tenth as the comment claims. I
couldn't resist updating that too.

I didn't feel the need to add any regression tests around this. It
seems like an unlikely bug to ever appear again.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Costing bug in hash join logic for semi joins

От
David Rowley
Дата:
On 10 July 2018 at 22:21, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've done that in the attached.  Also on reading the comment above, it
> looks slightly incorrect. To me, it looks like it's applying a
> twentieth of the cost and not a tenth as the comment claims. I
> couldn't resist updating that too.

I've added this patch to the September commit fest:

https://commitfest.postgresql.org/19/1720/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Costing bug in hash join logic for semi joins

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 10 July 2018 at 11:44, RK <korlapati@gmail.com> wrote:
>> There is a costing bug in hash join logic seems to have been introduced by
>> the patch related to inner_unique enhancements(commit:
>> 9c7f5229ad68d7e0e4dd149e3f80257893e404d4).

> Thanks for the analysis and the report. I agree the code is wrong.
> Looks simple enough just to handle the semi and unique joins in the
> else clause and make the if handle the ANTI join.

Looks good to me too.  But you should have done "make check-world".
It turns out that this causes a couple of plan choices in contrib tests
to flip back to what they were before 9c7f5229a.  We hadn't questioned
those changes closely at the time, which may have been unduly sloppy.

> I've done that in the attached.  Also on reading the comment above, it
> looks slightly incorrect. To me, it looks like it's applying a
> twentieth of the cost and not a tenth as the comment claims. I
> couldn't resist updating that too.

No, it's correct as-is.  There's a factor of 0.5 in the corresponding
run_cost calculations for the other two cases, so using 0.05 here is
one-tenth as much.

Pushed with those adjustments.

            regards, tom lane