Re: Why hash join cost calculation need reduction

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Why hash join cost calculation need reduction
Дата
Msg-id 20130613124428.GJ7200@tamriel.snowman.net
обсуждение исходный текст
Ответ на Why hash join cost calculation need reduction  (高健 <luckyjackgao@gmail.com>)
Ответы Re: Why hash join cost calculation need reduction  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Greetings,

* 高健 (luckyjackgao@gmail.com) wrote:
> And I found the following function of PostgreSQL9.2.1. The hash join cost
> is calculated.
>
> But what confused me  is a reuction calculation:
>
> qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;
>
> My question is:
>
> Why the reduction  is needed here  for cost calculation?

    cost_qual_eval(&hash_qual_cost, hashclauses, root);

returns the costs for *just the quals which can be used for the
hashjoin*, while

    cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);

returns the costs for *ALL the quals*

    qp_qual_cost.startup -= hash_qual_cost.startup;

and

    qp_qual_cost.per_tuple -= hash_qual_cost.per_tuple;

extract the cost attributed to the quals used in the hashjoin from the
cost of the other quals in the overall expression.

The reason that we do this is because we're going to use a
hashjoin-specific costing for the qual costs later on in
final_cost_hashjoin:

startup_cost += hash_qual_cost.startup;
run_cost += hash_qual_cost.per_tuple * outer_path_rows *
    clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

if we didn't do that, we'd end up double-counting those costs.

> In fact , For my sql statement:
>
> <select * from sales s inner join customers c on s.cust_id = c.cust_id;>
>
> When I set  cpu_operator_cost to 0.0025,
>
> qp_qual_cost.per_tuple  and  hash_qual_cost.per_tuple are all 0.0025.
>
> So after reduction,  qp_qual_cost.per_tuple   is set to 0.

Yes, because ALL the quals involved in your statement are quals being
used for the hashjoin- and those costs are calculated later on, as I
illustrated above.

> I think that  per_tuple cost can not be omitted here.

The per-tuple cost isn't omitted, it's added in later based on the
expected costs for doing those per-tuple operations for building and
using the hash table.

    Thanks,

        Stephen

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: Andrew Tipton
Дата:
Сообщение: Re: Determining the type (array, object, or scalar) of a JSON value
Следующее
От: Rebecca Clarke
Дата:
Сообщение: Re: Get data type aliases