Re: Why enable_hashjoin Completely disables HashJoin

Поиск
Список
Период
Сортировка
От Quan Zongliang
Тема Re: Why enable_hashjoin Completely disables HashJoin
Дата
Msg-id 5d9f64f5-5051-8b9a-bafa-2c3309e8865b@yeah.net
обсуждение исходный текст
Ответ на Re: Why enable_hashjoin Completely disables HashJoin  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Why enable_hashjoin Completely disables HashJoin  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers

On 2023/4/3 19:44, Tomas Vondra wrote:
> On 4/3/23 12:23, Quan Zongliang wrote:
>> Hi,
>>
>> I found that the enable_hashjoin disables HashJoin completely.
>> It's in the function add_paths_to_joinrel:
>>
>> if (enable_hashjoin || jointype == JOIN_FULL)
>>      hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>>                  jointype, &extra);
>>
>> Instead, it should add a disable cost to the cost calculation of
>> hashjoin. And now final_cost_hashjoin does the same thing:
>>
>> if (!enable_hashjoin)
>>      startup_cost += disable_cost;
>>
>>
>> enable_mergejoin has the same problem.
>>
>> Test case:
>>
>> CREATE TABLE t_score_01(
>> s_id int,
>> s_score int,
>> s_course char(8),
>> c_id int);
>>
>> CREATE TABLE t_student_01(
>> s_id int,
>> s_name char(8));
>>
>> insert into t_score_01 values(
>> generate_series(1, 1000000), random()*100, 'course', generate_series(1,
>> 1000000));
>>
>> insert into t_student_01 values(generate_series(1, 1000000), 'name');
>>
>> analyze t_score_01;
>> analyze t_student_01;
>>
>> SET enable_hashjoin TO off;
>> SET enable_nestloop TO off;
>> SET enable_mergejoin TO off;
>>
>> explain select count(*)
>> from t_student_01 a join t_score_01 b on a.s_id=b.s_id;
>>
>> After disabling all three, the HashJoin path should still be chosen.
>>
> 
> It's not clear to me why that behavior would be desirable? Why is this
> an issue you need so solve?
> 
Because someone noticed that when he set enable_hashjoin, 
enable_mergejoin and enable_nestloop to off. The statement seemed to get 
stuck (actually because it chose the NestedLoop path, which took a long 
long time to run).
If enable_hashjoin and enable_nestloop disable generating these two 
paths. Then enable_nestloop should do the same thing, but it doesn't.

> AFAIK the reason why some paths are actually disabled (not built at all)
> while others are only penalized by adding disable_cost is that we need
> to end up with at least one way to execute the query. So we pick a path
> that we know is possible (e.g. seqscan) and hard-disable other paths.
> But the always-possible path is only soft-disabled by disable_cost.
> 
> For joins, we do the same thing. The hash/merge joins may not be
> possible, because the data types may not have hash/sort operators, etc.
> Nestloop is always possible. So we soft-disable nestloop but
> hard-disable hash/merge joins.
> 
> I doubt we want to change this behavior, unless there's a good reason to
> do that ...
It doesn't have to change. Because selecting NestedLoop doesn't really 
get stuck either. It just takes too long to run.

I will change the patch status to Withdrawn.
> 
> 
> regards
> 




В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Minimal logical decoding on standbys