Re: Why enable_hashjoin Completely disables HashJoin
От | Quan Zongliang |
---|---|
Тема | Re: Why enable_hashjoin Completely disables HashJoin |
Дата | |
Msg-id | 5d9f64f5-5051-8b9a-bafa-2c3309e8865b@yeah.net обсуждение исходный текст |
Ответ на | Re: Why enable_hashjoin Completely disables HashJoin (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: Why enable_hashjoin Completely disables HashJoin
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
On 2023/4/3 19:44, Tomas Vondra wrote: > On 4/3/23 12:23, Quan Zongliang wrote: >> Hi, >> >> I found that the enable_hashjoin disables HashJoin completely. >> It's in the function add_paths_to_joinrel: >> >> if (enable_hashjoin || jointype == JOIN_FULL) >> hash_inner_and_outer(root, joinrel, outerrel, innerrel, >> jointype, &extra); >> >> Instead, it should add a disable cost to the cost calculation of >> hashjoin. And now final_cost_hashjoin does the same thing: >> >> if (!enable_hashjoin) >> startup_cost += disable_cost; >> >> >> enable_mergejoin has the same problem. >> >> Test case: >> >> CREATE TABLE t_score_01( >> s_id int, >> s_score int, >> s_course char(8), >> c_id int); >> >> CREATE TABLE t_student_01( >> s_id int, >> s_name char(8)); >> >> insert into t_score_01 values( >> generate_series(1, 1000000), random()*100, 'course', generate_series(1, >> 1000000)); >> >> insert into t_student_01 values(generate_series(1, 1000000), 'name'); >> >> analyze t_score_01; >> analyze t_student_01; >> >> SET enable_hashjoin TO off; >> SET enable_nestloop TO off; >> SET enable_mergejoin TO off; >> >> explain select count(*) >> from t_student_01 a join t_score_01 b on a.s_id=b.s_id; >> >> After disabling all three, the HashJoin path should still be chosen. >> > > It's not clear to me why that behavior would be desirable? Why is this > an issue you need so solve? > Because someone noticed that when he set enable_hashjoin, enable_mergejoin and enable_nestloop to off. The statement seemed to get stuck (actually because it chose the NestedLoop path, which took a long long time to run). If enable_hashjoin and enable_nestloop disable generating these two paths. Then enable_nestloop should do the same thing, but it doesn't. > AFAIK the reason why some paths are actually disabled (not built at all) > while others are only penalized by adding disable_cost is that we need > to end up with at least one way to execute the query. So we pick a path > that we know is possible (e.g. seqscan) and hard-disable other paths. > But the always-possible path is only soft-disabled by disable_cost. > > For joins, we do the same thing. The hash/merge joins may not be > possible, because the data types may not have hash/sort operators, etc. > Nestloop is always possible. So we soft-disable nestloop but > hard-disable hash/merge joins. > > I doubt we want to change this behavior, unless there's a good reason to > do that ... It doesn't have to change. Because selecting NestedLoop doesn't really get stuck either. It just takes too long to run. I will change the patch status to Withdrawn. > > > regards >
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: [PoC] pg_upgrade: allow to upgrade publisher node