DISTINCT and GROUP BY: possible performance enhancement?

Поиск

Список

Период

Сортировка

От	Ang Chin Han
Тема	DISTINCT and GROUP BY: possible performance enhancement?
Дата	7 ноября 2004 г. 14:04:25
Msg-id	8c88543604110703044cf70e15@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

Is

SELECT DISTINCT foo, bar FROM baz;

equivalent to

SELECT foo, bar from baz GROUP BY foo, bar;

?

In the former case, pgsql >= 7.4 does not use HashAgg, but uses it for
the latter case. In many circumstances, esp. for large amount of data
in the table baz, the second case is an order of a magnitude faster.

For example (pgsql8b4):

regress=# explain analyze select distinct four from tenk1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1109.39..1159.39 rows=4 width=4) (actual
time=90.017..106.936 rows=4 loops=1)
   ->  Sort  (cost=1109.39..1134.39 rows=10000 width=4) (actual
time=90.008..95.589 rows=10000 loops=1)
         Sort Key: four
         ->  Seq Scan on tenk1  (cost=0.00..445.00 rows=10000 width=4)
(actual time=0.027..45.454 rows=10000 loops=1)
 Total runtime: 110.927 ms
(5 rows)

regress=# explain analyze select distinct four from tenk1 group by four;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=470.04..470.06 rows=4 width=4) (actual
time=47.487..47.498 rows=4 loops=1)
   ->  Sort  (cost=470.04..470.05 rows=4 width=4) (actual
time=47.484..47.486 rows=4 loops=1)
         Sort Key: four
         ->  HashAggregate  (cost=470.00..470.00 rows=4 width=4)
(actual time=47.444..47.451 rows=4 loops=1)
               ->  Seq Scan on tenk1  (cost=0.00..445.00 rows=10000
width=4) (actual time=0.013..31.068 rows=10000 loops=1)
 Total runtime: 47.822 ms
(6 rows)

If they're equivalent, can we have pgsql use HashAgg for DISTINCTs?
Yes, I've read planner.c's comments on "Executor doesn't support
hashed aggregation with DISTINCT aggregates.", but I believe using
HashAgg is better if the product of the columns' n_distinct statistic
is way less than the number of expected rows.

В списке pgsql-performance по дате отправления:

Предыдущее

От: "Scott Marlowe"
Дата: 07 ноября 2004 г., 00:58:13
Сообщение: Re: poor performance in migrated database

Следующее

От: "Iain"
Дата: 08 ноября 2004 г., 05:12:57
Сообщение: Re: postgresql amd-64

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

DISTINCT and GROUP BY: possible performance enhancement?

Предыдущее

Следующее