Re: Re[2]: planner/optimizer hash index method and so on

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Re[2]: planner/optimizer hash index method and so on
Дата
Msg-id 18041.954950651@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re[2]: planner/optimizer hash index method and so on  ("Alexey V. Meledin" <avm@webclub.ru>)
Ответы Re[2]: Re[2]: planner/optimizer hash index method and so on  ("Alexey V. Meledin" <avm@webclub.ru>)
Список pgsql-sql
"Alexey V. Meledin" <avm@webclub.ru> writes:
>>> create index itd_hash_data_id on ifs_tree_default USING HASH (ifs_data_id);
>>> create index iatd_hash_data_id on ifs_account_tree_data USING HASH (ifs_data_id);

TL> Why would you do that?  The hash index method doesn't have any advantage
TL> over btree that I can see, and it's got a lot of disadvantages.

> But as I understand from documentation, Hash Join is the preferable
> method of JOIN, because of Seq Scan performed only once (I suppose, that
> it's not full table scan!?!).

Where in the documentation did you see that?  If it implies that then
it's wrong.  We could certainly simplify the optimizer a lot if it could
just always pick a hash join ;-).

But more to the point, whether a hash join is used has nothing to do
with what kind of index you have.  You do not need a hash index to
support a hash join.  Hash join means that we build a in-memory hash
table containing the relevant rows from the inner relation (discarding
any that can be rejected due to single-relation WHERE conditions), and
then probe into that table for each row of the outer relation.

> Nested Joins performs too many scans for results in JOIN, containing
> many rows and are so slow on large tables (planner/optimizer problem?).

Could be.  The optimizer is a work-in-progress; I wouldn't claim that it
always makes the right choices.  But without more details it's hard to
say if it's doing the wrong thing or not.

> 1. What is the difference among "Seq Scan" and "Index Scan" in
> a planer/optimizer query execution plan?

sequential scan (read whole table sequentially) or index scan (use
index to scan just the rows meeting a WHERE condition that matches
the index).

> So, the difference is in ROWS and WIDTH:
> PS: Each table has 100000 rows.
> 1. Nested Loop  (cost=4.05 rows=1 width=28)
-> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
> 2. Nested Loop  (cost=4.05 rows=1 width=12)
-> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> So, questions:
> 1. can I say, that the second query is more optimal then 1-st?

I'm not sure why 6.5 fails to display a reasonable rows estimate in
the first case.  Current sources produce a more reasonable result.
That rows estimate is clearly wrong, or at least inconsistent with
the estimated cost of the indexscan and the estimated number of
rows produced by the join.

> 2. Variants I try:
> 2.1. When I use no indexes, then:
> 2.1.1. Nested Loop  (cost=2442.50 rows=1 width=28)
-> Seq Scan on aaa a  (cost=2440.50 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
> 2.1.2. Nested Loop  (cost=2442.50 rows=1 width=12)
-> Seq Scan on aaa a  (cost=2440.50 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> 2.2. I try to set indexes on a.a and b.a, supposing that it helps a
> bit on relation "a.a=b.a"
> 2.2.1. Nested Loop  (cost=3355.28 rows=1 width=28)
-> Seq Scan on aaa a  (cost=3353.28 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=85159 width=16)
> 2.2.2.  Nested Loop  (cost=3355.28 rows=1 width=12)
-> Seq Scan on aaa a  (cost=3353.28 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> Perfomace become lower!! Why?

Did performance *actually* change?  I don't see how it could,
considering that the plan is the same.  I hope you're not confusing
the planner's cost estimates with reality ;-).

If you're wondering why the estimate changed, it's probably because
CREATE INDEX updates the planner's information about number of rows
and number of disk pages the relation has.  I guess you loaded
more data since your last VACUUM.

> 2.4. I add indexes on a.a and a.b
> Nothing happens!

It's already using an index that covers column b, and the index on
column a doesn't help because it can only use one index in a scan.
        regards, tom lane


В списке pgsql-sql по дате отправления:

Предыдущее
От: Oleg Broytmann
Дата:
Сообщение: Re: Is Pgsql an OORDBMS?
Следующее
От: "Alexey V. Meledin"
Дата:
Сообщение: Re[2]: Re[2]: planner/optimizer hash index method and so on