Re[2]: planner/optimizer hash index method and so on

Поиск
Список
Период
Сортировка
От Alexey V. Meledin
Тема Re[2]: planner/optimizer hash index method and so on
Дата
Msg-id 11555.000405@webclub.ru
обсуждение исходный текст
Ответ на Re: planner complaints (was approve VKPts5 unsubscribe pgsql)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Re[2]: planner/optimizer hash index method and so on  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Hi, Tom!

Monday, April 03, 2000, 6:32:06 PM, you wrote:

>> 2. I've replace btree indexes on relation
>> AND atd.ifs_data_id = def.ifs_data_id;
>> with:
>> create index itd_hash_data_id on ifs_tree_default USING HASH (ifs_data_id);
>> create index iatd_hash_data_id on ifs_account_tree_data USING HASH (ifs_data_id);

TL> Why would you do that?  The hash index method doesn't have any advantage
TL> over btree that I can see, and it's got a lot of disadvantages.
But as I understand from documentation, Hash Join is the preferable
method of JOIN, because of Seq Scan performed only once (I suppose, that
it's not full table scan!?!).
Nested Joins performs too many scans for results in JOIN, containing
many rows and are so slow on large tables (planner/optimizer problem?).
Thereby I've found that it's more efficient to make several queries
from hard one (planner/optimizer explains me up to 8 neseted loops on it :().

Additional questions:
1. What is the difference among "Seq Scan" and "Index Scan" in
a planer/optimizer query execution plan?
Seq Scan - "sequential data scan" or "sequential scan, based on
"serial" field"?

2. I have
create table aaa
(a int4, b int4, c int4, d int4,CONSTRAINT aaa_pk PRIMARY KEY(a,b,c)
);

create table bbb (
a int4, b int4, c int4, g serial, f int4
,CONSTRAINT bbb_pk PRIMARY KEY (a, b, c, g));

create index aaa_index_a on aaa (a);
create index aaa_index_b on aaa (b);
create index aaa_index_c on aaa (c);
create index aaa_index_d on aaa (d);

--create index bbb_index_fk on bbb (a,b,c); --FK
create index bbb_index_a on bbb (a);
create index bbb_index_b on bbb (b);
create index bbb_index_c on bbb (c);

!PLEASE, verify indexes (I'm not shure at 100%)

explain select b.f from bbb b, aaa a
where a.b=50001     and a.c=50002     and a.d=50003     -- at this point I have "aaa" rows for join     -- performing
jon    and a.b=b.b     and a.c=b.c     and a.a=b.a;
 
Nested Loop  (cost=4.05 rows=1 width=28) ->  Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=12) ->
IndexScan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16) 
 
explain select b.f from bbb b, aaa a
where a.b=50001     and a.c=50002     and a.d=50003     -- at this point I have "aaa" rows for join     and b.b=2
andb.c=3     -- at this point I have "bbb" rows for join     -- performing jon     and a.a=b.a;
 
Nested Loop  (cost=4.05 rows=1 width=12) ->  Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=4) ->
IndexScan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

=================================
Two queries are almost equal, but in the first I perform join by wide
PK key. In the second only one key is explicity used in JOIN.

So, the difference is in ROWS and WIDTH:
PS: Each table has 100000 rows.
1. Nested Loop  (cost=4.05 rows=1 width=28) ->  Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=12) ->
IndexScan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
 
2. Nested Loop  (cost=4.05 rows=1 width=12) ->  Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=4) ->
IndexScan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

So, questions:
1. can I say, that the second query is more optimal then 1-st?

2. Variants I try:
2.1. When I use no indexes, then:
2.1.1. Nested Loop  (cost=2442.50 rows=1 width=28) ->  Seq Scan on aaa a  (cost=2440.50 rows=1 width=12) ->  Index Scan
usingbbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
 
2.1.2. Nested Loop  (cost=2442.50 rows=1 width=12) ->  Seq Scan on aaa a  (cost=2440.50 rows=1 width=4) ->  Index Scan
usingbbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

2.2. I try to set indexes on a.a and b.a, supposing that it helps a
bit on relation "a.a=b.a"
2.2.1. Nested Loop  (cost=3355.28 rows=1 width=28) ->  Seq Scan on aaa a  (cost=3353.28 rows=1 width=12) ->  Index Scan
usingbbb_pk on bbb b  (cost=2.00 rows=85159 width=16)
 
2.2.2.  Nested Loop  (cost=3355.28 rows=1 width=12) ->  Seq Scan on aaa a  (cost=3353.28 rows=1 width=4) ->  Index Scan
usingbbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

Perfomace become lower!! Why?

2.3. I've set only this indexes:
create index aaa_index_bcd on aaa (b,c,d);
create index bbb_index_bc on bbb (b,c);
and take the most better perfomance:
2.3.1. Nested Loop  (cost=4.00 rows=1 width=28) ->  Index Scan using aaa_index_bcd on aaa a  (cost=2.00 rows=1
width=12)->  Index Scan using bbb_pk on bbb b  (cost=2.00 rows=85159 width=16)
 
2.3.2. Nested Loop  (cost=4.00 rows=1 width=12) ->  Index Scan using aaa_index_bcd on aaa a  (cost=2.00 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

2.4. I add indexes on a.a and a.b
2.4.1. Nested Loop  (cost=4.00 rows=1 width=28) ->  Index Scan using aaa_index_bcd on aaa a  (cost=2.00 rows=1
width=12)->  Index Scan using bbb_pk on bbb b  (cost=2.00 rows=85159 width=16)
 
2.4.2. Nested Loop  (cost=4.00 rows=1 width=12) ->  Index Scan using aaa_index_bcd on aaa a  (cost=2.00 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)
 

Nothing happens!
Is it occurs because PostgreSQL makes JOIN on a.a and b.a in memory?
Or Use of wide PK is more prefferable than simple index?

Can anybody collect results of this small experiment and make right
deduction?

Regards, Alexey V. Meledin
InterForge Developers Group, Saint-Petersburg
look_to: <www.etcompany.ru><www.businessweb.ru>
<www.inplan.spb.ru><www.pia.ru>>>>>>>>>>>>>>>>>
mail_to: <avm@webclub.ru><nick_as: <cureman>>>>




В списке pgsql-sql по дате отправления:

Предыдущее
От: "Moray McConnachie"
Дата:
Сообщение: Re: Foreign Key Implementation
Следующее
От: Kovacs Zoltan Sandor
Дата:
Сообщение: Re: Foreign Key Implementation