Re[2]: Re[2]: planner/optimizer hash index method and so on

Поиск
Список
Период
Сортировка
От Alexey V. Meledin
Тема Re[2]: Re[2]: planner/optimizer hash index method and so on
Дата
Msg-id 16863.000405@webclub.ru
обсуждение исходный текст
Ответ на Re: Re[2]: planner/optimizer hash index method and so on  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Hi, Tom!

Thanks a lot!
The rest small questions are below:

Wednesday, April 05, 2000, 8:04:11 PM, you wrote:

>> But as I understand from documentation, Hash Join is the preferable
>> method of JOIN, because of Seq Scan performed only once (I suppose, that
>> it's not full table scan!?!).
TL> Where in the documentation did you see that?  If it implies that then
TL> it's wrong.  We could certainly simplify the optimizer a lot if it could
TL> just always pick a hash join ;-).
nested iteration join: The right relation is scanned once for every tuple found in the left relation. This strategy is
easyto implement but can be very time consuming.
 
merge sort join: Each relation is sorted on the join attributes before the join starts. Then the two relations are
mergedtogether taking into account that both relations are ordered on the join attributes. This kind of join is more
attractivebecause every relation has to be scanned only once.
 
hash join: the right relation is first hashed on its join attributes. Next the left relation is scanned and the
appropriatevalues of every tuple found are used as hash keys to locate the tuples in the right relation.
 

There is no direct indication, but there is no words about sequential
table scan performed in hash method. So I confused a bit.
(possibly because of my poor english :( ).

>> Nested Joins performs too many scans for results in JOIN, containing
>> many rows and are so slow on large tables (planner/optimizer problem?).
TL> Could be.  The optimizer is a work-in-progress; I wouldn't claim that it
TL> always makes the right choices.  But without more details it's hard to
TL> say if it's doing the wrong thing or not.
for example: planner/optimizer doesn't understand parallel retriving.
select a.value, b.value
from table a, table b, joint aa, joint bb
where a.id=aa.id     AND b.id=bb.id     AND a.id=aa.id     AND bb.value=1     AND aa.value=2;

There are two parallel queries. Planner/optimizer can't recognize it
well. It explains two many Nested joins, that it's simple to make 3
queries (2 INTO temp tables and 1 performing a join).

PS: For PostgreSQL it's (I see) better to rebuild all table if I want to
add or delete fields to object of real life (I mean a goods and it's
behaviour, for example. This is in general to improve admin interface
functionality and possibilities ).

>> 1. What is the difference among "Seq Scan" and "Index Scan" in
>> a planer/optimizer query execution plan?

TL> sequential scan (read whole table sequentially) or index scan (use
TL> index to scan just the rows meeting a WHERE condition that matches
TL> the index).
So if I see that in my query Sequential Scan occur, than that is an
alarm in my database model?

>> So, the difference is in ROWS and WIDTH:
>> PS: Each table has 100000 rows.
>> 1. Nested Loop  (cost=4.05 rows=1 width=28)
->> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=12)
->> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
>> 2. Nested Loop  (cost=4.05 rows=1 width=12)
->> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=4)
->> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)
>> So, questions:
>> 1. can I say, that the second query is more optimal then 1-st?

TL> I'm not sure why 6.5 fails to display a reasonable rows estimate in
TL> the first case.  Current sources produce a more reasonable result.
TL> That rows estimate is clearly wrong, or at least inconsistent with
TL> the estimated cost of the indexscan and the estimated number of
TL> rows produced by the join.
PostgreSQL 6.5.3 on FreeBSD 3.3. build from ports collection.
I've verify with vacuum, but nothing happens. :( small bug

>> 2. Variants I try:
>> 2.1. When I use no indexes, then:
>> 2.1.1. Nested Loop  (cost=2442.50 rows=1 width=28)
>> 2.1.2. Nested Loop  (cost=2442.50 rows=1 width=12)
>> 2.2. I try to set indexes on a.a and b.a, supposing that it helps a
>> bit on relation "a.a=b.a"
>> 2.2.1. Nested Loop  (cost=3355.28 rows=1 width=28)
>> 2.2.2.  Nested Loop  (cost=3355.28 rows=1 width=12)
>> Perfomace become lower!! Why?

TL> Did performance *actually* change?  I don't see how it could,
TL> considering that the plan is the same.  I hope you're not confusing
TL> the planner's cost estimates with reality ;-).
Hmmmm..... BUT HOW CAN I ANALIZE MY QUERY AND INDEXES??????
Confusing .... What to do? I've thought that perfomance changes occur,
correlating on index combination optimizer uses.

TL> If you're wondering why the estimate changed, it's probably because
TL> CREATE INDEX updates the planner's information about number of rows
TL> and number of disk pages the relation has.  I guess you loaded
TL> more data since your last VACUUM.
I provide clean experiment. You are right. vacuum rebuilds
planner/optimizer statistics.

>> 2.4. I add indexes on a.a and a.b
>> Nothing happens!

TL> It's already using an index that covers column b, and the index on
TL> column a doesn't help because it can only use one index in a scan.
But if it has the choise, then what rule it implements.
Why not to use narrow index in this case?

TL>                         regards, tom lane

Regards, Alexey V. Meledin
InterForge Developers Group, Saint-Petersburg
look_to: <www.etcompany.ru><www.businessweb.ru>
<www.inplan.spb.ru><www.pia.ru>>>>>>>>>>>>>>>>>
mail_to: <avm@webclub.ru><nick_as: <cureman>>>>




В списке pgsql-sql по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Re[2]: planner/optimizer hash index method and so on
Следующее
От: "Teresa Mozejko - Ricardo Costa"
Дата:
Сообщение: libpq