Re[2]: Re[2]: planner/optimizer hash index method and so on
От | Alexey V. Meledin |
---|---|
Тема | Re[2]: Re[2]: planner/optimizer hash index method and so on |
Дата | |
Msg-id | 16863.000405@webclub.ru обсуждение исходный текст |
Ответ на | Re: Re[2]: planner/optimizer hash index method and so on (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-sql |
Hi, Tom! Thanks a lot! The rest small questions are below: Wednesday, April 05, 2000, 8:04:11 PM, you wrote: >> But as I understand from documentation, Hash Join is the preferable >> method of JOIN, because of Seq Scan performed only once (I suppose, that >> it's not full table scan!?!). TL> Where in the documentation did you see that? If it implies that then TL> it's wrong. We could certainly simplify the optimizer a lot if it could TL> just always pick a hash join ;-). nested iteration join: The right relation is scanned once for every tuple found in the left relation. This strategy is easyto implement but can be very time consuming. merge sort join: Each relation is sorted on the join attributes before the join starts. Then the two relations are mergedtogether taking into account that both relations are ordered on the join attributes. This kind of join is more attractivebecause every relation has to be scanned only once. hash join: the right relation is first hashed on its join attributes. Next the left relation is scanned and the appropriatevalues of every tuple found are used as hash keys to locate the tuples in the right relation. There is no direct indication, but there is no words about sequential table scan performed in hash method. So I confused a bit. (possibly because of my poor english :( ). >> Nested Joins performs too many scans for results in JOIN, containing >> many rows and are so slow on large tables (planner/optimizer problem?). TL> Could be. The optimizer is a work-in-progress; I wouldn't claim that it TL> always makes the right choices. But without more details it's hard to TL> say if it's doing the wrong thing or not. for example: planner/optimizer doesn't understand parallel retriving. select a.value, b.value from table a, table b, joint aa, joint bb where a.id=aa.id AND b.id=bb.id AND a.id=aa.id AND bb.value=1 AND aa.value=2; There are two parallel queries. Planner/optimizer can't recognize it well. It explains two many Nested joins, that it's simple to make 3 queries (2 INTO temp tables and 1 performing a join). PS: For PostgreSQL it's (I see) better to rebuild all table if I want to add or delete fields to object of real life (I mean a goods and it's behaviour, for example. This is in general to improve admin interface functionality and possibilities ). >> 1. What is the difference among "Seq Scan" and "Index Scan" in >> a planer/optimizer query execution plan? TL> sequential scan (read whole table sequentially) or index scan (use TL> index to scan just the rows meeting a WHERE condition that matches TL> the index). So if I see that in my query Sequential Scan occur, than that is an alarm in my database model? >> So, the difference is in ROWS and WIDTH: >> PS: Each table has 100000 rows. >> 1. Nested Loop (cost=4.05 rows=1 width=28) ->> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=12) ->> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16) >> 2. Nested Loop (cost=4.05 rows=1 width=12) ->> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=4) ->> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8) >> So, questions: >> 1. can I say, that the second query is more optimal then 1-st? TL> I'm not sure why 6.5 fails to display a reasonable rows estimate in TL> the first case. Current sources produce a more reasonable result. TL> That rows estimate is clearly wrong, or at least inconsistent with TL> the estimated cost of the indexscan and the estimated number of TL> rows produced by the join. PostgreSQL 6.5.3 on FreeBSD 3.3. build from ports collection. I've verify with vacuum, but nothing happens. :( small bug >> 2. Variants I try: >> 2.1. When I use no indexes, then: >> 2.1.1. Nested Loop (cost=2442.50 rows=1 width=28) >> 2.1.2. Nested Loop (cost=2442.50 rows=1 width=12) >> 2.2. I try to set indexes on a.a and b.a, supposing that it helps a >> bit on relation "a.a=b.a" >> 2.2.1. Nested Loop (cost=3355.28 rows=1 width=28) >> 2.2.2. Nested Loop (cost=3355.28 rows=1 width=12) >> Perfomace become lower!! Why? TL> Did performance *actually* change? I don't see how it could, TL> considering that the plan is the same. I hope you're not confusing TL> the planner's cost estimates with reality ;-). Hmmmm..... BUT HOW CAN I ANALIZE MY QUERY AND INDEXES?????? Confusing .... What to do? I've thought that perfomance changes occur, correlating on index combination optimizer uses. TL> If you're wondering why the estimate changed, it's probably because TL> CREATE INDEX updates the planner's information about number of rows TL> and number of disk pages the relation has. I guess you loaded TL> more data since your last VACUUM. I provide clean experiment. You are right. vacuum rebuilds planner/optimizer statistics. >> 2.4. I add indexes on a.a and a.b >> Nothing happens! TL> It's already using an index that covers column b, and the index on TL> column a doesn't help because it can only use one index in a scan. But if it has the choise, then what rule it implements. Why not to use narrow index in this case? TL> regards, tom lane Regards, Alexey V. Meledin InterForge Developers Group, Saint-Petersburg look_to: <www.etcompany.ru><www.businessweb.ru> <www.inplan.spb.ru><www.pia.ru>>>>>>>>>>>>>>>>> mail_to: <avm@webclub.ru><nick_as: <cureman>>>>
В списке pgsql-sql по дате отправления: