Обсуждение: Selection of join algorithm.

Поиск
Список
Период
Сортировка

Selection of join algorithm.

От
Ishaya Bhatt
Дата:
Hi,

I am trying to analyze join performance. But I see that even for a table having 100,000 rows and join attribute as primary key, postgres always performs hash join.

Can anyone please tell me under which conditions merge join or nested loop join is invoked?

Thanks and Regards,
Ishaya

Re: Selection of join algorithm.

От
Atri Sharma
Дата:


On Saturday, March 8, 2014, Ishaya Bhatt <ishayabhatt@gmail.com> wrote:
Hi,

I am trying to analyze join performance. But I see that even for a table having 100,000 rows and join attribute as primary key, postgres always performs hash join.

Can anyone please tell me under which conditions merge join or nested loop join is invoked?



Nested loop is generally performed when one of the tables being joined is small so the inner loop will be fast.

Hash joins are the preferred join types unless the hash table does not fit in work_mem.if that is the case,then some other join algorithm is preferred.

Regards,

Atri 


--
Regards,
 
Atri
l'apprenant

Re: Selection of join algorithm.

От
Jeff Janes
Дата:
On Sat, Mar 8, 2014 at 6:18 AM, Ishaya Bhatt <ishayabhatt@gmail.com> wrote:
Hi,

I am trying to analyze join performance. But I see that even for a table having 100,000 rows and join attribute as primary key, postgres always performs hash join.

Can anyone please tell me under which conditions merge join or nested loop join is invoked?


Unless you trying to look into the source code of postgresql to see how the internals of the planner works, this should really go to pgsql-performance@postgresql.org, not to hackers.

A nested loop would be favored if there were some WHERE condition that filtered out nearly all of the rows of the "outer" table.  In that case, only a small amount of the inner table needs to be accessed, and so reading the whole thing to hash it would be too expensive.

A merge join would be favored if you used an "ORDER BY" to ask for the data to be sorted in the same order as the merge join would naturally deliver it in.

If the data is too large to fit in work_mem, it might favor either the merge join or nested loop compared to the hash join. This stuff is hard to discuss in the abstract.  It is probably best to use the enable_*join settings to see what it does with your actual data (or better yet a synthetic data set whose generator you can share with us).

Cheers,

Jeff