Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

Поиск
Список
Период
Сортировка
От bb ddd
Тема Re: Question about Expected rows value in EXPLAIN output for Nested Loop node
Дата
Msg-id 302576964.978332.1557491557322@nm63.abv.bg
обсуждение исходный текст
Ответ на Question about Expected rows value in EXPLAIN output for Nested Loop node  (bb ddd <nnickoloff1234@abv.bg>)
Ответы Re: Question about Expected rows value in EXPLAIN output for NestedLoop node  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
Not sure why some of the new lines disappeared in the original message. 
here is the part that got most corrupted with some more new lines lets 
hope it looks more readable now



1) I have 2 Index Scans inside a Nested Loop

2) Nested Loop's expected rows=1268


3.1) First Index Scan's expected rows=7

3.2) Second Index Scan's expected rows=43810








 >-------- Оригинално писмо --------

 >От: bb ddd nnickoloff1234@abv.bg

 >Относно: Question about Expected rows value in EXPLAIN output for Nested Loop node

 >До: pgsql-general@postgresql.org

 >Изпратено на: 10.05.2019 15:16



 
> Hello,
 
> I have the following question. 
 
> Looking at the output of EXPLAIN ANALYZE:
 
> 1) I have 2 Index Scans inside a Nested Loop2) Nested Loop's expected rows=12683.1) First Index Scan's expected
rows=73.2)Second Index Scan's expected rows=43810
 
 
> I was living under the impression that a Nested Loop's expected rows are ALWAYS equal to the product of its 2 child
nodes,but obviously 7*43810 is far from 1268.What am i missing? 
 
 
> 
 
> How is the 43810 number estimated - more precisely, does it use the stats for the join condition or it is applied
laterand it is based only on the additional filters that are part of that index scan? 
 
 
> And most importantly how is the 1268 number calculated?
 
> Note the second index scan has several filters in addition to the join condition, not sure if relevant but worth
mentioning(i already implied it a couple of lines above, but best to say it explicitly i suppose). 
 
 
> First table has 11K records, second around 76M.
 
> 
 
> I am not sure how these numbers are estimated and any pointer to a detailed documentation of the EXPLAIN ANALYZE
outputfor each type of node and how they are calculated (especially if includes the exact statistics used etc.) will be
greatlyappreciated. Without one at the moment i am only guessing, and below is my best guess what is going on.
 
 
>  
 
> The only way this makes sense to me is if the 43810 row count is estimated only on the additional filters without
takinginto account the join condition and then in the Nested Loop node statistics about how probably it is the join to
succeedare used is applied, because in my case indeed only a very very small fraction (lets say 0.008) of the rows from
thefirst table are referenced in the second one (but that FK is almost never null), so roughly speaking
0.008*(43810*7)=2453is close enough to 1268.
 
 
> So my main question is: Is what i tried to describe above the algorithm these numbers are estimated? Any details on
whatexactly statistics are involved in getting the actual probability for the join condition will be much appreciated
aswell.
 
 
> 
 
> Thanks,Regards,Nikolay



В списке pgsql-general по дате отправления:

Предыдущее
От: bb ddd
Дата:
Сообщение: Question about Expected rows value in EXPLAIN output for Nested Loop node
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: Question about Expected rows value in EXPLAIN output for NestedLoop node