Обсуждение: Question about Expected rows value in EXPLAIN output for Nested Loop node

Поиск
Список
Период
Сортировка

Question about Expected rows value in EXPLAIN output for Nested Loop node

От
bb ddd
Дата:
Hello,
I have the following question. 
Looking at the output of EXPLAIN ANALYZE:
1) I have 2 Index Scans inside a Nested Loop2) Nested Loop's expected rows=12683.1) First Index Scan's expected
rows=73.2)Second Index Scan's expected rows=43810
 
I was living under the impression that a Nested Loop's expected rows are ALWAYS equal to the product of its 2 child
nodes,but obviously 7*43810 is far from 1268.What am i missing? 
 

How is the 43810 number estimated - more precisely, does it use the stats for the join condition or it is applied later
andit is based only on the additional filters that are part of that index scan? 
 
And most importantly how is the 1268 number calculated?
Note the second index scan has several filters in addition to the join condition, not sure if relevant but worth
mentioning(i already implied it a couple of lines above, but best to say it explicitly i suppose). 
 
First table has 11K records, second around 76M.

I am not sure how these numbers are estimated and any pointer to a detailed documentation of the EXPLAIN ANALYZE output
foreach type of node and how they are calculated (especially if includes the exact statistics used etc.) will be
greatlyappreciated. Without one at the moment i am only guessing, and below is my best guess what is going on.
 
 
The only way this makes sense to me is if the 43810 row count is estimated only on the additional filters without
takinginto account the join condition and then in the Nested Loop node statistics about how probably it is the join to
succeedare used is applied, because in my case indeed only a very very small fraction (lets say 0.008) of the rows from
thefirst table are referenced in the second one (but that FK is almost never null), so roughly speaking
0.008*(43810*7)=2453is close enough to 1268.
 
So my main question is: Is what i tried to describe above the algorithm these numbers are estimated? Any details on
whatexactly statistics are involved in getting the actual probability for the join condition will be much appreciated
aswell.
 

Thanks,Regards,Nikolay



Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

От
bb ddd
Дата:
Not sure why some of the new lines disappeared in the original message. 
here is the part that got most corrupted with some more new lines lets 
hope it looks more readable now



1) I have 2 Index Scans inside a Nested Loop

2) Nested Loop's expected rows=1268


3.1) First Index Scan's expected rows=7

3.2) Second Index Scan's expected rows=43810








 >-------- Оригинално писмо --------

 >От: bb ddd nnickoloff1234@abv.bg

 >Относно: Question about Expected rows value in EXPLAIN output for Nested Loop node

 >До: pgsql-general@postgresql.org

 >Изпратено на: 10.05.2019 15:16



 
> Hello,
 
> I have the following question. 
 
> Looking at the output of EXPLAIN ANALYZE:
 
> 1) I have 2 Index Scans inside a Nested Loop2) Nested Loop's expected rows=12683.1) First Index Scan's expected
rows=73.2)Second Index Scan's expected rows=43810
 
 
> I was living under the impression that a Nested Loop's expected rows are ALWAYS equal to the product of its 2 child
nodes,but obviously 7*43810 is far from 1268.What am i missing? 
 
 
> 
 
> How is the 43810 number estimated - more precisely, does it use the stats for the join condition or it is applied
laterand it is based only on the additional filters that are part of that index scan? 
 
 
> And most importantly how is the 1268 number calculated?
 
> Note the second index scan has several filters in addition to the join condition, not sure if relevant but worth
mentioning(i already implied it a couple of lines above, but best to say it explicitly i suppose). 
 
 
> First table has 11K records, second around 76M.
 
> 
 
> I am not sure how these numbers are estimated and any pointer to a detailed documentation of the EXPLAIN ANALYZE
outputfor each type of node and how they are calculated (especially if includes the exact statistics used etc.) will be
greatlyappreciated. Without one at the moment i am only guessing, and below is my best guess what is going on.
 
 
>  
 
> The only way this makes sense to me is if the 43810 row count is estimated only on the additional filters without
takinginto account the join condition and then in the Nested Loop node statistics about how probably it is the join to
succeedare used is applied, because in my case indeed only a very very small fraction (lets say 0.008) of the rows from
thefirst table are referenced in the second one (but that FK is almost never null), so roughly speaking
0.008*(43810*7)=2453is close enough to 1268.
 
 
> So my main question is: Is what i tried to describe above the algorithm these numbers are estimated? Any details on
whatexactly statistics are involved in getting the actual probability for the join condition will be much appreciated
aswell.
 
 
> 
 
> Thanks,Regards,Nikolay



Re: Question about Expected rows value in EXPLAIN output for NestedLoop node

От
Adrian Klaver
Дата:
On 5/10/19 5:32 AM, bb ddd wrote:
> 
> Not sure why some of the new lines disappeared in the original message.
> here is the part that got most corrupted with some more new lines lets
> hope it looks more readable now
> 
> 
> 
> 1) I have 2 Index Scans inside a Nested Loop
> 
> 2) Nested Loop's expected rows=1268
> 
> 
> 3.1) First Index Scan's expected rows=7
> 
> 3.2) Second Index Scan's expected rows=43810
> 

The best way to handle this is to post your EXPLAIN output to:

https://explain.depesz.com/

and then post the link here.

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

От
bb ddd
Дата:
Thanks for the reply. Here it is: https://explain.depesz.com/s/LQCS 

My main trouble is not with this exact case, but i am looking for a general description of the algorithm how those
numbersare calculated. In the meanwhile i also constructed a couple of artificial tables to experiment with and see how
thesenumbers change, and am pretty sure my guesses there in the original question are quite close to what is going on,
butwould be so much easier to read explain plans, if there was some documentation about how the expected rows are
calculatedfor different nodes, in different contexts (like when loops=1 vs loops>1) maybe what statistics they use, how
theydepend on the numbers of their child nodes etc. 
 








 >-------- Оригинално писмо --------

 >От: Adrian Klaver adrian.klaver@aklaver.com

 >Относно: Re: Question about Expected rows value in EXPLAIN output for Nested
 Loop node

 >До: bb ddd <nnickoloff1234@abv.bg>

 >Изпратено на: 10.05.2019 16:52



 
> On 5/10/19 5:32 AM, bb ddd wrote:
 
> > 
 
> > Not sure why some of the new lines disappeared in the original message.
 
> > here is the part that got most corrupted with some more new lines lets
 
> > hope it looks more readable now
 
> > 
 
> > 
 
> > 
 
> > 1) I have 2 Index Scans inside a Nested Loop
 
> > 
 
> > 2) Nested Loop's expected rows=1268
 
> > 
 
> > 
 
> > 3.1) First Index Scan's expected rows=7
 
> > 
 
> > 3.2) Second Index Scan's expected rows=43810
 
> > 
 
> 
 
> The best way to handle this is to post your EXPLAIN output to:
 
> 
 
> https://explain.depesz.com/
 
> 
 
> and then post the link here.
 
> 
 
> -- 
 
> Adrian Klaver
 
> adrian.klaver@aklaver.com



Re: Question about Expected rows value in EXPLAIN output for NestedLoop node

От
Adrian Klaver
Дата:
On 5/10/19 8:56 AM, bb ddd wrote:
> Thanks for the reply. Here it is: https://explain.depesz.com/s/LQCS
> 
> My main trouble is not with this exact case, but i am looking for a general description of the algorithm how those
numbersare calculated. In the meanwhile i also constructed a couple of artificial tables to experiment with and see how
thesenumbers change, and am pretty sure my guesses there in the original question are quite close to what is going on,
butwould be so much easier to read explain plans, if there was some documentation about how the expected rows are
calculatedfor different nodes, in different contexts (like when loops=1 vs loops>1) maybe what statistics they use, how
theydepend on the numbers of their child nodes etc.
 
> 
> 

How about?:
https://www.postgresql.org/docs/11/using-explain.html


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

От
bb ddd
Дата:
Thanks, Adrian, but i was looking for something that goes into more depth.
For example there is one case described there where we have a Nested Loop with rows=33, and its 2 child nodes have each
rows=10.
But first of all this is a very exotic join condition (t1.hundred < t2.hundred) and second of all i cannot find any
explanationhow this number 33 is derived. They literally spend less than 2 sentences on this case.
 

In my case i as well have Nested Loop's rows value different than the product of its 2 children's rows values, but with
anormal join condition on a foreign key.
 
My guesses (again) are the same way it keeps some statistics (very curious what exactly) about what is the probability
2random rows from each table satisfy that condition (t1.hundred < t2.hundred), it also keeps statistics what is the
probability2 random rows from each table satisfy the regular normal join condition like the one i have in my case
(basicallyt1.t2_id=t2.id). And in both cases it just applies that probability to the product of the rows values of the
2child nodes, to calculate the expected rows value of the result of the Nested Loop. But i am just guessing.
 







 >-------- Оригинално писмо --------

 >От: Adrian Klaver adrian.klaver@aklaver.com

 >Относно: Re: Question about Expected rows value in EXPLAIN output for Nested
 Loop node

 >До: bb ddd <nnickoloff1234@abv.bg>

 >Изпратено на: 10.05.2019 19:02



 
> On 5/10/19 8:56 AM, bb ddd wrote:
 
> > Thanks for the reply. Here it is: https://explain.depesz.com/s/LQCS
 
> > 
 
> > My main trouble is not with this exact case, but i am looking for a general description of the algorithm how those
numbersare calculated. In the meanwhile i also constructed a couple of artificial tables to experiment with and see how
thesenumbers change, and am pretty sure my guesses there in the original question are quite close to what is going on,
butwould be so much easier to read explain plans, if there was some documentation about how the expected rows are
calculatedfor different nodes, in different contexts (like when loops=1 vs loops>1) maybe what statistics they use, how
theydepend on the numbers of their child nodes etc.
 
 
> > 
 
> > 
 
> 
 
> How about?:
 
> https://www.postgresql.org/docs/11/using-explain.html
 
> 
 
> 
 
> -- 
 
> Adrian Klaver
 
> adrian.klaver@aklaver.com



Re: Question about Expected rows value in EXPLAIN output for NestedLoop node

От
Adrian Klaver
Дата:
On 5/11/19 2:33 AM, bb ddd wrote:
> 
> Thanks, Adrian, but i was looking for something that goes into more depth.
> For example there is one case described there where we have a Nested Loop with rows=33, and its 2 child nodes have
eachrows=10.
 
> But first of all this is a very exotic join condition (t1.hundred < t2.hundred) and second of all i cannot find any
explanationhow this number 33 is derived. They literally spend less than 2 sentences on this case.
 
> 
> In my case i as well have Nested Loop's rows value different than the product of its 2 children's rows values, but
witha normal join condition on a foreign key.
 
> My guesses (again) are the same way it keeps some statistics (very curious what exactly) about what is the
probability2 random rows from each table satisfy that condition 
 

https://www.postgresql.org/docs/11/planner-stats-details.html

(t1.hundred < t2.hundred), it also keeps statistics what is the 
probability 2 random rows from each table satisfy the regular normal 
join condition like the one i have in my case (basically 
t1.t2_id=t2.id). And in both cases it just applies that probability to 
the product of the rows values of the 2 child nodes, to calculate the 
expected rows value of the result of the Nested Loop. But i am just 
guessing.
> 
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

От
Tom Lane
Дата:
bb ddd <nnickoloff1234@abv.bg> writes:
> Thanks, Adrian, but i was looking for something that goes into more depth.

Use the source, Luke.

git clone git://git.postgresql.org/git/postgresql.git

Likely places to look for this purpose include

src/backend/optimizer/README
src/backend/optimizer/path/costsize.c
src/backend/optimizer/path/clausesel.c
src/backend/utils/adt/selfuncs.c

The short answer to your question is that no, the size of the join
relation is not estimated by multiplying the sizes of input paths
for some particular nestloop plan.  The join size estimate is made
first and the same size is applied to all paths for the join.

In the case of a nestloop where the entire join condition was pushed
into the RHS path, you'd ideally expect that the LHS size times the
estimated size of the filtered RHS path matches the previously-made
join size estimate ... but those are different code paths using
different estimators, and since this is all very approximate and
full of heuristic guesses, sometimes they don't match very well.

            regards, tom lane