Обсуждение: demystifying nested loop vs. merge join query plan choice

Поиск
Список
Период
Сортировка

demystifying nested loop vs. merge join query plan choice

От
Sandeep Gupta
Дата:
I have two  postgres instances each with a database of same schema. The dataset in both is ''same'' but for randomness i.e.  both contain two tables pc(did) and tc(pid, did) that have almost
same number of rows and have been generate from same distribution. 

However the query plan for the join turns out to be completely different: on one join takes 2.3 secs while on the other it takes 7 secs.


Here are the statistics:

for database 1:
     size of tc table:  49987585
     size of pc table: 499616

join plan: 
                                                                        QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual time=8473.296..8473.296 rows=1 loops=1)
   ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual time=0.084..8409.065 rows=998038 loops=1)
         Merge Cond: (pc.did = tc.did)
         ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04 rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
               Heap Fetches: 0
         ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32 rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
               Heap Fetches: 0
 Total runtime: 8473.337 ms
'

Query Running time:  5135


for database 2:
  size of tc table: 50012415
  size of pc table: 500384

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual time=2501.970..2501.970 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual time=0.038..2418.766 rows=1000834 loops=1)
         ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56 rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
               Heap Fetches: 500384
         ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44 rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
               Index Cond: (did = pc.did)
               Heap Fetches: 1000834
 Total runtime: 2502.017 ms

Query running time: 2090.388 ms

My question is why is the query plan so different for two datasets that are really exactly the same. And how can i force the plan to be nested index scan on 
database 1 . 


-Sandeep

Re: demystifying nested loop vs. merge join query plan choice

От
Pavel Stehule
Дата:
Hello

do you have same configuration?

Regards

Pavel

2013/7/31 Sandeep Gupta <gupta.sandeep@gmail.com>:
> I have two  postgres instances each with a database of same schema. The
> dataset in both is ''same'' but for randomness i.e.  both contain two tables
> pc(did) and tc(pid, did) that have almost
> same number of rows and have been generate from same distribution.
>
> However the query plan for the join turns out to be completely different: on
> one join takes 2.3 secs while on the other it takes 7 secs.
>
>
> Here are the statistics:
>
> for database 1:
>      size of tc table:  49987585
>      size of pc table: 499616
>
> join plan:
>
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> time=8473.296..8473.296 rows=1 loops=1)
>    ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> time=0.084..8409.065 rows=998038 loops=1)
>          Merge Cond: (pc.did = tc.did)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04
> rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
>                Heap Fetches: 0
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32
> rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
>                Heap Fetches: 0
>  Total runtime: 8473.337 ms
> '
>
> Query Running time:  5135
>
>
> for database 2:
>   size of tc table: 50012415
>   size of pc table: 500384
>
>                                                                    QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> time=2501.970..2501.970 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> time=0.038..2418.766 rows=1000834 loops=1)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56
> rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
>                Heap Fetches: 500384
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44
> rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
>                Index Cond: (did = pc.did)
>                Heap Fetches: 1000834
>  Total runtime: 2502.017 ms
>
> Query running time: 2090.388 ms
>
> My question is why is the query plan so different for two datasets that are
> really exactly the same. And how can i force the plan to be nested index
> scan on
> database 1 .
>
>
> -Sandeep
>


Re: demystifying nested loop vs. merge join query plan choice

От
Sandeep Gupta
Дата:
Hi Pavel,

 Yes. The postgresql.conf is exactly the same.  The have the same index and clustering and are on the same compute node as well but running on different ports. 

-Sandeep



On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

do you have same configuration?

Regards

Pavel

2013/7/31 Sandeep Gupta <gupta.sandeep@gmail.com>:
> I have two  postgres instances each with a database of same schema. The
> dataset in both is ''same'' but for randomness i.e.  both contain two tables
> pc(did) and tc(pid, did) that have almost
> same number of rows and have been generate from same distribution.
>
> However the query plan for the join turns out to be completely different: on
> one join takes 2.3 secs while on the other it takes 7 secs.
>
>
> Here are the statistics:
>
> for database 1:
>      size of tc table:  49987585
>      size of pc table: 499616
>
> join plan:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> time=8473.296..8473.296 rows=1 loops=1)
>    ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> time=0.084..8409.065 rows=998038 loops=1)
>          Merge Cond: (pc.did = tc.did)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04
> rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
>                Heap Fetches: 0
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32
> rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
>                Heap Fetches: 0
>  Total runtime: 8473.337 ms
> '
>
> Query Running time:  5135
>
>
> for database 2:
>   size of tc table: 50012415
>   size of pc table: 500384
>
>                                                                    QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> time=2501.970..2501.970 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> time=0.038..2418.766 rows=1000834 loops=1)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56
> rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
>                Heap Fetches: 500384
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44
> rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
>                Index Cond: (did = pc.did)
>                Heap Fetches: 1000834
>  Total runtime: 2502.017 ms
>
> Query running time: 2090.388 ms
>
> My question is why is the query plan so different for two datasets that are
> really exactly the same. And how can i force the plan to be nested index
> scan on
> database 1 .
>
>
> -Sandeep
>

Re: demystifying nested loop vs. merge join query plan choice

От
Sandeep Gupta
Дата:
details regarding buffer usage:

for database 1:

                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual time=9149.366..9149.366 rows=1 loops=1)
   Buffers: shared hit=137991
   ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual time=0.091..9075.008 rows=998038 loops=1)
         Merge Cond: (pc.did = tc.did)
         Buffers: shared hit=137991
         ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04 rows=499616 width=4) (actual time=0.017..58.237 rows=499616 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=1369
         ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32 rows=49987616 width=4) (actual time=0.015..5301.727 rows=49997291 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=136622
 Total runtime: 9149.414 ms
(12 rows)


for database  2:

                                                                 QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual time=2386.865..2386.865 rows=1 loops=1)
   Buffers: shared hit=2235978 read=208446
   ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual time=0.049..2292.338 rows=1000834 loops=1)
         Buffers: shared hit=2235978 read=208446
         ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56 rows=500384 width=4) (actual time=0.016..108.407 rows=500384 loops=1)
               Heap Fetches: 500384
               Buffers: shared hit=6 read=3579
         ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44 rows=3 width=4) (actual time=0.003..0.004 rows=2 loops=500384)
               Index Cond: (did = pc.did)
               Heap Fetches: 1000834
               Buffers: shared hit=2235972 read=204867
 Total runtime: 2386.914 ms
(12 rows)




On Wed, Jul 31, 2013 at 3:16 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
Hi Pavel,

 Yes. The postgresql.conf is exactly the same.  The have the same index and clustering and are on the same compute node as well but running on different ports. 

-Sandeep



On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

do you have same configuration?

Regards

Pavel

2013/7/31 Sandeep Gupta <gupta.sandeep@gmail.com>:
> I have two  postgres instances each with a database of same schema. The
> dataset in both is ''same'' but for randomness i.e.  both contain two tables
> pc(did) and tc(pid, did) that have almost
> same number of rows and have been generate from same distribution.
>
> However the query plan for the join turns out to be completely different: on
> one join takes 2.3 secs while on the other it takes 7 secs.
>
>
> Here are the statistics:
>
> for database 1:
>      size of tc table:  49987585
>      size of pc table: 499616
>
> join plan:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> time=8473.296..8473.296 rows=1 loops=1)
>    ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> time=0.084..8409.065 rows=998038 loops=1)
>          Merge Cond: (pc.did = tc.did)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04
> rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
>                Heap Fetches: 0
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32
> rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
>                Heap Fetches: 0
>  Total runtime: 8473.337 ms
> '
>
> Query Running time:  5135
>
>
> for database 2:
>   size of tc table: 50012415
>   size of pc table: 500384
>
>                                                                    QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> time=2501.970..2501.970 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> time=0.038..2418.766 rows=1000834 loops=1)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56
> rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
>                Heap Fetches: 500384
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44
> rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
>                Index Cond: (did = pc.did)
>                Heap Fetches: 1000834
>  Total runtime: 2502.017 ms
>
> Query running time: 2090.388 ms
>
> My question is why is the query plan so different for two datasets that are
> really exactly the same. And how can i force the plan to be nested index
> scan on
> database 1 .
>
>
> -Sandeep
>


Re: demystifying nested loop vs. merge join query plan choice

От
Tom Lane
Дата:
Sandeep Gupta <gupta.sandeep@gmail.com> writes:
> details regarding buffer usage:
> [ 100% buffer hit rate ]

Your database is evidently fully cached in memory.  If that's the
operating mode you expect, you need to change the planner's cost
parameters, in particular reduce random_page_cost to equal seq_page_cost.
There is plenty of material about this on the PG wiki or in the
pgsql-performance archives.

            regards, tom lane


Re: demystifying nested loop vs. merge join query plan choice

От
Sandeep Gupta
Дата:
@Jeff : Thanks for pointing this out. Turns out that was the case. 

@Tom: Thank you for the reference to random_page_cost parameters. It would be very useful for us. Would go through the rest of the documentation as well.


On Wed, Jul 31, 2013 at 3:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sandeep Gupta <gupta.sandeep@gmail.com> writes:
> details regarding buffer usage:
> [ 100% buffer hit rate ]

Your database is evidently fully cached in memory.  If that's the
operating mode you expect, you need to change the planner's cost
parameters, in particular reduce random_page_cost to equal seq_page_cost.
There is plenty of material about this on the PG wiki or in the
pgsql-performance archives.

                        regards, tom lane

Re: demystifying nested loop vs. merge join query plan choice

От
BladeOfLight16
Дата:
On Thu, Aug 1, 2013 at 10:25 AM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
@Jeff : Thanks for pointing this out. Turns out that was the case. 

@Tom: Thank you for the reference to random_page_cost parameters. It would be very useful for us. Would go through the rest of the documentation as well.

I can't say what Jeff mentioned; maybe he didn't reply to the user list. Anyhow, sorry if this is repeating information.

I cannot help but point something glaring out in the EXPLAIN, though:

database 1:
Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32 rows=49987616 width=4)

database 2:
Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44 rows=3 width=4)

Maybe I just don't know how to read EXPLAIN plans, but it would appear that the estimated rows from the index only scan in the two plans is different by a factor of about 16.7 million. database 1 also processes about 7.7 million rows before the aggregate, where database 2 only processes about 1.3 million. For some reason, it appears that database 2 is able to eliminate far more rows more quickly, resulting in a faster query. Have both databases had VACUUM ANALYZE run on them? Are the statistics collection settings the same?

Re: demystifying nested loop vs. merge join query plan choice

От
Jeff Janes
Дата:
On Thu, Aug 1, 2013 at 4:50 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:
> On Thu, Aug 1, 2013 at 10:25 AM, Sandeep Gupta <gupta.sandeep@gmail.com>
> wrote:
>>
>> @Jeff : Thanks for pointing this out. Turns out that was the case.
>>
>> @Tom: Thank you for the reference to random_page_cost parameters. It would
>> be very useful for us. Would go through the rest of the documentation as
>> well.
>
>
> I can't say what Jeff mentioned; maybe he didn't reply to the user list.
> Anyhow, sorry if this is repeating information.

I see that I accidentally didn't reply on list, Sorry.  I had just
pointed out that the tables are in vastly different vacuum states
between instances, based on the different heap fetches needed for the
IOS.  (Presumably this means the rest of the stats used for estimates
are all out of tune as well)

>
> I cannot help but point something glaring out in the EXPLAIN, though:
>
> database 1:
>
> Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32 rows=49987616
> width=4)
>
> database 2:
>
> Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44 rows=3 width=4)
>
> Maybe I just don't know how to read EXPLAIN plans, but it would appear that
> the estimated rows from the index only scan in the two plans is different by
> a factor of about 16.7 million.

The IOS on database 2 is inside a nested loop, and the whole thing is
executed 500,384 times.  So the error is only a factor of 30, not a
factor of 16 million.


> database 1 also processes about 7.7 million
> rows before the aggregate,

It thinks it will process 7.7 million, it actually processes less than
1 million.

> where database 2 only processes about 1.3
> million. For some reason, it appears that database 2 is able to eliminate
> far more rows more quickly, resulting in a faster query. Have both databases
> had VACUUM ANALYZE run on them? Are the statistics collection settings the
> same?

Yeah, that is the key.  I'm not sure what Sandeep meant by "that was
the case"--it looks like the one with the freshest stats was the one
that was using the slower plan, so hopefully the problem was not fixed
by converting the fast plan to look like the slow one!

Cheers,

Jeff