Re: estimates for nested loop very wrong?

Поиск
Список
Период
Сортировка
От joostje@komputilo.org
Тема Re: estimates for nested loop very wrong?
Дата
Msg-id 20030410194553.GB28537@co.uea.org
обсуждение исходный текст
Ответ на Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: estimates for nested loop very wrong?  (Antti Haapala <antti.haapala@iki.fi>)
Список pgsql-sql
Je 2003/04/10(4)/12:04, Tom Lane skribis:
> joostje@komputilo.org writes:
> > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
> > count(distinct(id)), but it obviously isn't. Also the most_common_freqs
> > values are about a 100 times higher than in reality, and, even tough about
> > 900 values of id occur more often than 40 times, in the 'most_common_vals'
> > list are 7 (of the 10) vals that occur less than 40 times, and the real
> > top two isn't even represented.
> 
> Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> re-analyze and see if the estimates get better.  The default setting is
> 10 --- try 20, 50, 100 to see what happens.

Well, the n_distinct estimates get better, but the cost estimates still
don't quite add up: `actual cost' is 23.24, cost estimate never gets
below 49930.

stat.targ n_distinct| correlation   cost estimate  5         1917   |     0.43189    3621794.92 10         1998   |
0.3909    3618363.33 20         4330   |   -0.247617    1981594.38 50         9708   |   0.0762642     975847.15100
  14604   |    0.030706     657631.41200        21855   |   0.0446929     204335.70500        39980   |  -0.0497829
121000.31
1000        29468   |   0.0366528      49930.08
1000        29453   |   0.0367673      49954.08 Table 1: various estimates as a function of statistical target
actualdistinct values: 42226   actual cost: varies from 5.0 to 27.8   
 
So, the planner still prefers the mergejoin and hashjoin plans, causing
the select to take tens of seconds (60 for the mergejoin, I beleve), wheras
the Nested Loop takes only 0.024 seconds:

For example, for the stat.targ=500 run:
=> explain analyse SELECT  id  from db, tmp0 WHERE valida AND poseda='uea'  AND tab='pers' AND tmp0.v0=id ;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1) ->  Seq Scan on tmp0
(cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1) ->  Index Scan using db_id_idx on db
(cost=0.00..120.63rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29)
 
Total runtime: 23.92 msec

In the above example, tmp0 had 29 values, that correspond to 415 rows in table db.
Table db has 586157 rows.


The shown select statement is the one used to get all cost estimates
in table 1.

postgresql:   7.2.1 (debian release 3)



В списке pgsql-sql по дате отправления:

Предыдущее
От: Richard Huxton
Дата:
Сообщение: Re: Trigger
Следующее
От: Tom Lane
Дата:
Сообщение: Re: estimates for nested loop very wrong?