Re: estimates for nested loop very wrong?

Поиск
Список
Период
Сортировка
От joostje@komputilo.org
Тема Re: estimates for nested loop very wrong?
Дата
Msg-id 20030410162902.GA28537@co.uea.org
обсуждение исходный текст
Ответ на Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Je 2003/04/10(4)/10:04, Tom Lane skribis:

> Have you done an ANALYZE or VACUUM ANALYZE recently?

Jes, both, actually, and the `analyse' quite a few times.

> > Nested Loop  (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)

> The planner is evidently estimating that each row of tmp1 will match 2600+
> rows of db, whereas in reality there is only one match.  Rather than
> mess with enable_hashjoin, you need to find out why that estimate is so
> badly off.  Are the entries in tmp1 specially selected to correspond to
> unique rows of db?

Well, each entry in tmp1 matches with about 7-80 entries in db, but yes
the problem indeed seems to be that the estimate is so far off.
And no, the entries in tmp1 are not specially selected, they correspond
to `normal' values of id in db (values that are about as frequent as
other values).

I have done VACUUM ANALYSE on the table (and drop index; create index db_id_idx on db(id);).
=> analyse db;
=> select n_distinct from pg_stats where tablename='db' and attname='id';       1996
=> select count(distinct(id)) from db;      42225

Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
count(distinct(id)), but it obviously isn't. Also the most_common_freqs
values are about a 100 times higher than in reality, and, even tough about
900 values of id occur more often than 40 times, in the 'most_common_vals'
list are 7 (of the 10) vals that occur less than 40 times, and the real
top two isn't even represented.


(BTW, the table I'm using now is a little smaller, as it turned out that
a few (75%) of the entries in db had only 3 different id values. This
didn't have any effect on the accurateness of the estimates, though).

BTW,
=> select count(id) from db;    586035


Thanks!
joostje

-- what pg_stat thinks about db.id:

=> select n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename='db' and
attname='id';n_distinct|                  most_common_vals                   |
most_common_freqs                                           
 

------------+-----------------------------------------------------+-------------------------------------------------------------------------------------------------------
    1907 | {subo,smys,raha,sjbo,sdai,roal,sooi,stsw,rmwi,snuw} |
{0.00733333,0.007,0.00633333,0.00633333,0.006,0.00566667,0.00566667,0.00566667,0.00533333,0.00533333}


--these estimates are far off:

=> select id, count (id)/586035.0 from db where id='subo' or id='smys' or id='raha' or id='sjbo' or id='sdai' or
id='roal'or id='sooi' or id='stsw' or id='rmwi' or id='snuw' group by id; id  |       ?column?       
 
------+----------------------raha | 0.000156987210661479rmwi | 3.24212717670446e-05roal |  6.3136160809508e-05sdai |
8.70255189536461e-05sjbo|  6.3136160809508e-05smys |  7.5080839881577e-05snuw | 4.26595681145324e-05sooi |
0.000114327642546947stsw| 6.14297780849267e-05subo | 5.11914817374389e-05
 


--and these would be the real most_common_freqs:

=> select id, count(id), count(id)/586035.0 from db group by id order by - count(id) limit 10;  id   | count |
?column?      
 
--------+-------+----------------------indmem |   194 | 0.000331038248568771hton   |    97 | 0.000165519124284386raha
|   92 | 0.000156987210661479simo   |    87 | 0.000148455297038573sugn   |    87 | 0.000148455297038573rjgl   |    85 |
0.00014504253158941hroy   |    84 | 0.000143336148864829jrgv   |    84 | 0.000143336148864829tojo   |    83 |
0.000141629766140248lucy  |    82 | 0.000139923383415666
 


-- the above all done after a 'vacuum analyse';



В списке pgsql-sql по дате отправления:

Предыдущее
От: Franco Bruno Borghesi
Дата:
Сообщение: Re: Trigger
Следующее
От: Tom Lane
Дата:
Сообщение: Re: estimates for nested loop very wrong?