Re: Maximum statistics target

From: Peter Eisentraut
Subject: Re: Maximum statistics target
Date:
Msg-id: 200803101136.06204.peter_e@gmx.net
In response to: Re: Maximum statistics target  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Maximum statistics target  (Cédric Villemain <cedric.villemain@dalibo.com>)
           Re: Maximum statistics target  ("Guillaume Smet" <guillaume.smet@gmail.com>)
           Re: Maximum statistics target  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Friday, 7 March 2008, Tom Lane wrote:
> I'm not wedded to the number 1000 in particular --- obviously that's
> just a round number.  But it would be good to see some performance tests
> with larger settings before deciding that we don't need a limit.

Well, I'm not saying we should raise the default statistics target.  But 
setting an arbitrary limit on the grounds that larger values might slow the 
system is like limiting the size of tables because larger tables will cause 
slower queries.  Users should have the option of finding out the best balance 
for themselves.  If there are concerns with larger statistics targets, we 
should document them.  I find nothing about this in the documentation at the 
moment.
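
The knobs for that already exist, of course; a minimal sketch (table and
column names here are just illustrative):

-- per-column target:
ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 1000;
-- or the cluster-wide default, per session or in postgresql.conf:
SET default_statistics_target = 100;
-- re-analyze so the new target takes effect:
ANALYZE test1;

The question is only how far users are allowed to turn them up.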

> IIRC, egjoinsel is one of the weak spots, so tests involving planning of
> joins between two tables with large MCV lists would be a good place to
> start.

I have run tests with joining two and three tables with 10 million rows each, 
and the planning times seem to be virtually unaffected by the statistics 
target, for values between 10 and 800000.  They all look more or less like 
this:

test=# explain select * from test1, test2 where test1.a = test2.b;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
   Hash Cond: (test1.a = test2.b)
   ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
   ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
         ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
(5 rows)

Time: 132,350 ms

and with indexes

test=# explain select * from test1, test2 where test1.a = test2.b;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
   Merge Cond: (test1.a = test2.b)
   ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13 rows=10000000 width=8)
   ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13 rows=10000000 width=8)
(4 rows)

Time: 168,455 ms
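
For anyone who wants to reproduce this, something along these lines should
produce plans of the same shape (the exact column layout and statistics
values are approximate; targets beyond the built-in limit of course require
that limit to be patched out first):

create table test1 (a int, b int);
create table test2 (a int, b int);
insert into test1 select i, i from generate_series(1, 10000000) i;
insert into test2 select i, i from generate_series(1, 10000000) i;
-- indexes for the merge-join variant:
create index test1_index1 on test1 (a);
create index test2_index1 on test2 (b);
-- raise the per-column targets, then gather statistics:
alter table test1 alter column a set statistics 1000;
alter table test2 alter column b set statistics 1000;
analyze;
\timing
explain select * from test1, test2 where test1.a = test2.b;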

The time to analyze is also fairly constant, right up to the point where you 
run out of memory. :)  MaxAllocSize is the limiting factor here.  In my 
example, statistics targets larger than about 800000 created pg_statistic 
rows that would have been larger than 1GB, so they couldn't be stored.
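
You can watch the statistics rows grow with a query like the following
(it needs superuser access to pg_statistic, and pg_column_size reports the
stored, possibly compressed, datum sizes, so it is only an approximation of
the in-memory allocation that hits MaxAllocSize):

select starelid::regclass as rel, staattnum,
       pg_column_size(stavalues1) as values_bytes,
       pg_column_size(stanumbers1) as freqs_bytes
from pg_statistic
where starelid = 'test1'::regclass;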

I suggest that we get rid of the limit of 1000, adequately document whatever 
issues might exist with large values (possibly not many, see above), and add 
an error message more user-friendly than "invalid memory alloc request size" 
for the cases where the value is too large to be storable.

