Re: count(*) slow on large tables

Поиск

Список

Период

Сортировка

От	Josh Berkus
Тема	Re: count(*) slow on large tables
Дата	5 октября 2003 г. 18:59:42
Msg-id	200310051157.21555.josh@agliodbs.com обсуждение исходный текст
Ответ на	Re: count(*) slow on large tables (Bruce Momjian <pgman@candle.pha.pa.us>)
Ответы	Re: count(*) slow on large tables (Rod Taylor <rbt@rbt.ca>)
Список	pgsql-performance

Дерево обсуждения

Bruce,

> OK, I beefed up the TODO:
>
>     * Use a fixed row count and a +/- count with MVCC visibility rules
>       to allow fast COUNT(*) queries with no WHERE clause(?)
>
> I can always give the details if someone asks.  It doesn't seem complex
> enough for a separate TODO.detail item.

Hmmm ... this doesn't seem effort-worthy to me.   How often does anyone do
COUNT with no where clause, except GUIs that give you a record count?  (of
course, as always, if someone wants to code it, feel free ...)

And for those GUIs, wouldn't it be 97% as good to run an ANALYZE and give the
approximate record counts for large tables?

As for counts with a WHERE clause, this is obviously up to the user.  Joe
Conway and I tested using a C trigger to track some COUNT ... GROUP BY values
for large tables based on additive numbers.   It worked fairly well for
accuracy, but the performance penalty on data writes was significant ... 8%
to 25% penalty for UPDATES, depending on the frequency and batch size (>
frequency > batch size -->  > penalty)

It's possible that this could be improved through some mechanism more tightly
integrated with the source code.   However,the coding effort would be
significant ( 12-20 hours ) and it's possible that there would be no
improvement, which is why we didn't do it.

We also discussed an asynchronous aggregates collector that would work
something like the statistics collector, and keep pre-programmmed aggregate
data, updating during "low-activity" periods.  This would significantly
reduce the performance penalty, but at the cost of accuracy ... that is, a
1%-5% variance on high-activity tables would be unavoidable, and all cached
aggregates would have to be recalculated on database restart, significantly
slowing down startup.   Again, we felt that the effort-result payoff was not
worthwhile.

Overall, I think the stuff we already have planned ... the hash aggregates in
7.4 and Tom's suggestion of adding an indexable flag to pg_aggs ... are far
more likely to yeild useful fruit than any caching plan.

--
Josh Berkus
Aglio Database Solutions
San Francisco

В списке pgsql-performance по дате отправления:

Предыдущее

От: "Matt Clark"
Дата: 05 октября 2003 г., 18:44:06
Сообщение: Re: reindex/vacuum locking/performance?

Следующее

От: Rod Taylor
Дата: 05 октября 2003 г., 19:11:58
Сообщение: Re: count(*) slow on large tables

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: count(*) slow on large tables

Предыдущее

Следующее