Re: vacuum, performance, and MVCC

From: Mark Woodward
Subject: Re: vacuum, performance, and MVCC
Date:
Msg-id: 18739.24.91.171.78.1151063858.squirrel@mail.mohawksoft.com
In reply to: Re: vacuum, performance, and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
Responses: Re: vacuum, performance, and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
List: pgsql-hackers
>>     I suppose you have a table memberships (user_id, group_id) or something
>> like it ; it should have as few columns as possible ; then try regularly
>> clustering on group_id (maybe once a week) so that all the records for a
>> particular group are close together. Getting the members of a group to
>> send them an email should be faster (less random seeks).
>
> It is like this, and some more bookkeeping data which must be there...
> we could split the table for smaller records or for updatable/stable
> fields, but at the end of the day it doesn't make much sense, usually
> all the data is needed and I wonder if more big/shallow tables instead
> of one big/wider makes sense...
>
> Regularly clustering is out of question as it would render the system
> unusable for hours. There's no "0 activity hour" we could use for such
> stuff. There's always something happening, only the overall load is
> smaller at night...
>

Let me ask a question: you have this hundred-million-row table. OK, how
much of that table is "read/write"? Would it be possible to divide the
table into two (or more) tables, where one is basically static, with only
infrequent inserts and deletes, and the other is highly updated?

The "big" thing in performance is the amount of disk I/O. If you have a
smaller active table with only a single index, then you may be able to cut
your disk I/O time way down. The smaller the row size, the more rows
fit into a block. The fewer blocks, the less disk I/O. The less disk I/O,
the better the performance.
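
To put rough numbers on the rows-per-block argument, here is a
back-of-the-envelope sketch. The 8192-byte block is the PostgreSQL default;
the 200-byte and 40-byte row widths are made-up values for illustration
only, and the estimate ignores page headers, tuple headers, and free space,
so treat the results as an order of magnitude and not a measurement.

```python
import math

BLOCK_SIZE = 8192  # PostgreSQL's default block size, in bytes


def blocks_needed(row_count, row_bytes, block_size=BLOCK_SIZE):
    """Rough count of blocks needed to store row_count rows.

    Simplifying assumption: every row takes exactly row_bytes and
    per-page / per-tuple overhead is ignored.
    """
    rows_per_block = block_size // row_bytes
    if rows_per_block == 0:
        raise ValueError("row does not fit in a single block")
    return math.ceil(row_count / rows_per_block)


ROWS = 100_000_000  # the hundred-million-row table from the thread

# Hypothetical widths: a wide "everything" row vs. a narrow "hot" row.
wide_blocks = blocks_needed(ROWS, 200)
narrow_blocks = blocks_needed(ROWS, 40)

print(f"wide rows:   {wide_blocks:,} blocks")
print(f"narrow rows: {narrow_blocks:,} blocks")
print(f"ratio:       {wide_blocks / narrow_blocks:.1f}x fewer blocks to scan")
```

Under those assumptions, the narrow table occupies roughly a fifth of the
blocks, which is the kind of I/O reduction I am getting at.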

Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
frequently because the indexes grow and vacuuming them doesn't remove
everything; sometimes a REINDEX or a drop/recreate is the only way to get
performance back. So if you wait too long between vacuums, your indexes
grow and spread across more disk blocks than they should, and thus use
more disk I/O to search and/or shared memory to cache.

