Re: When to use PARTITION BY HASH?

Поиск
Список
Период
Сортировка
От Oleksandr Shulgin
Тема Re: When to use PARTITION BY HASH?
Дата
Msg-id CACACo5T4Cq1sx7iWDxQ85RFG5dQxR1i7qS8BKR5JTaPwS-_MtA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: When to use PARTITION BY HASH?  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: When to use PARTITION BY HASH?  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-general
On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:

With hash partitioning you are not expected, in general, to end up with a small number of partitions being accessed more heavily than the rest.  So your indexes will also not fit into memory.

I have the feeling that using a hash function to distribute rows simply contradicts the basic assumption of when you would think of partitioning your table at all: that is to make sure the most active part of the table and indexes is small enough to be cached in memory.

While hash partitioning doesn't appeal to me, I think this may be overly pessimistic.  It would not be all that unusual for your customers to take turns being highly active and less active.  Especially if you do occasional bulk loads all with the same customer_id for any given load, for example.

For a bulk load you'd likely want to go with an empty partition w/o indexes and build them later, after loading the tuples.  While it might not be possible with any given partitioning scheme either, using hash partitioning most certainly precludes that.
 
So while you might not have a permanently hot partition, you could have partitions which are hot in turn.  Of course you could get the same benefit (and probably better) with list or range partitioning rather than hash, but then you have to maintain those lists or ranges when you add new customers.

Why are LRU eviction from the shared buffers and OS disk cache not good enough to handle this?

This actually applies to any partitioning scheme: the hot dataset could be recognized by these caching layers.  Does it not happen in practice?

--
Alex

В списке pgsql-general по дате отправления:

Предыдущее
От: Rob Sargent
Дата:
Сообщение: Re: Multitenent architecture
Следующее
От: Ravi Krishna
Дата:
Сообщение: Re: Multitenent architecture