Обсуждение: Very large table: Partition it or not?

Поиск
Список
Период
Сортировка

Very large table: Partition it or not?

От
Jack Orenstein
Дата:
I have a table in an analytics database (Postgres 12.3), that gathers data continuously. It is at 5B rows, with an average row size of 250 bytes. The table has five indexes, on bigint and varchar columns, all with keys of one or two columns.

There are currently frequent updates and deletions, but the net change in the number of rows is continuously positive. We are rearchitecting the application to avoid the updates and deletes. I.e., the table will soon be append-only, (so vacuuming will be needed only to avoid transaction id wraparound).

I know that the maximum table size is 32TB, which allows for 128B rows. Based on this calculation, and the expected growth rate (2B/year currently), we should be good for quite a while.

What are the pros and cons of partitioning the table? Without partitioning, are we liable to run into trouble as this table keeps growing? I do realize that some query times will grow with table size, and that partitioning, combined with parallel query execution can address that problem. I'm more wondering about problems in maintaining tables and indexes once we have 10B, 20B, ... rows.

Jack Orenstein

Re: Very large table: Partition it or not?

От
Ravi Krishna
Дата:
>
>
> I have a table in an analytics database (Postgres 12.3), that gathers data continuously. It is at 5B rows, with an
averagerow size of 250 bytes. The table has five indexes, on bigint and varchar columns, all with keys of one or two
columns.
>
> There are currently frequent updates and deletions, but the net change in the number of rows is continuously
positive.We are rearchitecting the application to avoid the updates and deletes. I.e., the table will soon be
append-only,(so vacuuming will be needed only to avoid transaction id wraparound). 
>
> I know that the maximum table size is 32TB, which allows for 128B rows. Based on this calculation, and the expected
growthrate (2B/year currently), we should be good for quite a while. 
>
> What are the pros and cons of partitioning the table? Without partitioning, are we liable to run into trouble as this
tablekeeps growing? I do realize that some query times will grow with table size, and that partitioning, combined with
parallelquery execution can address that problem. I'm more wondering about problems in maintaining tables and indexes
oncewe have 10B, 20B, ... rows. 
>
>

Does the table has a primary key/unique constraint which can not be part of the partitioning key.  Then you have a
seriousissue in converting this 
table into a partitioned one. Unlike Oracle or Db2, PG still does not have a concept of global index on a partitioned
table.




Re: Very large table: Partition it or not?

От
Laurenz Albe
Дата:
On Wed, 2020-12-16 at 14:44 -0500, Jack Orenstein wrote:
> What are the pros and cons of partitioning the table?

Pro:

- it bebomes easier to get rid of old data

- autovacuum is more fun with several smaller tables

- a few select queries might become faster

Con:

- administrative overhead (creating and dropping partitions)

- most queries become slightly slower

>  Without partitioning, are we liable to run into trouble as this table keeps growing?

That is hard to answer.  If it works fine with 5*10^9 rows, it might also
work fine with more data.

The deciding factor might be getting rid of old data.  That can be quite
painful with a single large table, but it might be trivial with partitioning.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com