Обсуждение: Best partition type for billions of addresses

Поиск
Список
Период
Сортировка

Best partition type for billions of addresses

От
Arya F
Дата:
I need to store about 600 million rows of property addresses across
multiple counties. I need to have partitioning setup on the table as
there will be updates and inserts performed to the table frequently
and I want the queries to have good performance.

From what I understand hash partitioning would not be the right
approach in this case, since for each query PostgreSQL has to check
the indexes of all partitions?

Would list partitioning be suitable? if I want PostgreSQL to know
which partition the row is it can directly load the relevant index
without having to check other partitions. Should I be including the
partition key in the where clause?

I'd like to hear some recommendations on the best way to approach
this. I'm using PostgreSQL 12



Re: Best partition type for billions of addresses

От
Justin Pryzby
Дата:
On Sat, May 02, 2020 at 09:20:06AM -0400, Arya F wrote:
> I need to store about 600 million rows of property addresses across
> multiple counties. I need to have partitioning setup on the table as
> there will be updates and inserts performed to the table frequently
> and I want the queries to have good performance.

I dug up the last messages about this:
https://www.postgresql.org/message-id/flat/CAFoK1aztep-079Fxmaos6umR8X6m3x1K_aZLGtQGpYxfENh9%3DA%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFoK1azkv1Z%3DRr7ZWrJjk0RQSoF6ah%2BQMpLSSyBs1KsEiQ3%3Dvw%40mail.gmail.com
https://www.postgresql.org/message-id/CAFoK1axr_T6nB8ZAq8g2QBcqv_pE%3DdsZsxyjatz8Q67k1VKAnw%40mail.gmail.com


> From what I understand hash partitioning would not be the right
> approach in this case, since for each query PostgreSQL has to check
> the indexes of all partitions?

Indexes are separate from partitioning.  Typically, the partitioned columns are
indexed, but it's not required.

If the partition key isn't used in your typical query, then partitioning didn't
help you, and you chose the wrong partition strategy/key.

> Would list partitioning be suitable? if I want PostgreSQL to know
> which partition the row is it can directly load the relevant index
> without having to check other partitions. Should I be including the
> partition key in the where clause?

It sounds like you're thinking about this backwards.

What are your typical queries ?  That should determines the partition strategy
and key, not the other way around.  You should maybe think about whether there
are views/functions/joins of the partitioned column.

For example, at telsasoft, our report queries *always* say "tbl.start_time >=
t1 AND tbl.start_time < t2", so I partitioned our tables BY RANGE(start_time),
so a typical report hits only a single table.  And, start_time has an index on
it, so a typical query over 1-2 days will only hit a fraction of that table.

-- 
Justin



Re: Best partition type for billions of addresses

От
Stephen Frost
Дата:
Greetings,

* Arya F (arya6000@gmail.com) wrote:
> I need to store about 600 million rows of property addresses across
> multiple counties. I need to have partitioning setup on the table as
> there will be updates and inserts performed to the table frequently
> and I want the queries to have good performance.

That's not what partitioning is for, and 600m rows isn't all *that*
many.

> >From what I understand hash partitioning would not be the right
> approach in this case, since for each query PostgreSQL has to check
> the indexes of all partitions?
>
> Would list partitioning be suitable? if I want PostgreSQL to know
> which partition the row is it can directly load the relevant index
> without having to check other partitions. Should I be including the
> partition key in the where clause?
>
> I'd like to hear some recommendations on the best way to approach
> this. I'm using PostgreSQL 12

In this case, it sounds like "don't" is probably the best option.

Partitioning is good for data management, particularly when you have
data that "ages out" or should be removed/dropped at some point,
provided your queries use the partition key.  Partitioning doesn't speed
up routine inserts and updates that are using a proper index and only
updating a small set of rows at a time.

Thanks,

Stephen

Вложения

Re: Best partition type for billions of addresses

От
Arya F
Дата:
> * Arya F (arya6000@gmail.com) wrote:
> > I need to store about 600 million rows of property addresses across
> > multiple counties. I need to have partitioning setup on the table as
> > there will be updates and inserts performed to the table frequently
> > and I want the queries to have good performance.
>
> That's not what partitioning is for, and 600m rows isn't all *that*
> many.
>

But I have noticed that my updates and inserts have slowed down
dramatically when I started going over about 20 million rows and the
reason was because every time it has to update the index. When I
removed the index, my insert performance stayed good no matter the
size of the table.

So I should be able to achieve good performance with just one
partition? Maybe I just need to get hardware with more memory?



Re: Best partition type for billions of addresses

От
Stephen Frost
Дата:
Greetings,

* Arya F (arya6000@gmail.com) wrote:
> > * Arya F (arya6000@gmail.com) wrote:
> > > I need to store about 600 million rows of property addresses across
> > > multiple counties. I need to have partitioning setup on the table as
> > > there will be updates and inserts performed to the table frequently
> > > and I want the queries to have good performance.
> >
> > That's not what partitioning is for, and 600m rows isn't all *that*
> > many.
>
> But I have noticed that my updates and inserts have slowed down
> dramatically when I started going over about 20 million rows and the
> reason was because every time it has to update the index. When I
> removed the index, my insert performance stayed good no matter the
> size of the table.

Sure it does.

> So I should be able to achieve good performance with just one
> partition? Maybe I just need to get hardware with more memory?

Instead of jumping to partitioning, I'd suggest you post your actual
table structures, queries, and explain results here and ask for help.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Thanks,

Stephen

Вложения