Обсуждение: What's a reasonable maximum number for table partitions?

Поиск
Список
Период
Сортировка

What's a reasonable maximum number for table partitions?

От
Tim Uckun
Дата:
The documentation says having too many partitions will end up being unproductive as it will cause the optimizer to examine all the tables for query planning.  So I am wondering what's a reasonable upper limit?

If I was to partition a table by day I would have 365 tables per year. Is that too many? What if I used a different criteria that would cause a thousand tables?

Does anybody have experience with huge number of partitions if so where did you start running into trouble? 

Thanks.

Re: What's a reasonable maximum number for table partitions?

От
Vick Khera
Дата:

On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
Does anybody have experience with huge number of partitions if so where did you start running into trouble? 

I use an arbitrary 100-way split for a lot of tracking info. Just modulo 100 on the ID column. I've never had any issues with that. If you can adjust your queries to pick the right partition ahead of time, which I am able to do for many queries, the number of partitions shouldn't matter much. Only rarely do I need to query the primary table.

I don't think your plan for 365 partitions is outrageous on modern large hardware. For 1000 partitions, I don't know. It will depend on how you can optimize your queries before giving them to postgres.

Re: What's a reasonable maximum number for table partitions?

От
Bill Moran
Дата:
On Fri, 13 Feb 2015 11:12:13 -0500
Vick Khera <vivek@khera.org> wrote:

> On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
>
> > Does anybody have experience with huge number of partitions if so where
> > did you start running into trouble?
>
> I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> 100 on the ID column. I've never had any issues with that. If you can
> adjust your queries to pick the right partition ahead of time, which I am
> able to do for many queries, the number of partitions shouldn't matter
> much. Only rarely do I need to query the primary table.
>
> I don't think your plan for 365 partitions is outrageous on modern large
> hardware. For 1000 partitions, I don't know. It will depend on how you can
> optimize your queries before giving them to postgres.

Worked on a project last year where we did 256 partitions. Didn't experience
any problems, but I don't recall if 256 was an arbitrary number or if we
did any significant testing into whether it was the sweet spot. In any event,
we did a LOT of performance testing and found that 256 partitions performed
very well. I second Vick's comments on selecting the partition ahead of time,
in particular, we realized HUGE performance gains on inserts when our code
determined the partition ahead of time and inserted directly into the
partition instead of into the primary table.

--
Bill Moran


Re: What's a reasonable maximum number for table partitions?

От
Seref Arikan
Дата:
Hi Bill, 
Could you point at some resource(s) that discuss inserting directly into the partition? Would it be possible to read directly from the partition as well? 

Regards
Seref


On Fri, Feb 13, 2015 at 4:15 PM, Bill Moran <wmoran@potentialtech.com> wrote:
On Fri, 13 Feb 2015 11:12:13 -0500
Vick Khera <vivek@khera.org> wrote:

> On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
>
> > Does anybody have experience with huge number of partitions if so where
> > did you start running into trouble?
>
> I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> 100 on the ID column. I've never had any issues with that. If you can
> adjust your queries to pick the right partition ahead of time, which I am
> able to do for many queries, the number of partitions shouldn't matter
> much. Only rarely do I need to query the primary table.
>
> I don't think your plan for 365 partitions is outrageous on modern large
> hardware. For 1000 partitions, I don't know. It will depend on how you can
> optimize your queries before giving them to postgres.

Worked on a project last year where we did 256 partitions. Didn't experience
any problems, but I don't recall if 256 was an arbitrary number or if we
did any significant testing into whether it was the sweet spot. In any event,
we did a LOT of performance testing and found that 256 partitions performed
very well. I second Vick's comments on selecting the partition ahead of time,
in particular, we realized HUGE performance gains on inserts when our code
determined the partition ahead of time and inserted directly into the
partition instead of into the primary table.

--
Bill Moran


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: What's a reasonable maximum number for table partitions?

От
Bill Moran
Дата:
On Fri, 13 Feb 2015 16:29:02 +0000
Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:

> Hi Bill,
> Could you point at some resource(s) that discuss inserting directly into
> the partition?

Not off the top of my head, I would have to google just like
you would.

> Would it be possible to read directly from the partition as
> well?

Certainly. Whatever process you use to determine the name of the
partition in the partition config will work anywhere else. For example,
if the partition config is picking a table partition based on % 256
(which is what we were doing, based on an integer entity ID) you can
do that same math in whatever programming language the application is
written in (in our case it was PHP).

> On Fri, Feb 13, 2015 at 4:15 PM, Bill Moran <wmoran@potentialtech.com>
> wrote:
>
> > On Fri, 13 Feb 2015 11:12:13 -0500
> > Vick Khera <vivek@khera.org> wrote:
> >
> > > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
> > >
> > > > Does anybody have experience with huge number of partitions if so where
> > > > did you start running into trouble?
> > >
> > > I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> > > 100 on the ID column. I've never had any issues with that. If you can
> > > adjust your queries to pick the right partition ahead of time, which I am
> > > able to do for many queries, the number of partitions shouldn't matter
> > > much. Only rarely do I need to query the primary table.
> > >
> > > I don't think your plan for 365 partitions is outrageous on modern large
> > > hardware. For 1000 partitions, I don't know. It will depend on how you
> > can
> > > optimize your queries before giving them to postgres.
> >
> > Worked on a project last year where we did 256 partitions. Didn't
> > experience
> > any problems, but I don't recall if 256 was an arbitrary number or if we
> > did any significant testing into whether it was the sweet spot. In any
> > event,
> > we did a LOT of performance testing and found that 256 partitions performed
> > very well. I second Vick's comments on selecting the partition ahead of
> > time,
> > in particular, we realized HUGE performance gains on inserts when our code
> > determined the partition ahead of time and inserted directly into the
> > partition instead of into the primary table.
> >
> > --
> > Bill Moran
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >


--
Bill Moran


Re: What's a reasonable maximum number for table partitions?

От
Seref Arikan
Дата:
Thanks, google it is then ;)



On Fri, Feb 13, 2015 at 4:31 PM, Bill Moran <wmoran@potentialtech.com> wrote:
On Fri, 13 Feb 2015 16:29:02 +0000
Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:

> Hi Bill,
> Could you point at some resource(s) that discuss inserting directly into
> the partition?

Not off the top of my head, I would have to google just like
you would.

> Would it be possible to read directly from the partition as
> well?

Certainly. Whatever process you use to determine the name of the
partition in the partition config will work anywhere else. For example,
if the partition config is picking a table partition based on % 256
(which is what we were doing, based on an integer entity ID) you can
do that same math in whatever programming language the application is
written in (in our case it was PHP).

> On Fri, Feb 13, 2015 at 4:15 PM, Bill Moran <wmoran@potentialtech.com>
> wrote:
>
> > On Fri, 13 Feb 2015 11:12:13 -0500
> > Vick Khera <vivek@khera.org> wrote:
> >
> > > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
> > >
> > > > Does anybody have experience with huge number of partitions if so where
> > > > did you start running into trouble?
> > >
> > > I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> > > 100 on the ID column. I've never had any issues with that. If you can
> > > adjust your queries to pick the right partition ahead of time, which I am
> > > able to do for many queries, the number of partitions shouldn't matter
> > > much. Only rarely do I need to query the primary table.
> > >
> > > I don't think your plan for 365 partitions is outrageous on modern large
> > > hardware. For 1000 partitions, I don't know. It will depend on how you
> > can
> > > optimize your queries before giving them to postgres.
> >
> > Worked on a project last year where we did 256 partitions. Didn't
> > experience
> > any problems, but I don't recall if 256 was an arbitrary number or if we
> > did any significant testing into whether it was the sweet spot. In any
> > event,
> > we did a LOT of performance testing and found that 256 partitions performed
> > very well. I second Vick's comments on selecting the partition ahead of
> > time,
> > in particular, we realized HUGE performance gains on inserts when our code
> > determined the partition ahead of time and inserted directly into the
> > partition instead of into the primary table.
> >
> > --
> > Bill Moran
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >


--
Bill Moran

Re: What's a reasonable maximum number for table partitions?

От
Vick Khera
Дата:
On Fri, Feb 13, 2015 at 11:29 AM, Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:
Hi Bill, 
Could you point at some resource(s) that discuss inserting directly into the partition? Would it be possible to read directly from the partition as well? 

When preparing your SQL statement, you just specify the partition directly like this. Here's a snippet from my code in perl.

my $msg_recipients_modulo = 100; # number of partitions

sub msg_recipients_part($) {
 use integer;
 my $id = shift;
 my $part = $id % $msg_recipients_modulo;
 return 'msg_recipients_' . sprintf('%02d',$part);
}

then in when generating sql you do

$table =  msg_recipients_part($msg_id);
$sql = "SELECT FROM $table WHERE ..."

or something similar for insert/update.


Re: What's a reasonable maximum number for table partitions?

От
Seref Arikan
Дата:
Ah, I should have thought that it would be simple. Thanks a lot Vick. 

Regards
Seref


On Fri, Feb 13, 2015 at 4:54 PM, Vick Khera <vivek@khera.org> wrote:
On Fri, Feb 13, 2015 at 11:29 AM, Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:
Hi Bill, 
Could you point at some resource(s) that discuss inserting directly into the partition? Would it be possible to read directly from the partition as well? 

When preparing your SQL statement, you just specify the partition directly like this. Here's a snippet from my code in perl.

my $msg_recipients_modulo = 100; # number of partitions

sub msg_recipients_part($) {
 use integer;
 my $id = shift;
 my $part = $id % $msg_recipients_modulo;
 return 'msg_recipients_' . sprintf('%02d',$part);
}

then in when generating sql you do

$table =  msg_recipients_part($msg_id);
$sql = "SELECT FROM $table WHERE ..."

or something similar for insert/update.



Re: What's a reasonable maximum number for table partitions?

От
Tim Uckun
Дата:

If I used modulo arithmetic how would the query optimizer know which table to include and exclude? For example say I did modulo 100 based on the field client_id.  I create a base table with the trigger to insert the data into the proper child table. Each table has the constraint (client_id % 100) = X

So if I do select from base table where client_id = 10  would postgres know to only select from client_table_10? Normally I would always have a client_id in my queries so hopefully the this could be very efficient.





On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera <vivek@khera.org> wrote:

On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
Does anybody have experience with huge number of partitions if so where did you start running into trouble? 

I use an arbitrary 100-way split for a lot of tracking info. Just modulo 100 on the ID column. I've never had any issues with that. If you can adjust your queries to pick the right partition ahead of time, which I am able to do for many queries, the number of partitions shouldn't matter much. Only rarely do I need to query the primary table.

I don't think your plan for 365 partitions is outrageous on modern large hardware. For 1000 partitions, I don't know. It will depend on how you can optimize your queries before giving them to postgres.

Re: What's a reasonable maximum number for table partitions?

От
Bill Moran
Дата:
On Sat, 14 Feb 2015 11:14:10 +1300
Tim Uckun <timuckun@gmail.com> wrote:

> If I used modulo arithmetic how would the query optimizer know which table
> to include and exclude? For example say I did modulo 100 based on the field
> client_id.  I create a base table with the trigger to insert the data into
> the proper child table. Each table has the constraint (client_id % 100) = X
>
> So if I do select from base table where client_id = 10  would postgres know
> to only select from client_table_10? Normally I would always have a
> client_id in my queries so hopefully the this could be very efficient.

Unless the newest versions of PostgreSQL has improved on this, you have to
give the planner just a bit of a hint ... you're query should look like:

SELET ... WHERE client_id = 10 AND client_id % 100 = 10;

The part after the AND looks silly and redundant, but it guarantees that
the planner will consider the partition layout when it plans the query,
and in every test that I've run the result will be that the planner only
looks at the one child table.

> On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera <vivek@khera.org> wrote:
>
> >
> > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
> >
> >> Does anybody have experience with huge number of partitions if so where
> >> did you start running into trouble?
> >>
> >
> > I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> > 100 on the ID column. I've never had any issues with that. If you can
> > adjust your queries to pick the right partition ahead of time, which I am
> > able to do for many queries, the number of partitions shouldn't matter
> > much. Only rarely do I need to query the primary table.
> >
> > I don't think your plan for 365 partitions is outrageous on modern large
> > hardware. For 1000 partitions, I don't know. It will depend on how you can
> > optimize your queries before giving them to postgres.
> >


--
Bill Moran


Re: What's a reasonable maximum number for table partitions?

От
Tim Uckun
Дата:
This might get pretty crazy if I am doing queries like WHERE client_id in () or when I am trying to join some table with the client table.   Maybe I can precalculate the ids that are going to go into each partition and set the constraint as where client_id in (some_huge_list).



On Sat, Feb 14, 2015 at 11:15 AM, Bill Moran <wmoran@potentialtech.com> wrote:
On Sat, 14 Feb 2015 11:14:10 +1300
Tim Uckun <timuckun@gmail.com> wrote:

> If I used modulo arithmetic how would the query optimizer know which table
> to include and exclude? For example say I did modulo 100 based on the field
> client_id.  I create a base table with the trigger to insert the data into
> the proper child table. Each table has the constraint (client_id % 100) = X
>
> So if I do select from base table where client_id = 10  would postgres know
> to only select from client_table_10? Normally I would always have a
> client_id in my queries so hopefully the this could be very efficient.

Unless the newest versions of PostgreSQL has improved on this, you have to
give the planner just a bit of a hint ... you're query should look like:

SELET ... WHERE client_id = 10 AND client_id % 100 = 10;

The part after the AND looks silly and redundant, but it guarantees that
the planner will consider the partition layout when it plans the query,
and in every test that I've run the result will be that the planner only
looks at the one child table.

> On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera <vivek@khera.org> wrote:
>
> >
> > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
> >
> >> Does anybody have experience with huge number of partitions if so where
> >> did you start running into trouble?
> >>
> >
> > I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> > 100 on the ID column. I've never had any issues with that. If you can
> > adjust your queries to pick the right partition ahead of time, which I am
> > able to do for many queries, the number of partitions shouldn't matter
> > much. Only rarely do I need to query the primary table.
> >
> > I don't think your plan for 365 partitions is outrageous on modern large
> > hardware. For 1000 partitions, I don't know. It will depend on how you can
> > optimize your queries before giving them to postgres.
> >


--
Bill Moran

Re: What's a reasonable maximum number for table partitions?

От
Bill Moran
Дата:
On Sat, 14 Feb 2015 11:27:52 +1300
Tim Uckun <timuckun@gmail.com> wrote:

> This might get pretty crazy if I am doing queries like WHERE client_id in
> () or when I am trying to join some table with the client table.   Maybe I
> can precalculate the ids that are going to go into each partition and set
> the constraint as where client_id in (some_huge_list).

The general caveat is if client_id IN (some huge list) then the list is
probably going to include most or all of the partitions, PostgreSQL will
have to look at all of them anyway, so giving it the hint probably isn't
helpful anyway.

Obviously, there are corner cases to every generalization, but overall,
the research I did last year showed that partitioning was a big win when
frequent queries could narrow the number of partitions down to just a few,
and in those cases, constructing a WHERE clause that created the desired
effect was pretty easy.  In our case, since the case of selecting from
more than one table partition was unusual, we decided not to put the
effort into optimizing those queries.

This is why it's SO important to partition on the right column. Getting
the wrong column can have you doing all the work of partitioning with
none or few of the benefits. Think of it this way, if I have an orders
table, I can partition on order_id or customer_id or product_id. There
are probably other options, but let's just consider those for this
exercise.

If I partition on order_id, then the case of selecting a single order
when the order_id is known will be very fast, and queries that select
based on product or customer will benefit little if any at all.

If I partition on customer_id, then customers will be able to find all
of their orders very efficiently, but queries that look for all orders
of a specific product or searches for a specific order_id won't benefit
much, if any.

If I partition on product_id, then queries to find out who's buying a
particular product will be very efficient, but queries based on customer
or order_id won't benefit much, if any.

Which one you choose is dependent on your expected workload, and your
performance requirements. You may decide that customers need speed the
most, and that crunching data on products is something that it's OK
to be a little slower. You may find that the most frequent query is to
search by order_id, so overall your application will see the most
benefit by super-optimizing that request. There's no one answer that's
right for every situation ... at least not in my experience.

> On Sat, Feb 14, 2015 at 11:15 AM, Bill Moran <wmoran@potentialtech.com>
> wrote:
>
> > On Sat, 14 Feb 2015 11:14:10 +1300
> > Tim Uckun <timuckun@gmail.com> wrote:
> >
> > > If I used modulo arithmetic how would the query optimizer know which
> > table
> > > to include and exclude? For example say I did modulo 100 based on the
> > field
> > > client_id.  I create a base table with the trigger to insert the data
> > into
> > > the proper child table. Each table has the constraint (client_id % 100)
> > = X
> > >
> > > So if I do select from base table where client_id = 10  would postgres
> > know
> > > to only select from client_table_10? Normally I would always have a
> > > client_id in my queries so hopefully the this could be very efficient.
> >
> > Unless the newest versions of PostgreSQL has improved on this, you have to
> > give the planner just a bit of a hint ... you're query should look like:
> >
> > SELET ... WHERE client_id = 10 AND client_id % 100 = 10;
> >
> > The part after the AND looks silly and redundant, but it guarantees that
> > the planner will consider the partition layout when it plans the query,
> > and in every test that I've run the result will be that the planner only
> > looks at the one child table.
> >
> > > On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera <vivek@khera.org> wrote:
> > >
> > > >
> > > > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@gmail.com> wrote:
> > > >
> > > >> Does anybody have experience with huge number of partitions if so
> > where
> > > >> did you start running into trouble?
> > > >>
> > > >
> > > > I use an arbitrary 100-way split for a lot of tracking info. Just
> > modulo
> > > > 100 on the ID column. I've never had any issues with that. If you can
> > > > adjust your queries to pick the right partition ahead of time, which I
> > am
> > > > able to do for many queries, the number of partitions shouldn't matter
> > > > much. Only rarely do I need to query the primary table.
> > > >
> > > > I don't think your plan for 365 partitions is outrageous on modern
> > large
> > > > hardware. For 1000 partitions, I don't know. It will depend on how you
> > can
> > > > optimize your queries before giving them to postgres.
> > > >
> >
> >
> > --
> > Bill Moran
> >


--
Bill Moran