Обсуждение: Partitioning vs. View of a UNION ALL

Поиск
Список
Период
Сортировка

Partitioning vs. View of a UNION ALL

От
Ron Johnson
Дата:
Hi,

I've gotten preliminary approval to buy a server and load a *lot* of
data into it.  One table will eventually have 4.5Bn 330 bytes rows,
the other 9Bn 300 byte rows.  Other will "only" have a billion rows.
 They are easily partitioned by yyyymm, which we call FISCAL_PERIOD.
 (In fact, the app creates the integer FISCAL_PERIOD by extracting
year and month from transaction date: YEAR*100+MONTH.)

Even though using a view means that it would have to be recreated
each period as the oldest table is dropped, it seems that it would
still be easier to work with, since you wouldn't have to worry about
preventing a badly behaving user from inserting into the DDL
partition's parent table and create 588 CHECK constraints (12 per
year x 7 years x 7 base tables).

The most important issue, though, is query speed.  Assuming
excellent index support for query WHERE clauses, regardless of
whether partitioning or a "viewed UNION ALL", which will the query
optimizer and constraint_exclusion be more friendly towards?

Thanks,
Ron
--
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.

Re: Partitioning vs. View of a UNION ALL

От
"Harald Armin Massa"
Дата:
Ron,

Even though using a view means that it would have to be recreated
each period as the oldest table is dropped,

please keep in mind:
views are not really "created" ... also the command is named "create view"

VIEWS, at least in PostgreSQL (and Oracle) are nothing else then "macros" for Queries - the views get substituted within the query with their creation rule.

That is: the "recreation" of a VIEW is a nearly instant process (it just is frustrating to connections using this view)

it seems that it would
still be easier to work with, since you wouldn't have to worry about
preventing a badly behaving user from inserting into the DDL
partition's parent table and create 588 CHECK constraints (12 per
year x 7 years x 7 base tables).

That is true only if you trust your users not to insert into the wrong table of your 12*7*7 tables.
If you have the appropriate check constraints on your parent table, the pushing data into the inherited tables should happen automagically (at least on my databases it does :) )

So... to make sure nobody inserts rubbish you will have to have those 588 check constraints one way or another. 
a) to make your partitioning work
b) to ensure nobody inserts data for 2000 into the table for 1900

The most important issue, though, is query speed.  Assuming
excellent index support for query WHERE clauses, regardless of
whether partitioning or a "viewed UNION ALL", which will the query
optimizer and constraint_exclusion be more friendly towards?

in an optimal world, should'nt those two options be exactly the same?

a) the partition solution:
  query planner decides which of your 12*7*7 tables to access and only scans those. To my undestanding, " constraint_exclusion" only applies to this solution.

b) the "union all" - or "partitioning by hand":
at the beginning of each partial query there will be an index scan on your date-column, learning that no data comes from that partial query and planner skipping on to the next.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Python: the only language with more web frameworks than keywords.

Re: Partitioning vs. View of a UNION ALL

От
"Merlin Moncure"
Дата:
On 10/13/06, Ron Johnson <ron.l.johnson@cox.net> wrote:
> Hi,
>
> I've gotten preliminary approval to buy a server and load a *lot* of
> data into it.  One table will eventually have 4.5Bn 330 bytes rows,
> the other 9Bn 300 byte rows.  Other will "only" have a billion rows.
>  They are easily partitioned by yyyymm, which we call FISCAL_PERIOD.
>  (In fact, the app creates the integer FISCAL_PERIOD by extracting
> year and month from transaction date: YEAR*100+MONTH.)

wow. if I do my math correctly, this will put you in the multi
terabyte range.  i'm sure the postgresql community (talk to Josh
Berkus) would love to hear about your experiences in this project.

anyways, regarding built in/manual partitioning, I have to admit I am
not a big fan of the built in table partitioning.  It was kind of
fiddly to set up, and constraint exclusion only worked on select
queries, which was the real deal killer for me.  however, the latter
issue this has been addressed in 8.2
(http://developer.postgresql.org/pgdocs/postgres/release-8-2.html).
With built in, foreign key behavior is a little bit different and
perhaps problematic, which you you should consider if you plan to
enforce constraints via RI. however, you can always drop down to a
trigger calling a dynamic pl/sql function which is almost as good
(sometimes better) to enforce constraints.  another issue is that
sometimes the plans generated on non trivial queries involving joins
to partitioned tables were not what i would have expected, resorting
to seq scans or not using constraint_exclusion conditions in certain
cases  obviously, this is a moving target and may improve in later
versions of postgresql, so test your sql carefully.

one thing that is interesting is that if your data divisions is very
strictly regimented so that most of your operations work on exactly
one schema, you can put your partions in separate schemas.  why do
this? well your table names are uniform for starters.  if you are into
pl/pgsql functions you can then keep one function/operation which
operates over all your partitions without excessive use of dynamic sql
(which is not terrible, but I'd prefer not to use it if possible.).
so long as you have a pretty good idea of when function plans are
generated, you can enter into your 'namespace' by manipulating
search_path and go to work.

with a database of your size you really have to work out some test
data and try both approaches. what works is going to be a combination
of pracical factors and personal style...and great feedback for the
community should you be persuaded to give regular updates on your
progress.

as with all partitioning strategies, keep an eye out for worst case behavior.

merlin

Re: Partitioning vs. View of a UNION ALL

От
"Jim C. Nasby"
Дата:
The only case I can think of where view partitioning makes more sense is
if it's list partitioning where you can also drop a field from your
tables. IE: if you have 10 projects, create 10 project_xx tables where
xx is the ID of the project, UNION ALL them together in a view, and
create rules on that view to handle DML.

Note I haven't actually tested to see if this is better than inherited
tables...

On Fri, Oct 13, 2006 at 05:00:23AM -0500, Ron Johnson wrote:
> Hi,
>
> I've gotten preliminary approval to buy a server and load a *lot* of
> data into it.  One table will eventually have 4.5Bn 330 bytes rows,
> the other 9Bn 300 byte rows.  Other will "only" have a billion rows.
>  They are easily partitioned by yyyymm, which we call FISCAL_PERIOD.
>  (In fact, the app creates the integer FISCAL_PERIOD by extracting
> year and month from transaction date: YEAR*100+MONTH.)
>
> Even though using a view means that it would have to be recreated
> each period as the oldest table is dropped, it seems that it would
> still be easier to work with, since you wouldn't have to worry about
> preventing a badly behaving user from inserting into the DDL
> partition's parent table and create 588 CHECK constraints (12 per
> year x 7 years x 7 base tables).
>
> The most important issue, though, is query speed.  Assuming
> excellent index support for query WHERE clauses, regardless of
> whether partitioning or a "viewed UNION ALL", which will the query
> optimizer and constraint_exclusion be more friendly towards?
>
> Thanks,
> Ron
> --
> Ron Johnson, Jr.
> Jefferson LA  USA
>
> Is "common sense" really valid?
> For example, it is "common sense" to white-power racists that
> whites are superior to blacks, and that those with brown skins
> are mud people.
> However, that "common sense" is obviously wrong.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)