Thread: inheritance vs performance

inheritance vs performance

From
Pascal Polleunus
Date:
Hi,

I'm wondering if there could be problems related to inheritance in the
following scenario (with PostgreSQL 7.4.1)...

1 A-table, abstract.

Max 10 B-tables that inherit from A, with sometimes some more columns
than A. These are also abstracts.

"n" C-tables that inherit from 1 B-table, without more columns.
Each C-table could contain quite a lot of rows (500K, 1M, ...).

Could there be problems, or performance issues, related to inheritance
if there is "too much" C-tables (in combination with the number of
rows)? And what would be that "too much"?

Remarks:
A-table could be removed as it's not that important/relevant.
The purpose of this structure is not to be able to easily select through
the parent in all children tables, though it would be appreciated.
The purpose of this is just to be able to easily create C-tables, and
maybe also to easily handle structure changes of A or B-tables.
The master words here are "performance" and "reliability".


Thanks,
Pascal


Re: inheritance vs performance

From
Richard Huxton
Date:
On Friday 13 February 2004 09:01, Pascal Polleunus wrote:
> Hi,
>
> I'm wondering if there could be problems related to inheritance in the
> following scenario (with PostgreSQL 7.4.1)...
>
> 1 A-table, abstract.
>
> Max 10 B-tables that inherit from A, with sometimes some more columns
> than A. These are also abstracts.
>
> "n" C-tables that inherit from 1 B-table, without more columns.
> Each C-table could contain quite a lot of rows (500K, 1M, ...).

What is the point of having multiple C tables with the same structure?

> Could there be problems, or performance issues, related to inheritance
> if there is "too much" C-tables (in combination with the number of
> rows)? And what would be that "too much"?

Well, thousands of tables is probably "too much", but a hundred tables or two
in a database shouldn't cause problems. Don't see why you'd want them though.

> Remarks:
> A-table could be removed as it's not that important/relevant.
> The purpose of this structure is not to be able to easily select through
> the parent in all children tables, though it would be appreciated.
> The purpose of this is just to be able to easily create C-tables, and
> maybe also to easily handle structure changes of A or B-tables.

I don't see how inheritance makes it easier to create C tables.

> The master words here are "performance" and "reliability".

Don't see how either of these are affected by what you're talking about doing
here. Can you explain more closely what it is you're trying to do?

--
  Richard Huxton
  Archonet Ltd

Re: inheritance vs performance

From
Karsten Hilbert
Date:
> Well, thousands of tables is probably "too much", but a hundred tables or two
> in a database shouldn't cause problems. Don't see why you'd want them though.
If that's your general advice (a hundred or more tables in a
database not making sense) I should like to learn why. Is that
a sure sign of overdesign ? Excess normalization ? Bad
separation of duty ? I am asking since our schema is at
about 200 relations and growing.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: inheritance vs performance

From
Richard Huxton
Date:
On Friday 13 February 2004 10:59, Karsten Hilbert wrote:
> > Well, thousands of tables is probably "too much", but a hundred tables or
> > two in a database shouldn't cause problems. Don't see why you'd want them
> > though.
>
> If that's your general advice (a hundred or more tables in a
> database not making sense) I should like to learn why. Is that
> a sure sign of overdesign ? Excess normalization ? Bad
> separation of duty ? I am asking since our schema is at
> about 200 relations and growing.

The original mail mentioned many "C tables" all with the same columns.
Obviously you need as many different tables as required to model your data,
but many tables all with identical schema?

--
  Richard Huxton
  Archonet Ltd

Re: inheritance vs performance

From
Csaba Nagy
Date:
Hi Pascal,

As other answers to this topic pointed out, it's kind of pointless to
use more tables with the same structure. In the long run it will become
a PITA to manage them, I'm talking from experience here. In our company
we adopted a solution with dynamically created tables (with dynamic
schema), thinking it would be more performant (which actually might be
true in our case). The alternative would have been some kind of generic
"param_name", "param_value" table, holding all the data from all these
dynamic tables. While performance might have been gained using the
dynamic tables, a lot of flexibility was lost, and a maintenance
nightmare was created (just think about migrating all those tables
between versions of the system). Not to mention that you can't easily
create queries which have as parameter a table name... (actually you
can, but I think it's not really recommended). In our case however the
schema is different for all those tables, so it makes sense in a way,
but from a maintenance POV I wouldn't choose dynamic tables again; they
are more trouble than they're worth.

Anyway, I think you would be better off by adding an additional column
to your B tables which holds the "table name" the C tables would have
had. In fact that would be an ID of some sort for efficiency reasons
(and I bet you already have those IDs there ;-). Then you can select the
content of a C table based on those IDs, and have a lot more
flexibility. Performance-wise I think the one-table solution is actually
better, but that's just a guess on my part.
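
A minimal sketch of the one-table layout suggested above. SQLite (via
Python's standard sqlite3 module) stands in for PostgreSQL only so the
example runs anywhere; the table name b_records and the columns
customer_id, recorded_on and amount are assumptions for illustration,
not taken from Pascal's schema.

```python
import sqlite3

# SQLite is a stand-in for PostgreSQL here, purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE b_records (          -- hypothetical merged "B" table
        customer_id INTEGER NOT NULL, -- replaces the per-customer C-table name
        recorded_on TEXT    NOT NULL, -- ISO date (hypothetical column)
        amount      REAL    NOT NULL  -- hypothetical payload column
    )
    """
)
# Index so that fetching "the content of one C table" stays an index lookup.
conn.execute(
    "CREATE INDEX b_records_customer_idx ON b_records (customer_id, recorded_on)"
)

conn.executemany(
    "INSERT INTO b_records VALUES (?, ?, ?)",
    [
        (1, "2004-02-12", 10.0),
        (1, "2004-02-13", 12.5),
        (2, "2004-02-13", 7.0),
    ],
)


def records_for_customer(customer_id: int) -> list[tuple]:
    """Rough equivalent of selecting everything from one customer's C table."""
    cur = conn.execute(
        "SELECT recorded_on, amount FROM b_records "
        "WHERE customer_id = ? ORDER BY recorded_on",
        (customer_id,),
    )
    return cur.fetchall()


print(records_for_customer(1))
```

With this layout the customer is an ordinary query parameter, so no
table name has to be spliced into the SQL text.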

And also reconsider using separate B tables if that means they are
dynamically created... or be prepared for some hard times later with
maintenance ;-)

Just my 2c,
Csaba.


On Fri, 2004-02-13 at 10:01, Pascal Polleunus wrote:
> Hi,
>
> I'm wondering if there could be problems related to inheritance in the
> following scenario (with PostgreSQL 7.4.1)...
>
> 1 A-table, abstract.
>
> Max 10 B-tables that inherit from A, with sometimes some more columns
> than A. These are also abstracts.
>
> "n" C-tables that inherit from 1 B-table, without more columns.
> Each C-table could contain quite a lot of rows (500K, 1M, ...).
>
> Could there be problems, or performance issues, related to inheritance
> if there is "too much" C-tables (in combination with the number of
> rows)? And what would be that "too much"?
>
> Remarks:
> A-table could be removed as it's not that important/relevant.
> The purpose of this structure is not to be able to easily select through
> the parent in all children tables, though it would be appreciated.
> The purpose of this is just to be able to easily create C-tables, and
> maybe also to easily handle structure changes of A or B-tables.
> The master words here are "performance" and "reliability".
>
>
> Thanks,
> Pascal
>
>


Re: inheritance vs performance

From
Steve Atkins
Date:
On Fri, Feb 13, 2004 at 01:51:24PM +0000, Richard Huxton wrote:
> On Friday 13 February 2004 10:59, Karsten Hilbert wrote:
> > > Well, thousands of tables is probably "too much", but a hundred tables or
> > > two in a database shouldn't cause problems. Don't see why you'd want them
> > > though.
> >
> > If that's your general advice (a hundred or more tables in a
> > database not making sense) I should like to learn why. Is that
> > a sure sign of overdesign ? Excess normalization ? Bad
> > separation of duty ? I am asking since our schema is at
> > about 200 relations and growing.
>
> The original mail mentioned many "C tables" all with the same columns.
> Obviously you need as many different tables as required to model your data,
> but many tables all with identical schema?

Poor man's tablespaces. It's a trick I've had to resort to a few times
when on a write-heavy steady-state system there's just not enough I/O
bandwidth to delete and vacuum old data, or not enough to maintain an
index. Segregate the incoming data by, say, day and put one day's worth
of data into each 'C' table. At the end of each day, index the day's table.
If you're maintaining six months of data, drop the 180th table.

If most of the queries on the data are constrained by date it's
reasonably efficient to search too. And if you have rare queries which
aren't constrained by date you can just apply them to the parent table
- not terribly efficient, but quite workable.
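
A minimal sketch of the end-of-day bookkeeping this implies. It only
generates the SQL text; the c_YYYYMMDD naming scheme and the ts column
are assumptions for illustration, and the 180-day retention is the
"six months" figure from above.

```python
from datetime import date, timedelta

RETENTION_DAYS = 180  # "six months of data"


def table_for(day: date) -> str:
    """Hypothetical naming scheme: one child ('C') table per day."""
    return f"c_{day:%Y%m%d}"


def end_of_day_statements(today: date) -> list[str]:
    """SQL to run once today's table has stopped receiving writes."""
    expired = today - timedelta(days=RETENTION_DAYS)
    return [
        # Build the index once, after the writes, instead of maintaining it
        # on every insert during the day.
        f"CREATE INDEX {table_for(today)}_ts_idx ON {table_for(today)} (ts)",
        # Dropping a whole table avoids DELETE plus VACUUM on the old rows.
        f"DROP TABLE {table_for(expired)}",
    ]


def tables_for_range(start: date, end: date) -> list[str]:
    """Child tables a date-constrained query has to touch (inclusive range)."""
    days = (end - start).days
    return [table_for(start + timedelta(days=i)) for i in range(days + 1)]


for stmt in end_of_day_statements(date(2004, 2, 13)):
    print(stmt)
```

A query that is not constrained by date would go to the parent table
instead of a list from tables_for_range.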

Hideous hack, but it works.

Cheers,
  Steve


Re: inheritance vs performance

From
Pascal Polleunus
Date:
Hi,

Here's a deeper explanation about what I'm trying to achieve...

> I'm wondering if there could be problems related to inheritance in the
> following scenario (with PostgreSQL 7.4.1)...

In fact, the main concern is not really about inheritance but more about
how to handle large amounts of data.


> 1 A-table, abstract.
A-table contains the common columns for each type of customer.

> Max 10 B-tables that inherit from A, with sometimes some more columns
> than A. These are also abstracts.
A B-table is created for each type of customer, some of them need more
columns (currently only 3 types, but maybe some more that's why I said
max 10. These 3 types have a different structure).

> "n" C-tables that inherit from 1 B-table, without more columns.
> Each C-table could contain quite a lot of rows (500K, 1M, ...).
A C-table is created for each customer, that inherits from the B-table
of their "customer type".

I hope there will be some hundreds of customers/C-tables.

My concern is that each C-table would contain around 500K records per
year... and I hope more ;-)
(1M or more was probably targeting too high)


> Could there be problems, or performance issues, related to inheritance
> if there is "too much" C-tables (in combination with the number of
> rows)? And what would be that "too much"?

Let's take for example 100 customers with 500K records per year.
Grouping them together would give, at the end of the year, 50M
records in a single table. Isn't that too much?

A solution could be to break the data out per month and to keep only
the current & previous months... 50M / 6 = 8.3M records at the end of
each month.

Or to do the same weekly, so 50M / 26 = 1.9M records at the end of each
week, for the last 2 weeks.
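
As a rough check of these figures, a minimal sketch that assumes the
rows arrive evenly through the year (an assumption, not something
measured); the function name retained_rows is made up for the example.

```python
# Back-of-the-envelope row counts for the retention schemes described above.
CUSTOMERS = 100
ROWS_PER_CUSTOMER_PER_YEAR = 500_000


def retained_rows(periods_per_year: int, periods_kept: int) -> float:
    """Rows held at the end of a period when only `periods_kept` periods are kept.

    Assumes rows are spread evenly over the year.
    """
    yearly_total = CUSTOMERS * ROWS_PER_CUSTOMER_PER_YEAR
    return yearly_total * periods_kept / periods_per_year


yearly = retained_rows(1, 1)     # everything in one table for a full year
monthly = retained_rows(12, 2)   # current + previous month
weekly = retained_rows(52, 2)    # current + previous week

print(f"yearly:  {yearly / 1e6:.1f}M rows")
print(f"monthly: {monthly / 1e6:.1f}M rows")
print(f"weekly:  {weekly / 1e6:.1f}M rows")
```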

With these data, I need to be able to:
- generate daily, monthly and yearly reports per customer.
- provide a list of records per customer for a given period (for a
given day could be enough).


Inheritance is maybe not really what is needed here.
To handle table creation, I could store the structures somewhere instead
of using inheritance. And to handle hypothetical structure changes, I
could create bulk procedures.


> Remarks:
> A-table could be removed as it's not that important/relevant.
> The purpose of this structure is not to be able to easily select through
> the parent in all children tables, though it would be appreciated.
I don't really need to do cross-customer queries, though that would be
useful for generating global reports.
Without inheritance, how to handle that will depend on the reports that
need to be generated.

> The purpose of this is just to be able to easily create C-tables, and
> maybe also to easily handle structure changes of A or B-tables.
Sorry, that was stupid :-/
What I wanted to say is that the purpose was mainly to distribute the
amount of data between several tables. And, secondly, to easily handle
structure changes.


Thanks again for your advice,
Pascal