Обсуждение: Partitioned tables as a poor mans columnar index?

Поиск
Список
Период
Сортировка

Partitioned tables as a poor mans columnar index?

От
Peter Hunsberger
Дата:
I just realized that my replies to my previous question on sparse
arrays went off list due to the way this list server  is set up
(sigh).  It has occurred to me that for my problem, one possible
solution is columnar indexes and that, in a way, partitioned tables in
Postgres might give me somewhat the same capability.

The basic problem I have is that I have some tables that are
potentially very long (100,000's to millions of rows) and very skinny,
essentially just a FK to a parent table and about 12 bits, maybe less,
of actual data.  Now if I use a traditional table an int for FK is as
large or even larger than the data itself which is why I had been
looking at a sparse array (in which case I don't even need 12 bits for
the data, since part of the information is positional in nature).
However, building the routines to manipulate the arrays could be
painful and it's been suggested that their performance will be
non-optimal.

One alternative might be to use partitioned tables. If I partition the
tables in such a way that the table name can be associated directly to
the FK then I no longer need to actually store the FK in the table,
and I end up with maybe a total of 12 bits of data in each row.
Normally, I'd shy away from such a scheme since it strikes me as
potentially problematic:

1) you've got to join through DB metadata back to the actual data in question;

2) some process has to have create table auths if any of this is to be automated

My bias against 1) might just be because I come from an Oracle
background and it seems that in Postgres this is not as cumbersome or
as frowned on as it might be elsewhere?  The second issue seems
problematic no matter how I look at it.  However, I might be willing
to live with it for this particular problem, particulary if I can talk
myself into believing that I'm building a proxy for columnar indexes
:-)....

So I guess two sets of questions:

1) In general how comfortable are members of the Postgres community in
using table names for partitioned tables where the table name itself
conveys some indirect join relationship?  It seems there is no extra
performance penalty in joining back to the system tables to do a look
up with Postgres so this is perhaps more of a best practices question
(or even a religious issue for relational purists) than anything
else...

2) If you're comfortable with the idea, how far would you go in
embracing it?  In particular, would you add code to Postgres to hide
the fact that you are joining via table name? Would you go as far as
to do it at the system level or would you stick to just wrapping it in
some functions (in which case the create table privilege is still
needed)?

--
Peter Hunsberger

Re: Partitioned tables as a poor mans columnar index?

От
marcin mank
Дата:
On Fri, Oct 16, 2009 at 9:19 PM, Peter Hunsberger
<peter.hunsberger@gmail.com> wrote:

> The basic problem I have is that I have some tables that are
> potentially very long (100,000's to millions of rows) and very skinny,

> and I end up with maybe a total of 12 bits of data in each row.

Are You aware that there are some 20-ish bytes of metadata for each
row? saving 4 bytes buys You nothing. Give it up.

Also, these are actually pretty small tables (i.e. they fit in memory
of any non-toy server).

Greetings
Marcin

Re: Partitioned tables as a poor mans columnar index?

От
Peter Hunsberger
Дата:
On Fri, Oct 16, 2009 at 3:31 PM, marcin mank <marcin.mank@gmail.com> wrote:
> On Fri, Oct 16, 2009 at 9:19 PM, Peter Hunsberger
> <peter.hunsberger@gmail.com> wrote:
>
>> The basic problem I have is that I have some tables that are
>> potentially very long (100,000's to millions of rows) and very skinny,
>
>> and I end up with maybe a total of 12 bits of data in each row.
>
> Are You aware that there are some 20-ish bytes of metadata for each
> row? saving 4 bytes buys You nothing. Give it up.

No, the metadata is a whole 'nother problem.  I'm just talking about
fk relationships here.  This isn't an isolated issue within this
particular domain.  If you where to use a conventional table design,
then once the rest of the associated tables get built along with their
associated indexes you'd be looking at in the order of a terabyte for
this half of the DB...

--
Peter Hunsberger