Обсуждение: index organized tables use case

Поиск
Список
Период
Сортировка

index organized tables use case

От
Enrico Sirola
Дата:
Hello,
I'm thinking about migrating from another DBMS to postgresql. I have an
almost working proof of concept, but still have some doubts about the
following use case.

I have a table like the following

CREATE TABLE test
(
  code character varying(32) NOT NULL,
  tag integer NOT NULL,
  value double precision,
  CONSTRAINT test_pkey PRIMARY KEY (code, tag)
);

It represents a sequence (with holes) of values associated with a code.
The application code usually performs selection queries like

select tag, value from test where code='XXX';

also, deletions are like

delete from test where code='XXX';

and insertions follow the same pattern (all the data for a code is
inserted using a loop in a single transaction). That's more or less all.

so this type of workload is greatly enhanced by an index-organized table
(oracle) or a clustered index (SQL Server/Sybase).

From what I understood this kind of table is presently not supported by
postgresql (is it?) so, what do you advice?

I would think about something like

CREATE TABLE testnew
(
  code character varying(32) NOT NULL,
  first_tag integer, /* the tag value associated with the first value */
  "values" double precision[], /* the datum, or NaN if not valid */
  valid_values bit(1)[], /* true if a datum is present */
  CONSTRAINT testnew_pkey PRIMARY KEY (code)
);

but this would require an application refactoring. Any idea?

TIA,
e.


Re: index organized tables use case

От
"Isak Hansen"
Дата:
On 12/12/07, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> Hello Isak,
>
> Isak Hansen ha scritto:
>
> > Have a look at the cluster operation;
> > <http://www.postgresql.org/docs/8.3/static/sql-cluster.html>.
> >
> > AFAIK it does lock & duplicate the whole table during reordering,
> > which may or may not be an issue for you.
>

Sorry Enrico and list, the respond-to setting on this list gets me every time..


> thanks for the reply; I was aware about this option, anyway I think
> probably it won't be practical: the table is very big and this exclusive
> lock would probably be a pain. I think probably I will start with the
> present relation and then redesign the application at a second stage

This is a long shot, but if the table is huge you could also consider
partitioning. That really depends on the contents of 'code', though..

See <http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html>.


Isak

> Thanks,
> e.
>

Re: index organized tables use case

От
Richard Huxton
Дата:
Enrico Sirola wrote:
> The application code usually performs selection queries like
>
> select tag, value from test where code='XXX';
>
> also, deletions are like
>
> delete from test where code='XXX';
>
> and insertions follow the same pattern (all the data for a code is
> inserted using a loop in a single transaction). That's more or less all.
>
> so this type of workload is greatly enhanced by an index-organized table
> (oracle) or a clustered index (SQL Server/Sybase).

Hmm - I'm not sure it does benefit that much. I mean, if you're going to
be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
to have the table with the same order as your primary key. Otherwise,
I'd be doubtful you'd see that much benefit.

--
   Richard Huxton
   Archonet Ltd

Re: index organized tables use case

От
Thomas Kellerer
Дата:
Richard Huxton, 12.12.2007 16:12:
> Hmm - I'm not sure it does benefit that much. I mean, if you're going to
> be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
> to have the table with the same order as your primary key. Otherwise,
> I'd be doubtful you'd see that much benefit.
>

At least for Oracle it's not mainly the order that improves the
performance, but the fact that all the data is kept in the index, so
Oracle does not need to go back to the table data after looking up the
index entry. There is no "table data" for an index-organized table in
Oracle, so only a single lookup is needed.

Thomas

Re: index organized tables use case

От
Robert Treat
Дата:
On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote:
> Richard Huxton, 12.12.2007 16:12:
> > Hmm - I'm not sure it does benefit that much. I mean, if you're going to
> > be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
> > to have the table with the same order as your primary key. Otherwise,
> > I'd be doubtful you'd see that much benefit.
>
> At least for Oracle it's not mainly the order that improves the
> performance, but the fact that all the data is kept in the index, so
> Oracle does not need to go back to the table data after looking up the
> index entry. There is no "table data" for an index-organized table in
> Oracle, so only a single lookup is needed.
>

Yeah, thats a nice feature, and one thats not directly available in Postgres.
The thing to concentrate on here is the techniques that are available in
Postgres that might help keep the same schema performant. I think looking at
partitioning or at partial indexing (making a new index on the code clause)
could help keep performance with minimal impact to the schema.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: index organized tables use case

От
"Scott Marlowe"
Дата:
On Dec 12, 2007 9:16 PM, Robert Treat <xzilla@users.sourceforge.net> wrote:
> On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote:

> > At least for Oracle it's not mainly the order that improves the
> > performance, but the fact that all the data is kept in the index, so
> > Oracle does not need to go back to the table data after looking up the
> > index entry. There is no "table data" for an index-organized table in
> > Oracle, so only a single lookup is needed.
> >
>
> Yeah, thats a nice feature, and one thats not directly available in Postgres.
> The thing to concentrate on here is the techniques that are available in
> Postgres that might help keep the same schema performant. I think looking at
> partitioning or at partial indexing (making a new index on the code clause)
> could help keep performance with minimal impact to the schema.

And keep in mind, there's no such thing as a free lunch here.  Oracle
makes tradeoffs to do this that mean that rolling back a transaction
is MUCH more expensive than it is in pgsql.  I'm sure there are other
tradeoffs as well.  BTW, apparently, InnoDB does the same type of
thing, and also suffers from the VERY expensive rollback issues as
well.