Re: SQL - planet redundant data

Поиск
Список
Период
Сортировка
От Jim C. Nasby
Тема Re: SQL - planet redundant data
Дата
Msg-id 20050918003110.GQ7630@pervasive.com
обсуждение исходный текст
Ответ на Re: SQL - planet redundant data  (Poul Jensen <flyvholm@gfy.ku.dk>)
Список pgsql-general
On Sun, Sep 11, 2005 at 11:00:02PM -0800, Poul Jensen wrote:
> Tom Lane wrote:
>
> >No, tableoid is sort of a virtual column ... it doesn't exist on disk.
> >When you query it you get a value fetched from the internal data
> >structure representing the table.
> >
> >
> So virtual columns are possible - THIS is a way to clear redundant data!
> Is it possible for a user to create a virtual column? If not, this would
> make a big improvement.
>
> What I really need are "partial virtual columns". I'm imagining an
> alternative version of VACUUM ANALYZE that could do the following:
> 1) Order the rows in the table so that for each column, identical values
> are placed next to each other for as far as possible (the row order that
> optimizes one column will probably not be optimal for other columns).
> 2) For each column, identify the stretches that contain only one
> distinct value. Save that value together with ID of start and end row
> and delete stretch.
> It is not obvious how to do a perfect optimization process in 1), at
> least not to me - I'm sure a skilled mathematician would know exactly
> how to do it. But here's a simple approach that would get us part of the
> way:
> 1.1) Grab the column w. most redundancy (fewest distinct values) and
> sort it into groups according to the distinct values.
> 1.2) For each of these groups, grab the column w. next most redundancy
> and sort into groups according to the distinct values.
> And so on. Stop whenever groups become so small that there's nothing to
> gain.
> Such an analysis would make it much less expensive to combine
> same-schema tables, and having everything in the same table is really
> convenient. It would obviously save a lot of storage space, but I
> imagine it would enable more efficient queries too - having to check
> just 3 values instead of the thousands (or even millions) they may
> replace must give a considerable gain.
>
> 'What is the big benefit of not having ordered rows? I imagine it could
> be a disadvantage for dynamic databases, but for a static database like
> mine which won't be modified, except for maybe adding new data once a
> year, I imagine an optimization including row ordering could be highly
> beneficial.

Oracle supports something akin to this. On an Index Organized Table you
can tell it to pull part of the index key out of individual rows. IE:

CREATE TABLE (a, b, c, d, e, f)
    PRIMARY KEY(a, b, c, d)
    INDEX ORGANIZED COMBINE(a, b)
;

In this case, every time the combination of (a, b) changes, Oracle
stores a special record of some kind that indicates the change, and it
doesn't store a or b with each row. (Note that I'm guessing on syntax,
it's been a while since I've used Oracle).

> ##############################
>
> Jim C. Nasby wrote:
>
> >What you seem to be looking for is a form of partitioning. PostgreSQL
> >doesn't currently support partitioning of this form, but there's work in
> >progress to change that.
> >
> >
> Any idea how far out in the future this is? Would it make the
> optimization process described above (reply to Tom Lane) obsolete? Well,
> maybe my ideas about an optimal solution just illustrate lack of
> knowledge about SQL, but I'm hoping somebody can see what I'm trying to
> suggest.

Well, the firslt important step is slated for 8.1; effective partition
elimination. You can search for more info, especially in the BizGres
archives. Next step is creating actual partitioning syntax to make
managing partitions easier.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

В списке pgsql-general по дате отправления:

Предыдущее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: Duplicate Values or Not?!
Следующее
От: "Warren Bell"
Дата:
Сообщение: PDF Documentation?