Обсуждение: Problem with query on history table

Поиск
Список
Период
Сортировка

Problem with query on history table

От
Andreas Joseph Krogh
Дата:
Hi all!

I don't know if there's a standard solution to the kind of problem I'm trying 
to solve, but I will appreciate your thougts(and maybe solution:) on this 
problem of mine:

I have 2 tables: hist and curr which hold numbers for "history-data" and 
"current-data" respectivly. Here is a simplified version of the schema:

CREATE TABLE curr (   id integer NOT NULL,   etc integer NOT NULL,   created timestamp without time zone NOT NULL,
modifiedtimestamp without time zone
 
);

CREATE TABLE hist (   id serial NOT NULL,   curr_id integer NOT NULL REFERENCES curr(id),   etc integer NOT NULL,
modifiedtimestamp without time zone NOT NULL
 
);

andreak=# SELECT * from curr;id | etc |       created       |      modified
----+-----+---------------------+--------------------- 1 |   5 | 2006-02-01 00:00:00 | 2 |  10 | 2006-01-15 00:00:00 |
2006-01-2600:00:00 3 |  10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00
 
(3 rows)

andreak=# SELECT * from hist;id | curr_id | etc |      modified
----+--------+-----+--------------------- 1 |       3 |  30 | 2006-01-16 00:00:00 2 |       3 |  20 | 2006-01-25
00:00:003 |       2 |  20 | 2006-01-26 00:00:00
 
(3 rows)

Now - I would like to get a report on what the "ETC" is on a given entry in 
"curr" in a given "point in time". Let me explain. If I want status for 17. 
jan.(17.01.2006) I would like to get these numbers out from the query:
id |       created       |    curr_modified    |    hist_modified    | etc
----+---------------------+---------------------+---------------------+----- 3 | 2006-01-08 00:00:00 | 2006-01-25
00:00:00| 2006-01-16 00:00:00 |  30 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 |  20 1 |
2006-02-0100:00:00 |                     |                     |   5
 


That is; If the entry is modified after it's created, a snapshot of the "old 
version" is copied to table "hist" with the hist.modified field set to the 
"modified-timestamp". So there will exist several entries in "hist" for each 
time an entry in "curr" is modified.

If I want status for the 27. jan. I would like the query to return the 
following rows:
id |       created       |    curr_modified    |    hist_modified    | etc
----+---------------------+---------------------+---------------------+----- 3 | 2006-01-08 00:00:00 | 2006-01-25
00:00:00| 2006-01-25 00:00:00 |  10 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 |  10 1 |
2006-02-0100:00:00 |                     |                     |   5
 

select curr.id, curr.created, curr.modified as curr_modified, hist.modified as 
hist_modified, coalesce(hist.etc, curr.etc) as etc FROM curr LEFT OUTER JOIN 
hist ON(curr.id = hist.curr_id) WHERE ...

I'm really stuck here. It seems to me that I need a lot of 
CASE...WHEN...ELSE.. statements in the query, but is there an easier way?

--
Andreas Joseph Krogh <andreak@officenet.no>


Re: Problem with query on history table

От
"Jim C. Nasby"
Дата:
Probably the easiest way is to switch to using table partitioning and
switch to using start_timestamp and end_timestamp, so that when you
modify a row you update the old one setting end_timestamp to now() and
insert the new row (all within one transaction).

There are other ways to do it, but they'll probably be much slower. I
don't think they require a lot of CASE statements though.

Show us what you were planning on doing and maybe I'll have more ideas.

On Mon, Feb 27, 2006 at 08:19:30AM +0100, Andreas Joseph Krogh wrote:
> Hi all!
> 
> I don't know if there's a standard solution to the kind of problem I'm trying 
> to solve, but I will appreciate your thougts(and maybe solution:) on this 
> problem of mine:
> 
> I have 2 tables: hist and curr which hold numbers for "history-data" and 
> "current-data" respectivly. Here is a simplified version of the schema:
> 
> CREATE TABLE curr (
>     id integer NOT NULL,
>     etc integer NOT NULL,
>     created timestamp without time zone NOT NULL,
>     modified timestamp without time zone
> );
> 
> CREATE TABLE hist (
>     id serial NOT NULL,
>     curr_id integer NOT NULL REFERENCES curr(id),
>     etc integer NOT NULL,
>     modified timestamp without time zone NOT NULL
> );
> 
> andreak=# SELECT * from curr;
>  id | etc |       created       |      modified
> ----+-----+---------------------+---------------------
>   1 |   5 | 2006-02-01 00:00:00 |
>   2 |  10 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00
>   3 |  10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00
> (3 rows)
> 
> andreak=# SELECT * from hist;
>  id | curr_id | etc |      modified
> ----+--------+-----+---------------------
>   1 |       3 |  30 | 2006-01-16 00:00:00
>   2 |       3 |  20 | 2006-01-25 00:00:00
>   3 |       2 |  20 | 2006-01-26 00:00:00
> (3 rows)
> 
> Now - I would like to get a report on what the "ETC" is on a given entry in 
> "curr" in a given "point in time". Let me explain. If I want status for 17. 
> jan.(17.01.2006) I would like to get these numbers out from the query:
> 
>  id |       created       |    curr_modified    |    hist_modified    | etc
> ----+---------------------+---------------------+---------------------+-----
>   3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 | 2006-01-16 00:00:00 |  30
>   2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 |  20
>   1 | 2006-02-01 00:00:00 |                     |                     |   5
> 
> 
> That is; If the entry is modified after it's created, a snapshot of the "old 
> version" is copied to table "hist" with the hist.modified field set to the 
> "modified-timestamp". So there will exist several entries in "hist" for each 
> time an entry in "curr" is modified.
> 
> If I want status for the 27. jan. I would like the query to return the 
> following rows:
> 
>  id |       created       |    curr_modified    |    hist_modified    | etc
> ----+---------------------+---------------------+---------------------+-----
>   3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 | 2006-01-25 00:00:00 |  10
>   2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 |  10
>   1 | 2006-02-01 00:00:00 |                     |                     |   5
> 
> select curr.id, curr.created, curr.modified as curr_modified, hist.modified as 
> hist_modified, coalesce(hist.etc, curr.etc) as etc FROM curr LEFT OUTER JOIN 
> hist ON(curr.id = hist.curr_id) WHERE ...
> 
> I'm really stuck here. It seems to me that I need a lot of 
> CASE...WHEN...ELSE.. statements in the query, but is there an easier way?
> 
> --
> Andreas Joseph Krogh <andreak@officenet.no>
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Problem with query on history table

От
Andreas Joseph Krogh
Дата:
On Wednesday 01 March 2006 23:19, Jim C. Nasby wrote:
> Probably the easiest way is to switch to using table partitioning and
> switch to using start_timestamp and end_timestamp, so that when you
> modify a row you update the old one setting end_timestamp to now() and
> insert the new row (all within one transaction).
>
> There are other ways to do it, but they'll probably be much slower. I
> don't think they require a lot of CASE statements though.
>
> Show us what you were planning on doing and maybe I'll have more ideas.
>
> On Mon, Feb 27, 2006 at 08:19:30AM +0100, Andreas Joseph Krogh wrote:
> > Hi all!
> >
> > I don't know if there's a standard solution to the kind of problem I'm
> > trying to solve, but I will appreciate your thougts(and maybe solution:)
> > on this problem of mine:
> >
> > I have 2 tables: hist and curr which hold numbers for "history-data" and
> > "current-data" respectivly. Here is a simplified version of the schema:
> >
> > CREATE TABLE curr (
> >     id integer NOT NULL,
> >     etc integer NOT NULL,
> >     created timestamp without time zone NOT NULL,
> >     modified timestamp without time zone
> > );
> >
> > CREATE TABLE hist (
> >     id serial NOT NULL,
> >     curr_id integer NOT NULL REFERENCES curr(id),
> >     etc integer NOT NULL,
> >     modified timestamp without time zone NOT NULL
> > );
> >
> > andreak=# SELECT * from curr;
> >  id | etc |       created       |      modified
> > ----+-----+---------------------+---------------------
> >   1 |   5 | 2006-02-01 00:00:00 |
> >   2 |  10 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00
> >   3 |  10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00
> > (3 rows)
> >
> > andreak=# SELECT * from hist;
>

The scenario I'm having is this:
I have some data in table "curr", and whenever that data changes, a copy of 
the "old data" for that entry is copied over to "hist" so that "hist" always 
holds all entries but the newest one with their values "one before" "curr".

I solved my problem with the following query:

SELECT curr.id,
COALESCE(
(SELECT h.etc FROM history h WHERE h.history_modified =
(SELECT MIN(h2.history_modified)FROM history h2 WHERE h2.history_modified >= ?AND h2.curr_id = curr.id)AND curr.created
<=? AND curr.id = h.curr_id),CASE WHEN curr.created > ? THEN NULL ELSE curr.etc END) AS etc
 
FROM curr;

I don't know how well it performes on larger data, but it work for me for the 
moment. I very much welcome some feedback on my "solution", and some 
enlightenment on what the impact of subqueries like this have on performance 
on larger data-sets.

--
AJK