Обсуждение: Problem with query on history table
Hi all! I don't know if there's a standard solution to the kind of problem I'm trying to solve, but I will appreciate your thougts(and maybe solution:) on this problem of mine: I have 2 tables: hist and curr which hold numbers for "history-data" and "current-data" respectivly. Here is a simplified version of the schema: CREATE TABLE curr ( id integer NOT NULL, etc integer NOT NULL, created timestamp without time zone NOT NULL, modifiedtimestamp without time zone ); CREATE TABLE hist ( id serial NOT NULL, curr_id integer NOT NULL REFERENCES curr(id), etc integer NOT NULL, modifiedtimestamp without time zone NOT NULL ); andreak=# SELECT * from curr;id | etc | created | modified ----+-----+---------------------+--------------------- 1 | 5 | 2006-02-01 00:00:00 | 2 | 10 | 2006-01-15 00:00:00 | 2006-01-2600:00:00 3 | 10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 (3 rows) andreak=# SELECT * from hist;id | curr_id | etc | modified ----+--------+-----+--------------------- 1 | 3 | 30 | 2006-01-16 00:00:00 2 | 3 | 20 | 2006-01-25 00:00:003 | 2 | 20 | 2006-01-26 00:00:00 (3 rows) Now - I would like to get a report on what the "ETC" is on a given entry in "curr" in a given "point in time". Let me explain. If I want status for 17. jan.(17.01.2006) I would like to get these numbers out from the query: id | created | curr_modified | hist_modified | etc ----+---------------------+---------------------+---------------------+----- 3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00| 2006-01-16 00:00:00 | 30 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 | 20 1 | 2006-02-0100:00:00 | | | 5 That is; If the entry is modified after it's created, a snapshot of the "old version" is copied to table "hist" with the hist.modified field set to the "modified-timestamp". So there will exist several entries in "hist" for each time an entry in "curr" is modified. If I want status for the 27. jan. I would like the query to return the following rows: id | created | curr_modified | hist_modified | etc ----+---------------------+---------------------+---------------------+----- 3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00| 2006-01-25 00:00:00 | 10 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 | 10 1 | 2006-02-0100:00:00 | | | 5 select curr.id, curr.created, curr.modified as curr_modified, hist.modified as hist_modified, coalesce(hist.etc, curr.etc) as etc FROM curr LEFT OUTER JOIN hist ON(curr.id = hist.curr_id) WHERE ... I'm really stuck here. It seems to me that I need a lot of CASE...WHEN...ELSE.. statements in the query, but is there an easier way? -- Andreas Joseph Krogh <andreak@officenet.no>
Probably the easiest way is to switch to using table partitioning and switch to using start_timestamp and end_timestamp, so that when you modify a row you update the old one setting end_timestamp to now() and insert the new row (all within one transaction). There are other ways to do it, but they'll probably be much slower. I don't think they require a lot of CASE statements though. Show us what you were planning on doing and maybe I'll have more ideas. On Mon, Feb 27, 2006 at 08:19:30AM +0100, Andreas Joseph Krogh wrote: > Hi all! > > I don't know if there's a standard solution to the kind of problem I'm trying > to solve, but I will appreciate your thougts(and maybe solution:) on this > problem of mine: > > I have 2 tables: hist and curr which hold numbers for "history-data" and > "current-data" respectivly. Here is a simplified version of the schema: > > CREATE TABLE curr ( > id integer NOT NULL, > etc integer NOT NULL, > created timestamp without time zone NOT NULL, > modified timestamp without time zone > ); > > CREATE TABLE hist ( > id serial NOT NULL, > curr_id integer NOT NULL REFERENCES curr(id), > etc integer NOT NULL, > modified timestamp without time zone NOT NULL > ); > > andreak=# SELECT * from curr; > id | etc | created | modified > ----+-----+---------------------+--------------------- > 1 | 5 | 2006-02-01 00:00:00 | > 2 | 10 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 > 3 | 10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 > (3 rows) > > andreak=# SELECT * from hist; > id | curr_id | etc | modified > ----+--------+-----+--------------------- > 1 | 3 | 30 | 2006-01-16 00:00:00 > 2 | 3 | 20 | 2006-01-25 00:00:00 > 3 | 2 | 20 | 2006-01-26 00:00:00 > (3 rows) > > Now - I would like to get a report on what the "ETC" is on a given entry in > "curr" in a given "point in time". Let me explain. If I want status for 17. > jan.(17.01.2006) I would like to get these numbers out from the query: > > id | created | curr_modified | hist_modified | etc > ----+---------------------+---------------------+---------------------+----- > 3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 | 2006-01-16 00:00:00 | 30 > 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 | 20 > 1 | 2006-02-01 00:00:00 | | | 5 > > > That is; If the entry is modified after it's created, a snapshot of the "old > version" is copied to table "hist" with the hist.modified field set to the > "modified-timestamp". So there will exist several entries in "hist" for each > time an entry in "curr" is modified. > > If I want status for the 27. jan. I would like the query to return the > following rows: > > id | created | curr_modified | hist_modified | etc > ----+---------------------+---------------------+---------------------+----- > 3 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 | 2006-01-25 00:00:00 | 10 > 2 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 | 2006-01-26 00:00:00 | 10 > 1 | 2006-02-01 00:00:00 | | | 5 > > select curr.id, curr.created, curr.modified as curr_modified, hist.modified as > hist_modified, coalesce(hist.etc, curr.etc) as etc FROM curr LEFT OUTER JOIN > hist ON(curr.id = hist.curr_id) WHERE ... > > I'm really stuck here. It seems to me that I need a lot of > CASE...WHEN...ELSE.. statements in the query, but is there an easier way? > > -- > Andreas Joseph Krogh <andreak@officenet.no> > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wednesday 01 March 2006 23:19, Jim C. Nasby wrote: > Probably the easiest way is to switch to using table partitioning and > switch to using start_timestamp and end_timestamp, so that when you > modify a row you update the old one setting end_timestamp to now() and > insert the new row (all within one transaction). > > There are other ways to do it, but they'll probably be much slower. I > don't think they require a lot of CASE statements though. > > Show us what you were planning on doing and maybe I'll have more ideas. > > On Mon, Feb 27, 2006 at 08:19:30AM +0100, Andreas Joseph Krogh wrote: > > Hi all! > > > > I don't know if there's a standard solution to the kind of problem I'm > > trying to solve, but I will appreciate your thougts(and maybe solution:) > > on this problem of mine: > > > > I have 2 tables: hist and curr which hold numbers for "history-data" and > > "current-data" respectivly. Here is a simplified version of the schema: > > > > CREATE TABLE curr ( > > id integer NOT NULL, > > etc integer NOT NULL, > > created timestamp without time zone NOT NULL, > > modified timestamp without time zone > > ); > > > > CREATE TABLE hist ( > > id serial NOT NULL, > > curr_id integer NOT NULL REFERENCES curr(id), > > etc integer NOT NULL, > > modified timestamp without time zone NOT NULL > > ); > > > > andreak=# SELECT * from curr; > > id | etc | created | modified > > ----+-----+---------------------+--------------------- > > 1 | 5 | 2006-02-01 00:00:00 | > > 2 | 10 | 2006-01-15 00:00:00 | 2006-01-26 00:00:00 > > 3 | 10 | 2006-01-08 00:00:00 | 2006-01-25 00:00:00 > > (3 rows) > > > > andreak=# SELECT * from hist; > The scenario I'm having is this: I have some data in table "curr", and whenever that data changes, a copy of the "old data" for that entry is copied over to "hist" so that "hist" always holds all entries but the newest one with their values "one before" "curr". I solved my problem with the following query: SELECT curr.id, COALESCE( (SELECT h.etc FROM history h WHERE h.history_modified = (SELECT MIN(h2.history_modified)FROM history h2 WHERE h2.history_modified >= ?AND h2.curr_id = curr.id)AND curr.created <=? AND curr.id = h.curr_id),CASE WHEN curr.created > ? THEN NULL ELSE curr.etc END) AS etc FROM curr; I don't know how well it performes on larger data, but it work for me for the moment. I very much welcome some feedback on my "solution", and some enlightenment on what the impact of subqueries like this have on performance on larger data-sets. -- AJK