Обсуждение: Need SQL Help Finding Current Status of members
I have a table which keeps track of the status of members. In the table is member_id int(8) status_code char(1) status_date date KEY member_id (member_id,status_code,status_date) Each member can have multiple records because a record is added each time the status changes but the old record is kept for history. What I want to do is find the latest status for each member. Actually I want to find all those with an status of "A". But it must be the current (latest) status. How do I find the most current date for each member in a pile of many records for many members with many status settings with one SQL statement? This is a bit beyond my capabilities so I am asking for help. My initial SQL is SELECT * FROM memberstatus WHERE status_code = 'a' but that is my limit. I know an AND comes next but need help after that! I know that MAX is not for dates so that is out. Is there a LATEST DATE? I did not see one when I was looking at the date functions. Appreciate the help. Mike
Вложения
On Thu, Dec 15, 2005 at 08:31:09PM -0500, Michael Avila wrote: > What I want to do is find the latest status for each member. Actually I want > to find all those with an status of "A". But it must be the current (latest) > status. How do I find the most current date for each member in a pile of > many records for many members with many status settings with one SQL > statement? Suppose you have this table: SELECT * FROM memberstatus; member_id | status_code | status_date -----------+-------------+------------- 1 | a | 2005-01-01 2 | x | 2005-01-01 3| x | 2005-01-01 4 | x | 2005-01-01 1 | x | 2005-12-15 2 | a | 2005-12-15 3 | y | 2005-12-15 4 | a | 2005-12-15 (8 rows) Let's order the data so all of a member's records are shown together, with the latest one first: SELECT * FROM memberstatus ORDER BY member_id, status_date DESC; member_id | status_code | status_date -----------+-------------+------------- 1 | x | 2005-12-15 1 | a | 2005-01-01 2| a | 2005-12-15 2 | x | 2005-01-01 3 | y | 2005-12-15 3 | x | 2005-01-01 4 | a | 2005-12-15 4 | x | 2005-01-01 (8 rows) One way to get only the first record for each member is to use PostgreSQL's nonstandard DISTINCT ON construct: SELECT DISTINCT ON (member_id) * FROM memberstatus ORDER BY member_id, status_date DESC; member_id | status_code | status_date -----------+-------------+------------- 1 | x | 2005-12-15 2 | a | 2005-12-15 3| y | 2005-12-15 4 | a | 2005-12-15 (4 rows) We could put the above in a subquery and restrict the output to the records we want: SELECT * FROM ( SELECT DISTINCT ON (member_id) * FROM memberstatus ORDER BY member_id, status_date DESC ) AS s WHERE status_code = 'a' ORDER BY member_id; member_id | status_code | status_date -----------+-------------+------------- 2 | a | 2005-12-15 4 | a | 2005-12-15 (2 rows) This isn't the only way; search the archives for alternatives. -- Michael Fuhr
Michael Avila wrote: > I have a table which keeps track of the status of members. In the table is > > > member_id int(8) > status_code char(1) > status_date date > > KEY member_id (member_id,status_code,status_date) > > > Each member can have multiple records because a record is added each time > the status changes but the old record is kept for history. > > What I want to do is find the latest status for each member. Michael Fuhr has already described on solution, but if you can alter the table definition then there might be a neater solution. Replace "status_date" with "status_expires" and make it a "timestamp with time zone". Set the expiry to 'infinity' for the current record and you then have a simple select to find the mostrecent. If you regularly want to find which record was active on a particular time you'll want two columns: valid_from and valid_to. This makes it much easier to find a row for a specific date. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Michael Avila wrote: > >> I have a table which keeps track of the status of members. In the >> table is >> >> member_id int(8) >> status_code char(1) >> status_date date >> KEY member_id (member_id,status_code,status_date) >> >> >> Each member can have multiple records because a record is added each >> time >> the status changes but the old record is kept for history. >> >> What I want to do is find the latest status for each member. > > > Michael Fuhr has already described on solution, but if you can alter > the table definition then there might be a neater solution. > > Replace "status_date" with "status_expires" and make it a "timestamp > with time zone". Set the expiry to 'infinity' for the current record > and you then have a simple select to find the most recent. > > If you regularly want to find which record was active on a particular > time you'll want two columns: valid_from and valid_to. This makes it > much easier to find a row for a specific date. There is a standard way : Select * from memberstatus A where not exists (select * from emberstatus B where B.member_id=A.member_id and B.status_date >A.status_date)
Interesting. I think I understand that. I have never worked with a SELECT within a SELECT (I think that is called a subquery). I am guessing that it works its way through the member status records until the latest date "floats" to the top (nothing is > than it). Will that be a problem performance-wise if there are thousands of records? Thanks for the help. Mike > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Patrick JACQUOT > Sent: Friday, December 16, 2005 5:12 AM > Cc: SQL PostgreSQL MailList > Subject: Re: [SQL] Need SQL Help Finding Current Status of members > > > Richard Huxton wrote: > > > Michael Avila wrote: > > > >> I have a table which keeps track of the status of members. In the > >> table is > >> > >> member_id int(8) > >> status_code char(1) > >> status_date date > >> KEY member_id (member_id,status_code,status_date) > >> > >> > >> Each member can have multiple records because a record is added each > >> time > >> the status changes but the old record is kept for history. > >> > >> What I want to do is find the latest status for each member. > > > > > > Michael Fuhr has already described on solution, but if you can alter > > the table definition then there might be a neater solution. > > > > Replace "status_date" with "status_expires" and make it a "timestamp > > with time zone". Set the expiry to 'infinity' for the current record > > and you then have a simple select to find the most recent. > > > > If you regularly want to find which record was active on a particular > > time you'll want two columns: valid_from and valid_to. This makes it > > much easier to find a row for a specific date. > > There is a standard way : > > Select * from memberstatus A where not exists > (select * from emberstatus B where B.member_id=A.member_id and > B.status_date >A.status_date) > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
Michael Avila wrote: > Interesting. I think I understand that. I have never worked with a SELECT > within a SELECT (I think that is called a subquery). I am guessing that it > works its way through the member status records until the latest date > "floats" to the top (nothing is > than it). > > Will that be a problem performance-wise if there are thousands of records? The actual execution of the query may be something completely different, depending on what the optimizer decides. I doubt it will suck much, because the optimizer is quite good. Try executing EXPLAIN query; to see how it would be executed, and EXPLAIN ANALYZE query; to see the above plus the real numbers that the executor got after executing it (number of times each loop was executed, time spent at each execution step, etc). Read the ANALYZE page to see how are these things supposed to be read. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Just tried it and it returned nothing. > > Select * from memberstatus A where not exists > > (select * from emberstatus B where B.member_id=A.member_id and > > B.status_date >A.status_date) Why the WHERE NOT EXISTS? Here is a copy and paste of my code $query = "SELECT * FROM memberstatus A WHERE NOT EXISTS (SELECT * from memberstatus B WHERE B.member_id=A.member_id AND B.status_date > A.status_date)"; Mike > -----Original Message----- > From: Michael Avila [mailto:Michael.Avila.1@sbcglobal.net] > Sent: Friday, December 16, 2005 8:46 AM > To: SQL PostgreSQL MailList > Subject: RE: [SQL] Need SQL Help Finding Current Status of members > > > Interesting. I think I understand that. I have never worked with > a SELECT within a SELECT (I think that is called a subquery). I > am guessing that it works its way through the member status > records until the latest date "floats" to the top (nothing is > than it). > > Will that be a problem performance-wise if there are thousands of records? > > Thanks for the help. > > Mike > > > > -----Original Message----- > > From: pgsql-sql-owner@postgresql.org > > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Patrick JACQUOT > > Sent: Friday, December 16, 2005 5:12 AM > > Cc: SQL PostgreSQL MailList > > Subject: Re: [SQL] Need SQL Help Finding Current Status of members > > > > > > Richard Huxton wrote: > > > > > Michael Avila wrote: > > > > > >> I have a table which keeps track of the status of members. In the > > >> table is > > >> > > >> member_id int(8) > > >> status_code char(1) > > >> status_date date > > >> KEY member_id (member_id,status_code,status_date) > > >> > > >> > > >> Each member can have multiple records because a record is added each > > >> time > > >> the status changes but the old record is kept for history. > > >> > > >> What I want to do is find the latest status for each member. > > > > > > > > > Michael Fuhr has already described on solution, but if you can alter > > > the table definition then there might be a neater solution. > > > > > > Replace "status_date" with "status_expires" and make it a "timestamp > > > with time zone". Set the expiry to 'infinity' for the current record > > > and you then have a simple select to find the most recent. > > > > > > If you regularly want to find which record was active on a particular > > > time you'll want two columns: valid_from and valid_to. This makes it > > > much easier to find a row for a specific date. > > > > There is a standard way : > > > > Select * from memberstatus A where not exists > > (select * from emberstatus B where B.member_id=A.member_id and > > B.status_date >A.status_date) > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 5: don't forget to increase your free space map settings
On Fri, Dec 16, 2005 at 07:44:46PM -0500, Michael Avila wrote: > Just tried it and it returned nothing. > > > > Select * from memberstatus A where not exists > > > (select * from emberstatus B where B.member_id=A.member_id and > > > B.status_date >A.status_date) > > Why the WHERE NOT EXISTS? The query selects each row in memberstatus for which no other rows (WHERE NOT EXISTS) with the same member_id have a later status_date; in other words, the row(s) with the latest status_date for each member_id. For example, given member_id | status_code | status_date -----------+-------------+------------- 1 | a | 2005-12-01 1 | b | 2005-12-02 1| c | 2005-12-03 2 | x | 2005-12-11 2 | y | 2005-12-12 2 | z | 2005-12-13 the query should return member_id | status_code | status_date -----------+-------------+------------- 1 | c | 2005-12-03 2 | z | 2005-12-13 Offhand I can't think of why the query would return nothing unless the table is empty, but maybe I'm overlooking something or making unwarranted assumptions about the data. Can you post a sample data set for which the query returns no rows? -- Michael Fuhr
The table is not empty. I did some playing around with the SQL Statement but got no where. So I added a record status column that will be non-NULL when the status becomes history because a new status is added. Yeah, I cheated to make it easy!! LOL But it now works and that is what counts! Thanks for the help. Mike > -----Original Message----- > From: Michael Fuhr [mailto:mike@fuhr.org] > Sent: Saturday, December 17, 2005 7:26 PM > To: Michael Avila > Cc: SQL PostgreSQL MailList > Subject: Re: [SQL] Need SQL Help Finding Current Status of members > > > On Fri, Dec 16, 2005 at 07:44:46PM -0500, Michael Avila wrote: > > Just tried it and it returned nothing. > > > > > > Select * from memberstatus A where not exists > > > > (select * from emberstatus B where B.member_id=A.member_id and > > > > B.status_date >A.status_date) > > > > Why the WHERE NOT EXISTS? > > The query selects each row in memberstatus for which no other rows > (WHERE NOT EXISTS) with the same member_id have a later status_date; > in other words, the row(s) with the latest status_date for each > member_id. For example, given > > member_id | status_code | status_date > -----------+-------------+------------- > 1 | a | 2005-12-01 > 1 | b | 2005-12-02 > 1 | c | 2005-12-03 > 2 | x | 2005-12-11 > 2 | y | 2005-12-12 > 2 | z | 2005-12-13 > > the query should return > > member_id | status_code | status_date > -----------+-------------+------------- > 1 | c | 2005-12-03 > 2 | z | 2005-12-13 > > Offhand I can't think of why the query would return nothing unless > the table is empty, but maybe I'm overlooking something or making > unwarranted assumptions about the data. Can you post a sample data > set for which the query returns no rows? > > -- > Michael Fuhr
On Sat, Dec 17, 2005 at 07:34:22PM -0500, Michael Avila wrote: > The table is not empty. I did some playing around with the SQL Statement but > got no where. So I added a record status column that will be non-NULL when > the status becomes history because a new status is added. Yeah, I cheated to > make it easy!! LOL But it now works and that is what counts! Even if you're satisfied with whatever you ended up doing, it might still be educational to find out why the query didn't work. If you post a test case we'll take a look; we might even be able to suggest a better solution than the "cheat" you used. -- Michael Fuhr