Обсуждение: Aggregate query for multiple records

Поиск
Список
Период
Сортировка

Aggregate query for multiple records

От
Scott Gerhardt
Дата:
Hello, I am new to the list, my apology if this question is beyond the 
scope or charter of this list.

My questions is:
What is the best method to perform an aggregate query to calculate 
sum() values for each distinct wid as in the example below, but except 
for all wid's (not just WHERE wid='01/1-6-1-30w1/0').

Also, performance wise, would it be better to build a function for this 
query.  The table has 9 million records and these aggregate queries 
take hours.


SELECT  SUM(oil) as sumoil, SUM(hours) as sumhours,
FROM  (SELECT oil, hours prd_data WHERE wid='01/1-6-1-30w1/0'   ORDER BY date LIMIT 6) subtable
;


Table description:              Table "prd_data" Column |         Type          | Modifiers
--------+-----------------------+----------- date   | integer               | hours  | real                  | oil    |
real                 | gas    | real                  | water  | real                  | pwid   | integer
|wid    | character varying(20) | year   | smallint              |
 
Indexes: wid_index6


Actual table (prd_data), 9 million records:
  date  | hours |  oil  | gas  | water | pwid |       wid       | year
--------+-------+-------+------+-------+------+-----------------+------ 196507 |   360 | 159.4 | 11.3 |  40.9 |  413 |
01/1-1-1-31w1/0| 1965 196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965 196509 |   360 | 171.1 |
11.4|  50.4 |  413 | 01/1-1-1-31w1/0 | 1965 196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
196512|   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965 196511 |   720 |   184 | 17.6 |  78.9 |  413 |
01/1-1-1-31w1/0| 1965 196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966 196612 |   744 |    86 |
12.8|  36.1 |  413 | 01/1-1-1-31w1/0 | 1966 196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
196601|   744 | 191.6 | 22.6 |  50.7 |  413 | 01/1-1-1-31w1/0 | 1966 200301 |   461 |  68.8 |   0 | 186.3 | 47899 |
9G/6-1-50-24w3/0| 2003 200310 |   740 | 446.3 |   0 | 563.1 | 47899 | 9G/6-1-50-24w3/0 | 2003 200306 |   667 |  92.1 |
0 | 968.8 | 47899 | 9G/6-1-50-24w3/0 | 2003 200304 |     0 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
200308|   457 | 100.7 |   0 |  82.8 | 47899 | 9G/6-1-50-24w3/0 | 2003 200307 |   574 |    78 |   0 |   752 | 47899 |
9G/6-1-50-24w3/0| 2003 200312 |   582 | 360.9 |   0 |   569 | 47899 | 9G/6-1-50-24w3/0 | 2003 200311 |   681 | 260.8 |
0 | 563.9 | 47899 | 9G/6-1-50-24w3/0 | 2003 200305 |   452 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
200309|   637 | 244.6 |   0 | 193.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
 
(20 rows)



Thanks,

--
Scott A. Gerhardt, P.Geo.
Gerhardt Information Technologies



Re: Aggregate query for multiple records

От
Josh Berkus
Дата:
Scott,

> Hello, I am new to the list, my apology if this question is beyond the
> scope or charter of this list.

We have a charter?   Why didn't anyone tell me?   

> My questions is:
> What is the best method to perform an aggregate query to calculate
> sum() values for each distinct wid as in the example below, but except
> for all wid's (not just WHERE wid='01/1-6-1-30w1/0').

> SELECT
>    SUM(oil) as sumoil, SUM(hours) as sumhours,
> FROM
>    (SELECT oil, hours prd_data WHERE wid='01/1-6-1-30w1/0'
>     ORDER BY date LIMIT 6) subtable

Your problem here is the "limit 6"  There isn't any good+fast way to take only 
6 of each thing and total them.   Also the above query is missing a FROM 
clause, so I;ve had to guess at your table name below.  Oh, and the word 
"date" is a reserved word, better to quote it. 

This is valid, but it won't be fast:

SELECT wid, (SELECT SUM(oil) FROM  prd_data pd2 WHERE pd2.wid = prd_data.wid ORDER BY "date" LIMIT 6) as
tot_oil,(SELECTSUM(hours) FROM  prd_data pd2 WHERE pd2.wid = prd_data.wid ORDER BY "date" LIMIT 6) as tot_hours
 
FROM prd_data
ORDER BY wid;

However, that will be querying the prd_data table about (distinct wid)*2+1 
times.   Don't hold your breath.

> Also, performance wise, would it be better to build a function for this
> query.  The table has 9 million records and these aggregate queries
> take hours.

It might.  Not for the summaries themselves (which are fastest as aggregates), 
but to build a table that has only 6 records per WID.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Aggregate query for multiple records

От
"Troy"
Дата:
shouldn't take that long, I would think.
You have indexes on wid and date?


Troy

> 
> Hello, I am new to the list, my apology if this question is beyond the 
> scope or charter of this list.
> 
> My questions is:
> What is the best method to perform an aggregate query to calculate 
> sum() values for each distinct wid as in the example below, but except 
> for all wid's (not just WHERE wid='01/1-6-1-30w1/0').
> 
> Also, performance wise, would it be better to build a function for this 
> query.  The table has 9 million records and these aggregate queries 
> take hours.
> 
> 
> SELECT
>    SUM(oil) as sumoil, SUM(hours) as sumhours,
> FROM
>    (SELECT oil, hours prd_data WHERE wid='01/1-6-1-30w1/0'
>     ORDER BY date LIMIT 6) subtable
> ;
> 
> 
> Table description:
>                Table "prd_data"
>   Column |         Type          | Modifiers
> --------+-----------------------+-----------
>   date   | integer               |
>   hours  | real                  |
>   oil    | real                  |
>   gas    | real                  |
>   water  | real                  |
>   pwid   | integer               |
>   wid    | character varying(20) |
>   year   | smallint              |
> Indexes: wid_index6
> 
> 
> Actual table (prd_data), 9 million records:
> 
>    date  | hours |  oil  | gas  | water | pwid |       wid       | year
> --------+-------+-------+------+-------+------+-----------------+------
>   196507 |   360 | 159.4 | 11.3 |  40.9 |  413 | 01/1-1-1-31w1/0 | 1965
>   196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965
>   196509 |   360 | 171.1 | 11.4 |  50.4 |  413 | 01/1-1-1-31w1/0 | 1965
>   196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
>   196512 |   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965
>   196511 |   720 |   184 | 17.6 |  78.9 |  413 | 01/1-1-1-31w1/0 | 1965
>   196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966
>   196612 |   744 |    86 | 12.8 |  36.1 |  413 | 01/1-1-1-31w1/0 | 1966
>   196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
>   196601 |   744 | 191.6 | 22.6 |  50.7 |  413 | 01/1-1-1-31w1/0 | 1966
>   200301 |   461 |  68.8 |   0 | 186.3 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200310 |   740 | 446.3 |   0 | 563.1 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200306 |   667 |  92.1 |   0 | 968.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200304 |     0 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200308 |   457 | 100.7 |   0 |  82.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200307 |   574 |    78 |   0 |   752 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200312 |   582 | 360.9 |   0 |   569 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200311 |   681 | 260.8 |   0 | 563.9 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200305 |   452 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
>   200309 |   637 | 244.6 |   0 | 193.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
> (20 rows)
> 
> 
> 
> Thanks,
> 
> --
> Scott A. Gerhardt, P.Geo.
> Gerhardt Information Technologies
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
> 



Re: Aggregate query for multiple records

От
Josh Berkus
Дата:
Scott,

> > SELECT wid,
> >     (SELECT SUM(oil) FROM  prd_data pd2
> >     WHERE pd2.wid = prd_data.wid ORDER BY "date" LIMIT 6) as tot_oil,
> >     (SELECT SUM(hours) FROM  prd_data pd2
> >     WHERE pd2.wid = prd_data.wid ORDER BY "date" LIMIT 6) as tot_hours
> > FROM prd_data
> > ORDER BY wid;
>
> There is only one table being queried (prd_data). What is the purpose
> of "pd2"?

It's a table alias.   There may be only one table, but to do this in a single 
query you need to reference 3 "instances" of that same table.

Hmmm ... let me make the distinction more clear:

SELECT wid,        (SELECT SUM(oil) FROM  prd_data prd2        WHERE prd2.wid = prd1.wid ORDER BY "date" LIMIT 6) as
tot_oil,      (SELECT SUM(hours) FROM  prd3        WHERE prd3.wid = prd1.wid ORDER BY "date" LIMIT 6) as tot_hours
 
FROM prd_data prd1
ORDER BY wid;

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Aggregate query for multiple records

От
Josh Berkus
Дата:
Scott,

> >> I tried your query but it doesn't seem to work.  The interpreter
> >> expects prd2.date to be grouped but this doesn't make sence, see
> >> below:
> >
> > Oh, yeah, darn it.
> >
> > Hmmm ... maybe you could explain the purpose of selecting just 6?
> > This query
> > is going to get a *lot* uglier if there isn't another way to
> > accomplish it.
>
> The purpose of selecting the first 6 is we need sum values for just the
> first 3, 6 and 12 months oil/water production for every oil well in the
> database to perform statistical calculations.  Optionally, we also need
> the _last_ 6 months of oil and water production.
> The prd_data table has total oil for each month but some wells have 500
> months of data, we just want the first 6.

Well, here's my suggestion, to make this query both easier and faster:

1)  Add a column called "month_prod", integer, to the table.  This is the 
"number of months in production".
2) Populate this query with a calculated difference on your "date" column 
against the first appearance of that WID (I'm assuming that each increment of 
"date" = 1 month)

UPDATE prd_data SET months_prod = prd_data."date" - prd2."date" + 1
FROM prd_data prd2
WHERE prd_data.wid = prd2.wid
AND prd2."date" = ( SELECT "date" FROM prd_data prd3WHERE prd3.wid = prd2.widORDER BY "date" LIMIT 1 );

(warning: the above will take a long time and floor your server.  Make sure to 
do VACUUM FULL ANALYZE prd_data afterwards.)
(if you can't get the above to complete, which is possible depending on your 
hardware, you could do it procedurally in a programmming language)

3) Then running your aggregate becomes very easy/fast:

SELECT wid, sum(oil) as tot_oil, sum(hours) as tot_hours
FROM prd_data
WHERE months_prod < 7
GROUP BY wid
ORDER BY wid;

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Aggregate query for multiple records

От
Greg Stark
Дата:
Scott Gerhardt <scott@g-it.ca> writes:

> Hello, I am new to the list, my apology if this question is beyond the scope or
> charter of this list.
> 
> My questions is:
> What is the best method to perform an aggregate query to calculate sum() values
> for each distinct wid as in the example below, but except for all wid's (not
> just WHERE wid='01/1-6-1-30w1/0').
> 
> Also, performance wise, would it be better to build a function for this query.
> The table has 9 million records and these aggregate queries take hours.

The "top n" type query (or in this case "first n" or "last n" but it's the
same thing) is actually very tricky to do in standard SQL. The best solution
seen here for postgres is to use arrays and custom aggregate functions. 

The following is based on a previous answer from Tom Lane to a similar
question. (I thought I already posted this for you on pgsql-general but the
list archives are down and you don't seem to have seen it, so I'm resending
it)

It allows you to do the whole query with a single sort for the grouping and
the ordering by date together. You would have to use it with something like:

SELECT sum_first_6(oil)  FROM (SELECT oil from prd_data ORDER BY wid, "date") GROUP BY wid

If you pump up sort_mem enough -- you can do it within the session for the
single connection using "SET sort_mem" it should be pretty fast.

I think it's the best you're going to get. If you're absolutely sure the data
is physically stored in chronological order -- which I would only feel
comfortable with if you've never done any updates or deletes, only inserts and
perhaps occasional truncates, then you might be able to get by without
ordering and convince it to do a hash aggregate. That would be the optimal
result, no sorts at all. But it would be hard to make sure it would always
work.

test=> create or replace function first_6_accum (integer[], integer) returns integer[] language sql immutable as
'selectcase when array_upper($1,1)>=6 then $1 else $1||$2 end';
 
CREATE FUNCTION

test=> create function sum_6(integer[]) returns integer immutable language sql as 'select
$1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION

test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum,
stype=integer[],initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE

test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union
select5 union select 6 union select 7 union select 8) as x order by i desc) as x;sum_first_6 
 
-------------         33
(1 row)

test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union
select5 union select 6 union select 7 union select 8) as x order by i asc) as x;sum_first_6 
 
-------------         21
(1 row)

This can easily be switched around to make it "last_6" and you can write
functions to handle 6 records or 9 records. And all of these could be combined
in a single query, so you only have to do the sort once.

Unfortunately you cannot make aggregate functions that take multiple
parameters, nor can you pass extra parameters to the state function. So you'll
have to create a separate set of functions for each variant.

Also, you'll have to change it to use reals.


-- 
greg



Re: Aggregate query for multiple records

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> [ nice example snipped ]
> ... Also, you'll have to change it to use reals.

That part, at least, can be worked around as of 7.4: use polymorphic
functions.  You can declare the functions and aggregate as working on
anyelement/anyarray, and then they will automatically work on any
datatype that has a + operator.

regression=# create or replace function first_6_accum (anyarray,anyelement) returns anyarray
regression-# language sql immutable as 'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';
CREATE FUNCTION
regression=# create function sum_6(anyarray) returns anyelement immutable language sql as 'select
$1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
regression=# create aggregate sum_first_6 (basetype=anyelement, sfunc=first_6_accum,
stype=anyarray,initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE
regression=# select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4
unionselect 5 union select 6 union select 7 union select 8) as x order by i desc) as x;sum_first_6
 
-------------         33
(1 row)

regression=# select sum_first_6(i) from (select i from (select 1.1 as i union select 2 union select 3 union select 4
unionselect 5 union select 6 union select 7.7 union select 8) as x order by i desc) as x;sum_first_6
 
-------------       33.7
(1 row)

regression=#
    regards, tom lane


Re: Aggregate query for multiple records

От
Scott Gerhardt
Дата:
> Greg Stark <gsstark@mit.edu> writes:
>> [ nice example snipped ]
>> ... Also, you'll have to change it to use reals.
>
> That part, at least, can be worked around as of 7.4: use polymorphic
> functions.  You can declare the functions and aggregate as working on
> anyelement/anyarray, and then they will automatically work on any
> datatype that has a + operator.
>
> regression=# create or replace function first_6_accum  
> (anyarray,anyelement) returns anyarray
> regression-# language sql immutable as 'select case when  
> array_upper($1,1)>=6 then $1 else $1||$2 end';
> CREATE FUNCTION
> regression=# create function sum_6(anyarray) returns anyelement  
> immutable language sql as 'select  
> $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
> CREATE FUNCTION
> regression=# create aggregate sum_first_6 (basetype=anyelement,  
> sfunc=first_6_accum, stype=anyarray,initcond='{}',finalfunc=sum_6);
> CREATE AGGREGATE
> regression=# select sum_first_6(i) from (select i from (select 1 as i  
> union select 2 union select 3 union select 4 union select 5 union  
> select 6 union select 7 union select 8) as x order by i desc) as x;
>  sum_first_6
> -------------
>           33
> (1 row)
>
> regression=# select sum_first_6(i) from (select i from (select 1.1 as  
> i union select 2 union select 3 union select 4 union select 5 union  
> select 6 union select 7.7 union select 8) as x order by i desc) as x;
>  sum_first_6
> -------------
>         33.7
> (1 row)
>
> regression=#
>
>         regards, tom lane
>



An alternate solution I'm thinking is to add column to hold a  
"total_months" value that could be used to simplify queries and speed  
queries ( i.e. first month of oil productin = 1, second = 2 etc.)  That  
way I can use select the first 6 months by using "where < 6", or any  
month interval for that matter.

The following query, suggested by another list member (thanks Josh  
Berkus), to populate the "total_months" column sort of work but doesn't  
handle the year wrapping as it adds 88 when the year wraps (see output  
below).


UPDATE prd_data_test SET months_prod = prd_data_test."date" -  
prd2."date" + 1
FROM prd_data_test prd2
WHERE prd_data_test.wid = prd2.wid
AND prd2."date" = ( SELECT "date" FROM prd_data_test prd3WHERE prd3.wid = prd2.widORDER BY "date" LIMIT 1 );


The results are:
SEM=# select * from prd_data_test order by wid, date limit 20;  date  | hours |  oil  | gas  | water | pwid |       wid
     | year  
 
| month_prd | months_prod
--------+-------+-------+------+-------+------+-----------------+------ 
+-----------+------------- 196505 |   480 | 194.3 | 10.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           1 196506 |   600 | 279.4 | 13.1 |     0 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           2 196507 |   744 | 288.1 |  4.5 |     0 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           3 196508 |   720 | 234.6 |  9.4 |   2.9 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           4 196509 |   648 | 208.2 | 12.5 |     6 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           5 196510 |   744 | 209.8 | 15.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           6 196511 |   720 | 180.5 | 13.9 |  27.7 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           7 196512 |   744 | 227.4 | 22.8 |   5.2 |    1 | 01/1-6-1-30w1/0 | 1965  
|           |           8 196601 |   744 | 230.3 | 22.7 |    10 |    1 | 01/1-6-1-30w1/0 | 1966  
|           |          97 196602 |   672 | 173.2 | 16.5 |    17 |    1 | 01/1-6-1-30w1/0 | 1966  
|           |          98 196603 |   744 | 197.2 | 18.7 |   9.2 |    1 | 01/1-6-1-30w1/0 | 1966  
|           |          99 196604 |   720 | 168.1 | 14.1 |     3 |    1 | 01/1-6-1-30w1/0 | 1966  
|           |         100


Table description:              Table "prd_data" Column |         Type          | Modifiers
--------+-----------------------+----------- date   | integer               | hours  | real                  | oil    |
real                 | gas    | real                  | water  | real                  | pwid   | integer
|wid    | character varying(20) | year   | smallint              |
 
Indexes: wid_index6


Actual table (prd_data), 9 million records:
  date  | hours |  oil  | gas  | water | pwid |       wid       | year
--------+-------+-------+------+-------+------+-----------------+------ 196507 |   360 | 159.4 | 11.3 |  40.9 |  413 |
01/1-1-1-31w1/0| 1965 196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965 196509 |   360 | 171.1 |
11.4|  50.4 |  413 | 01/1-1-1-31w1/0 | 1965 196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
196512|   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965 196511 |   720 |   184 | 17.6 |  78.9 |  413 |
01/1-1-1-31w1/0| 1965 196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966 196612 |   744 |    86 |
12.8|  36.1 |  413 | 01/1-1-1-31w1/0 | 1966 196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
196601|   744 | 191.6 | 22.6 |  50.7 |  413 | 01/1-1-1-31w1/0 | 1966 200301 |   461 |  68.8 |   0 | 186.3 | 47899 |
9G/6-1-50-24w3/0| 2003 200310 |   740 | 446.3 |   0 | 563.1 | 47899 | 9G/6-1-50-24w3/0 | 2003 200306 |   667 |  92.1 |
0 | 968.8 | 47899 | 9G/6-1-50-24w3/0 | 2003 200304 |     0 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
200308|   457 | 100.7 |   0 |  82.8 | 47899 | 9G/6-1-50-24w3/0 | 2003 200307 |   574 |    78 |   0 |   752 | 47899 |
9G/6-1-50-24w3/0| 2003 200312 |   582 | 360.9 |   0 |   569 | 47899 | 9G/6-1-50-24w3/0 | 2003 200311 |   681 | 260.8 |
0 | 563.9 | 47899 | 9G/6-1-50-24w3/0 | 2003 200305 |   452 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
200309|   637 | 244.6 |   0 | 193.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
 


Thanks,

--
Scott A. Gerhardt, P.Geo.
Gerhardt Information Technologies






Re: Aggregate query for multiple records

От
Josh Berkus
Дата:
Scott,

> Unfortunately, your revised query works like a charm except for the
> fact that prd_data."date" - prd2."date" + 1 give incorrect values when
> the year wraps, see in the output below.  Need to conditionally
> subtract 88 from the date or use an incrementing count() function
> instead of date math to get the correct values.

Oh, that column is text, not a serial number.   Well, it's your fault for 
using wierd custom data types; you figure it out.

> I'm also wondering if using the date functions of Postgres would be
> helpful since the date column does have the YYYY-year and MM-month
> parts.

Well, you'd want to convert the column to a timestamp, and then you could 
compute months.   Or you could break it in seperate integer "year" and 
"month" columns and do the same thing.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco