Обсуждение: Oracle Analytical Functions

Поиск

Список

Период

Сортировка

Oracle Analytical Functions

От

Willem Buitendyk

Дата:

30 января 2008 г., 20:15:30

I'm trying to replicate the use of Oracle's 'lag' and 'over partition
by' analytical functions in my query.  I have a table (all_client_times)
such as:

client_id, datetime
122, 2007-05-01 12:00:00
122, 2007-05-01 12:01:00
455, 2007-05-01 12:02:00
455, 2007-05-01 12:03:00
455, 2007-05-01 12:08:00
299, 2007-05-01 12:10:00
299, 2007-05-01 12:34:00

and I would like to create a new view that takes the first table and
calculates the time difference in minutes between each row so that the
result is something like:

client_id,datetime, previousTime, difftime
122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24

In Oracle I can achieve this with:

 CREATE OR REPLACE VIEW client_time_diffs AS SELECT client_id,datetime,
LAG(datetime, 1) OVER (partition by client_id ORDER BY
client_id,datetime) AS previoustime from all_client_times;

Any idea how I could replicate this in SQL from PG.  Would this be an
easy thing to do in Pl/pgSQL?  If so could anyone give any directions as
to where to start?

Appreciate the help,

Willem

Re: Oracle Analytical Functions

От

"Dann Corbit"

Дата:

30 января 2008 г., 20:34:07

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Willem Buitendyk
> Sent: Wednesday, January 30, 2008 1:15 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Oracle Analytical Functions
>
> I'm trying to replicate the use of Oracle's 'lag' and 'over partition
> by' analytical functions in my query.  I have a table
(all_client_times)
> such as:
>
> client_id, datetime
> 122, 2007-05-01 12:00:00
> 122, 2007-05-01 12:01:00
> 455, 2007-05-01 12:02:00
> 455, 2007-05-01 12:03:00
> 455, 2007-05-01 12:08:00
> 299, 2007-05-01 12:10:00
> 299, 2007-05-01 12:34:00
>
> and I would like to create a new view that takes the first table and
> calculates the time difference in minutes between each row so that the
> result is something like:
>
> client_id,datetime, previousTime, difftime
> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24
>
> In Oracle I can achieve this with:
>
>  CREATE OR REPLACE VIEW client_time_diffs AS SELECT
client_id,datetime,
> LAG(datetime, 1) OVER (partition by client_id ORDER BY
> client_id,datetime) AS previoustime from all_client_times;
>
> Any idea how I could replicate this in SQL from PG.  Would this be an
> easy thing to do in Pl/pgSQL?  If so could anyone give any directions
as
> to where to start?

You could certainly create a cursor and then just use age() or other
time difference extraction method as appropriate:
http://www.postgresql.org/docs/8.2/static/sql-declare.html
http://www.postgresql.org/docs/8.2/static/functions-datetime.html

Re: Oracle Analytical Functions

От

"Adam Rich"

Дата:

30 января 2008 г., 20:47:33

> and I would like to create a new view that takes the first table and
> calculates the time difference in minutes between each row so that the
> result is something like:
>
> client_id,datetime, previousTime, difftime
> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24
>
> Any idea how I could replicate this in SQL from PG.  Would this be an
> easy thing to do in Pl/pgSQL?  If so could anyone give any directions
> as to where to start?

You can create a set-returning function, that cursors over the table,
like this:


CREATE OR REPLACE FUNCTION lagfunc(
    OUT client_id INT,
    OUT datetime timestamp,
    OUT previousTime timestamp,
    OUT difftime interval)
RETURNS SETOF RECORD as $$
DECLARE
    thisrow RECORD;
    last_client_id INT;
    last_datetime timestamp;
BEGIN

    FOR thisrow IN SELECT * FROM all_client_times ORDER BY client_id,
datetime LOOP
    IF thisrow.client_id = last_client_id THEN
        client_id := thisrow.datetime;
        datetime := thisrow.datetime;
        previousTime := last_datetime;
        difftime = datetime-previousTime;
        RETURN NEXT;
    END IF;
    last_client_id := thisrow.client_id;
    last_datetime := thisrow.datetime;
    END LOOP;

   RETURN;
END;
$$ LANGUAGE plpgsql;

select * from lagfunc() limit 10;
select * from lagfunc() where client_id = 455;


Here I used an interval, but you get the idea.

Re: Oracle Analytical Functions

От

Lewis Cunningham

Дата:

30 января 2008 г., 20:58:25

How about something like this:

SELECT
    client_id
    , datetime
    , lagged as previoustime
    , datetime - lagged difftime
FROM (
  SELECT
    client_id
    ,datetime
    ,(SELECT MAX(datetime)
        FROM all_client_times def
        WHERE def.client_id = abc.client_id
          AND def.datetime < abc.datetime) as lagged
    FROM all_client_times abc
)
WHERE lagged is not null

If you have records with no previous data or multiple rows, you'll
need to play with this to get it to work but it should point in the
right direction.

Hope that helps,

LewisC


--- Willem Buitendyk <willem@pcfish.ca> wrote:

> I'm trying to replicate the use of Oracle's 'lag' and 'over
> partition
> by' analytical functions in my query.  I have a table
> (all_client_times)
> such as:
>
> client_id, datetime
> 122, 2007-05-01 12:00:00
> 122, 2007-05-01 12:01:00
> 455, 2007-05-01 12:02:00
> 455, 2007-05-01 12:03:00
> 455, 2007-05-01 12:08:00
> 299, 2007-05-01 12:10:00
> 299, 2007-05-01 12:34:00
>
> and I would like to create a new view that takes the first table
> and
> calculates the time difference in minutes between each row so that
> the
> result is something like:
>
> client_id,datetime, previousTime, difftime
> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24
>
> In Oracle I can achieve this with:
>
>  CREATE OR REPLACE VIEW client_time_diffs AS SELECT
> client_id,datetime,
> LAG(datetime, 1) OVER (partition by client_id ORDER BY
> client_id,datetime) AS previoustime from all_client_times;
>
> Any idea how I could replicate this in SQL from PG.  Would this be
> an
> easy thing to do in Pl/pgSQL?  If so could anyone give any
> directions as
> to where to start?
>
> Appreciate the help,
>
> Willem
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire
> to
>        choose an index scan if your joining column's datatypes do
> not
>        match
>



Lewis R Cunningham

An Expert's Guide to Oracle Technology
http://blogs.ittoolbox.com/oracle/guide/

LewisC's Random Thoughts
http://lewiscsrandomthoughts.blogspot.com/

Re: Oracle Analytical Functions

От

Reece Hart

Дата:

30 января 2008 г., 21:29:28

create table data (
    client_id integer,
    datetime timestamp not null
    );
create index data_client_id on data(client_id);

copy data from STDIN DELIMITER ',';
122,2007-05-01 12:00:00
122,2007-05-01 12:01:00
455,2007-05-01 12:02:00
455,2007-05-01 12:03:00
455,2007-05-01 12:08:00
299,2007-05-01 12:10:00
299,2007-05-01 12:34:00
\.

CREATE OR REPLACE FUNCTION visits (
    OUT client_id INTEGER,
    OUT datetime_1 TIMESTAMP,
    OUT datetime_2 TIMESTAMP,
    OUT dur INTERVAL )
RETURNS SETOF RECORD
LANGUAGE plpgsql
AS $_$
DECLARE
    rp data%ROWTYPE;            -- previous data table record
    r data%ROWTYPE;             -- data table record, more recent than
rp
BEGIN
    rp = (NULL,NULL);
    FOR r IN SELECT * FROM data ORDER BY client_id,datetime LOOP
       IF rp.client_id = r.client_id THEN
          client_id = r.client_id;
          datetime_1 = r.datetime;
          datetime_2 = rp.datetime;
          dur = r.datetime-rp.datetime;
          RETURN NEXT;
        END IF;
        rp = r;
    END LOOP;
    RETURN;
END;
$_$;


rkh@rkh=> select * from visits() order by client_id,datetime_1;
 client_id |     datetime_1      |     datetime_2      |   dur
-----------+---------------------+---------------------+----------
       122 | 2007-05-01 12:01:00 | 2007-05-01 12:00:00 | 00:01:00
       299 | 2007-05-01 12:34:00 | 2007-05-01 12:10:00 | 00:24:00
       455 | 2007-05-01 12:03:00 | 2007-05-01 12:02:00 | 00:01:00
       455 | 2007-05-01 12:08:00 | 2007-05-01 12:03:00 | 00:05:00
(4 rows)


-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Вложения

visits.sql

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

30 января 2008 г., 22:38:57

I tried this function but it keeps returning an error such as:

ERROR: invalid input syntax for integer: "2007-05-05 00:34:08"
SQL state: 22P02
Context: PL/pgSQL function "lagfunc" line 10 at assignment

I checked and there are no datetime values in the client_id field
anywhere in my table 'all_client_times'

I have no idea what is going on here ...

Thanks for the code though - it has taught me a lot all ready; such as
using, OUT and SETOF Record

Willem

Adam Rich wrote:
>> and I would like to create a new view that takes the first table and
>> calculates the time difference in minutes between each row so that the
>> result is something like:
>>
>> client_id,datetime, previousTime, difftime
>> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
>> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
>> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
>> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24
>>
>> Any idea how I could replicate this in SQL from PG.  Would this be an
>> easy thing to do in Pl/pgSQL?  If so could anyone give any directions
>> as to where to start?
>>
>
> You can create a set-returning function, that cursors over the table,
> like this:
>
>
> CREATE OR REPLACE FUNCTION lagfunc(
>     OUT client_id INT,
>     OUT datetime timestamp,
>     OUT previousTime timestamp,
>     OUT difftime interval)
> RETURNS SETOF RECORD as $$
> DECLARE
>     thisrow RECORD;
>     last_client_id INT;
>     last_datetime timestamp;
> BEGIN
>
>     FOR thisrow IN SELECT * FROM all_client_times ORDER BY client_id,
> datetime LOOP
>     IF thisrow.client_id = last_client_id THEN
>         client_id := thisrow.datetime;
>         datetime := thisrow.datetime;
>         previousTime := last_datetime;
>         difftime = datetime-previousTime;
>         RETURN NEXT;
>     END IF;
>     last_client_id := thisrow.client_id;
>     last_datetime := thisrow.datetime;
>     END LOOP;
>
>    RETURN;
> END;
> $$ LANGUAGE plpgsql;
>
> select * from lagfunc() limit 10;
> select * from lagfunc() where client_id = 455;
>
>
> Here I used an interval, but you get the idea.
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

30 января 2008 г., 22:59:44

Thanks Reece,

I got this to work for me.  The only problem was with the ORDER BY
clause which did not seem to work properly.  I took it out and instead
used a sorted view for the data table.

Cheers,

Willem

Reece Hart wrote:
> create table data (
>     client_id integer,
>     datetime timestamp not null
>     );
> create index data_client_id on data(client_id);
>
> copy data from STDIN DELIMITER ',';
> 122,2007-05-01 12:00:00
> 122,2007-05-01 12:01:00
> 455,2007-05-01 12:02:00
> 455,2007-05-01 12:03:00
> 455,2007-05-01 12:08:00
> 299,2007-05-01 12:10:00
> 299,2007-05-01 12:34:00
> \.
>
> CREATE OR REPLACE FUNCTION visits (
>     OUT client_id INTEGER,
>     OUT datetime_1 TIMESTAMP,
>     OUT datetime_2 TIMESTAMP,
>     OUT dur INTERVAL )
> RETURNS SETOF RECORD
> LANGUAGE plpgsql
> AS $_$
> DECLARE
>     rp data%ROWTYPE;            -- previous data table record
>     r data%ROWTYPE;             -- data table record, more recent than
> rp
> BEGIN
>     rp = (NULL,NULL);
>     FOR r IN SELECT * FROM data ORDER BY client_id,datetime LOOP
>        IF rp.client_id = r.client_id THEN
>           client_id = r.client_id;
>           datetime_1 = r.datetime;
>           datetime_2 = rp.datetime;
>           dur = r.datetime-rp.datetime;
>           RETURN NEXT;
>         END IF;
>         rp = r;
>     END LOOP;
>     RETURN;
> END;
> $_$;
>
>
> rkh@rkh=> select * from visits() order by client_id,datetime_1;
>  client_id |     datetime_1      |     datetime_2      |   dur
> -----------+---------------------+---------------------+----------
>        122 | 2007-05-01 12:01:00 | 2007-05-01 12:00:00 | 00:01:00
>        299 | 2007-05-01 12:34:00 | 2007-05-01 12:10:00 | 00:24:00
>        455 | 2007-05-01 12:03:00 | 2007-05-01 12:02:00 | 00:01:00
>        455 | 2007-05-01 12:08:00 | 2007-05-01 12:03:00 | 00:05:00
> (4 rows)
>
>
> -Reece
>
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

30 января 2008 г., 23:06:08

Found the error:

client_id := thisrow.datetime;

should be

client_id := thisrow.client_id;

All works well now,

Thanks very much,

Willem

Willem Buitendyk wrote:
> I tried this function but it keeps returning an error such as:
>
> ERROR: invalid input syntax for integer: "2007-05-05 00:34:08"
> SQL state: 22P02
> Context: PL/pgSQL function "lagfunc" line 10 at assignment
>
> I checked and there are no datetime values in the client_id field
> anywhere in my table 'all_client_times'
>
> I have no idea what is going on here ...
>
> Thanks for the code though - it has taught me a lot all ready; such as
> using, OUT and SETOF Record
>
> Willem
>
> Adam Rich wrote:
>>> and I would like to create a new view that takes the first table and
>>> calculates the time difference in minutes between each row so that the
>>> result is something like:
>>>
>>> client_id,datetime, previousTime, difftime
>>> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
>>> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
>>> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
>>> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24
>>>
>>> Any idea how I could replicate this in SQL from PG.  Would this be an
>>> easy thing to do in Pl/pgSQL?  If so could anyone give any directions
>>> as to where to start?
>>>
>>
>> You can create a set-returning function, that cursors over the table,
>> like this:
>>
>>
>> CREATE OR REPLACE FUNCTION lagfunc(
>>     OUT client_id INT,     OUT datetime timestamp,     OUT
>> previousTime timestamp,     OUT difftime interval)
>> RETURNS SETOF RECORD as $$ DECLARE
>>     thisrow RECORD;
>>     last_client_id INT;
>>     last_datetime timestamp;
>> BEGIN
>>
>>     FOR thisrow IN SELECT * FROM all_client_times ORDER BY client_id,
>> datetime LOOP
>>     IF thisrow.client_id = last_client_id THEN
>>         client_id := thisrow.datetime;
>>         datetime := thisrow.datetime;
>>         previousTime := last_datetime;
>>         difftime = datetime-previousTime;
>>         RETURN NEXT;
>>     END IF;
>>     last_client_id := thisrow.client_id;
>>     last_datetime := thisrow.datetime;
>>     END LOOP;
>>
>>    RETURN;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> select * from lagfunc() limit 10;
>> select * from lagfunc() where client_id = 455;
>>
>>
>> Here I used an interval, but you get the idea.
>>
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Oracle Analytical Functions

От

"Adam Rich"

Дата:

30 января 2008 г., 23:13:15

> I tried this function but it keeps returning an error such as:
>
> ERROR: invalid input syntax for integer: "2007-05-05 00:34:08"
> SQL state: 22P02
> Context: PL/pgSQL function "lagfunc" line 10 at assignment

Whoops, this line:

> >         client_id := thisrow.datetime;

Should be:

client_id := thisrow.client_id;

Re: Oracle Analytical Functions

От

Enrico Sirola

Дата:

31 января 2008 г., 12:49:58

Hi Willem,

Il giorno 30/gen/08, alle ore 22:15, Willem Buitendyk ha scritto:

> I'm trying to replicate the use of Oracle's 'lag' and 'over
> partition by' analytical functions in my query.  I have a table
> (all_client_times) such as:
>
> client_id, datetime
> 122, 2007-05-01 12:00:00
> 122, 2007-05-01 12:01:00
> 455, 2007-05-01 12:02:00
> 455, 2007-05-01 12:03:00
> 455, 2007-05-01 12:08:00
> 299, 2007-05-01 12:10:00
> 299, 2007-05-01 12:34:00
>
> and I would like to create a new view that takes the first table and
> calculates the time difference in minutes between each row so that
> the result is something like:
>
> client_id,datetime, previousTime, difftime
> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24

I'd create a "previousTime" column and manage it using a trigger.
Anyway, it depends on the time-dependancy of the table
Then you can perform "temporal" in a much easier way.
You could be interested in taking a look at the following link

http://www.cs.arizona.edu/~rts/tdbbook.pdf

Cheers,
e.

Re: Oracle Analytical Functions

От

"Adam Rich"

Дата:

31 января 2008 г., 14:16:16

> > I'm trying to replicate the use of Oracle's 'lag' and 'over
> > partition by' analytical functions in my query.  I have a table
> > (all_client_times) such as:
> > and I would like to create a new view that takes the first table and
> > calculates the time difference in minutes between each row so that
> > the result is something like:

I thought of a another way of doing this.  In my tests, it's a little
faster, too.

DROP SEQUENCE if exists seq1;
DROP SEQUENCE if exists seq2;
CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;

select a.client_id, b.datetime, a.datetime as previousTime, (b.datetime -
a.datetime) as difftime from
(select nextval('seq1') as s, client_id, datetime from all_client_times
order by client_id, datetime OFFSET 0) as a
inner join
(select nextval('seq2') as s, client_id, datetime from all_client_times
order by client_id, datetime OFFSET 0) as b
on a.s=(b.s-1) where a.client_id=b.client_id

Re: Oracle Analytical Functions

От

Enrico Sirola

Дата:

31 января 2008 г., 15:47:16

Hi Adam,

Il giorno 31/gen/08, alle ore 16:13, Adam Rich ha scritto:

>>> I'm trying to replicate the use of Oracle's 'lag' and 'over
>>> partition by' analytical functions in my query.  I have a table
>>> (all_client_times) such as:
>>> and I would like to create a new view that takes the first table and
>>> calculates the time difference in minutes between each row so that
>>> the result is something like:
>
> I thought of a another way of doing this.  In my tests, it's a little
> faster, too.
>
> DROP SEQUENCE if exists seq1;
> DROP SEQUENCE if exists seq2;
> CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
> CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;
>
> select a.client_id, b.datetime, a.datetime as previousTime,
> (b.datetime -
> a.datetime) as difftime from
> (select nextval('seq1') as s, client_id, datetime from
> all_client_times
> order by client_id, datetime OFFSET 0) as a
> inner join
> (select nextval('seq2') as s, client_id, datetime from
> all_client_times
> order by client_id, datetime OFFSET 0) as b
> on a.s=(b.s-1) where a.client_id=b.client_id

very interesting indeed. I guess this strategy is more interesting than
the trigger (or rule) based one when you perform much more inserts on
the table that
the select you proposed above. It sounds strange that the select
proposed
is faster than a (single) select on the same table with an additional
previousTime
column populated via trigger/rule.
Bye,
e.

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

31 января 2008 г., 17:38:23

Hey Adam,

I tried your sequence method this morning on an unsorted table and for
some reason the order by's aren't working.  If I create a sorted view
(client_id, datetime) on the 'all_client_times' table and then use that
view with your sequence method all works fine.  The strange thing is
that my table which has about 750K rows only ends up returning 658 rows
with your sequence method using the unsorted table.  In fact, when I
tried the same thing with the lagfunc() method you wrote earlier on an
unsorted table the same thing occurs - only returning 658 rows instead
of the 750K. Again, all works well with lagfunc() if I use it on a
sorted view and I remove the order by in the function.  This is not too
much of a problem as I can use a sorted view first but I don't
understand why this is happening.  Perhaps this is a bug?

As well, I am finding that the lagfunc() is consistently faster than the
sequence method.

cheers,

Willem

Adam Rich wrote:
>>> I'm trying to replicate the use of Oracle's 'lag' and 'over
>>> partition by' analytical functions in my query.  I have a table
>>> (all_client_times) such as:
>>> and I would like to create a new view that takes the first table and
>>> calculates the time difference in minutes between each row so that
>>> the result is something like:
>>>
>
> I thought of a another way of doing this.  In my tests, it's a little
> faster, too.
>
> DROP SEQUENCE if exists seq1;
> DROP SEQUENCE if exists seq2;
> CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
> CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;
>
> select a.client_id, b.datetime, a.datetime as previousTime, (b.datetime -
> a.datetime) as difftime from
> (select nextval('seq1') as s, client_id, datetime from all_client_times
> order by client_id, datetime OFFSET 0) as a
> inner join
> (select nextval('seq2') as s, client_id, datetime from all_client_times
> order by client_id, datetime OFFSET 0) as b
> on a.s=(b.s-1) where a.client_id=b.client_id
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>

Re: Oracle Analytical Functions

От

"Adam Rich"

Дата:

31 января 2008 г., 17:55:40

Hi Willem,

> for some reason the order by's aren't working.

Could you provide more details?  Do you get a specific error message?

> only returning 658 rows instead of the 750K.

You should not expect the same row count in both source table and
result set.  Even in your example -- you provided 8 source rows, and
4 result rows.  You can determine the correct number of results via
"the number of records, related to client_ids having two or more records
in all_client_times, minus one".  It may be true that you have 750k
records but only 658 rows that satisfy this requirement.

What do you get for this query?

select count(*) from ( select client_id, count(*) as rows
from all_client_times group by client_id having count(*) > 1 ) as x


Adam

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

31 января 2008 г., 18:31:57

The 'all_client_times' table has 753698 rows.  The lagfunc() on the
sorted view returns 753576 rows and appears to work exactly as needed.
Using the function on an unsorted table returns only 686 rows and is
missing a whole lot of data.  Running the count query returns 122 -
which is correct as the amount of clients that I have.  Each client has
between 5 - 7K  records each.

The way I see it is for each client there will be one row, namely, the
first in the series, that will not be included in the final results as
it would not have a previous time.  With that in mind, if I take my
table row count as 753698 and minus the amount of clients I have, 122,
then I should get the number of results as 753576 which is correct when
I use your methods on a sorted table but which is not correct when I
use your methods on an unsorted table.

willem

Adam Rich wrote:
> Hi Willem,
>
>
>> for some reason the order by's aren't working.
>>
>
> Could you provide more details?  Do you get a specific error message?
>
>
>> only returning 658 rows instead of the 750K.
>>
>
> You should not expect the same row count in both source table and
> result set.  Even in your example -- you provided 8 source rows, and
> 4 result rows.  You can determine the correct number of results via
> "the number of records, related to client_ids having two or more records
> in all_client_times, minus one".  It may be true that you have 750k
> records but only 658 rows that satisfy this requirement.
>
> What do you get for this query?
>
> select count(*) from ( select client_id, count(*) as rows
> from all_client_times group by client_id having count(*) > 1 ) as x
>
>
> Adam
>
>
>
>
>

Re: Oracle Analytical Functions

От

"Adam Rich"

Дата:

01 февраля 2008 г., 00:16:13

Ah, ok.  I see what's happening.  The data is retrieved from the tables,
and the sequence values are added, PRIOR to the order by, so that after
the order by, they are no longer sorted.  (The same thing can happen
in Oracle with ROWNUM).

You can go the sorted view route, or just an inline view, like this:

select a.client_id, b.arbnum, a.arbnum as previousarbnum, (b.arbnum -
a.arbnum) as diffarbnum from
(select nextval('seq1') as s, * from (select client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as y OFFSET 0) as a
inner join (select nextval('seq2') as s, * from (select client_id, arbnum
from arb_test order by client_id, arbnum OFFSET 0)as z OFFSET 0) as b
on a.s=(b.s-1) where a.client_id=b.client_id;



> -----Original Message-----
> From: Willem Buitendyk [mailto:willem@pcfish.ca]
> Sent: Thursday, January 31, 2008 2:48 PM
> To: Adam Rich
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Oracle Analytical Functions
>
> Here is a little test example.  It seems that the second order by
> condition is not working - in this case datetime.
>
> create table arb_test (
> client_id integer,
> arbnum integer);
>
> insert into arb_test values (2,1);
> insert into arb_test values (2,33);
> insert into arb_test values (2,6);
> insert into arb_test values (2,76);
> insert into arb_test values (2,111);
> insert into arb_test values (2,10);
> insert into arb_test values (2,55);
> insert into arb_test values (7,12);
> insert into arb_test values (7,6);
> insert into arb_test values (7,144);
> insert into arb_test values (7,63);
> insert into arb_test values (7,87);
> insert into arb_test values (7,24);
> insert into arb_test values (7,22);
> insert into arb_test values (1,14);
> insert into arb_test values (1,23);
> insert into arb_test values (1,67);
> insert into arb_test values (1,90);
> insert into arb_test values (1,2);
> insert into arb_test values (1,5);
> insert into arb_test values (5,8);
> insert into arb_test values (5,42);
> insert into arb_test values (5,77);
> insert into arb_test values (5,9);
> insert into arb_test values (5,89);
> insert into arb_test values (5,23);
> insert into arb_test values (5,11);
>
> DROP SEQUENCE if exists seq1;
> DROP SEQUENCE if exists seq2;
> CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
> CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;
>
> select a.client_id, b.arbnum, a.arbnum as previousarbnum, (b.arbnum -
> a.arbnum) as diffarbnum from
> (select nextval('seq1') as s, client_id, arbnum from arb_test
> order by client_id, arbnum OFFSET 0) as a
> inner join
> (select nextval('seq2') as s, client_id, arbnum from arb_test
> order by client_id, arbnum OFFSET 0) as b
> on a.s=(b.s-1) where a.client_id=b.client_id;
>
> --create or replace view arb_view as select * from arb_test order by
> client_id, arbnum;
>
> Here are the results:
>
>  client_id | arbnum | previousarbnum | diffarbnum
> -----------+--------+----------------+------------
>          1 |     23 |             14 |          9
>          1 |     67 |             23 |         44
>          1 |     90 |             67 |         23
>          1 |      2 |             90 |        -88
>          1 |      5 |              2 |          3
>          2 |     33 |              1 |         32
>          2 |      6 |             33 |        -27
>          2 |     76 |              6 |         70
>          2 |    111 |             76 |         35
>          2 |     10 |            111 |       -101
>          2 |     55 |             10 |         45
>          5 |     42 |              8 |         34
>          5 |     77 |             42 |         35
>          5 |      9 |             77 |        -68
>          5 |     89 |              9 |         80
>          5 |     23 |             89 |        -66
>          5 |     11 |             23 |        -12
>          7 |      6 |             12 |         -6
>          7 |    144 |              6 |        138
>          7 |     63 |            144 |        -81
>          7 |     87 |             63 |         24
>          7 |     24 |             87 |        -63
>
> When I used a sorted view:
>
> create or replace view arb_view as select * from arb_test order by
> client_id, arbnum;
>
> and redid it the results are:
>
>  client_id | arbnum | previousarbnum | diffarbnum
> -----------+--------+----------------+------------
>          1 |      5 |              2 |          3
>          1 |     14 |              5 |          9
>          1 |     23 |             14 |          9
>          1 |     67 |             23 |         44
>          1 |     90 |             67 |         23
>          2 |      6 |              1 |          5
>          2 |     10 |              6 |          4
>          2 |     33 |             10 |         23
>          2 |     55 |             33 |         22
>          2 |     76 |             55 |         21
>          2 |    111 |             76 |         35
>          5 |      9 |              8 |          1
>          5 |     11 |              9 |          2
>          5 |     23 |             11 |         12
>          5 |     42 |             23 |         19
>          5 |     77 |             42 |         35
>          5 |     89 |             77 |         12
>          7 |     12 |              6 |          6
>          7 |     22 |             12 |         10
>          7 |     24 |             22 |          2
>          7 |     63 |             24 |         39
>          7 |     87 |             63 |         24
>          7 |    144 |             87 |         57
> (23 rows)
>
>
> This works the way it should.
>
> --drop table arb_test;
> --drop view arb_view;
>
> willem
> > The 'all_client_times' table has 753698 rows.  The lagfunc() on the
> > sorted view returns 753576 rows and appears to work exactly as
> > needed.  Using the function on an unsorted table returns only 686
> rows
> > and is missing a whole lot of data.  Running the count query returns
> > 122 - which is correct as the amount of clients that I have.  Each
> > client has between 5 - 7K  records each.
> >
> > The way I see it is for each client there will be one row, namely,
> the
> > first in the series, that will not be included in the final results
> as
> > it would not have a previous time.  With that in mind, if I take my
> > table row count as 753698 and minus the amount of clients I have,
> > 122,  then I should get the number of results as 753576 which is
> > correct when I use your methods on a sorted table but which is not
> > correct when I  use your methods on an unsorted table.
> >
> > willem
> >
> > Adam Rich wrote:
> >> Hi Willem,
> >>
> >>
> >>> for some reason the order by's aren't working.
> >>>
> >>
> >> Could you provide more details?  Do you get a specific error
> message?
> >>
> >>
> >>> only returning 658 rows instead of the 750K.
> >>>
> >>
> >> You should not expect the same row count in both source table and
> >> result set.  Even in your example -- you provided 8 source rows, and
> >> 4 result rows.  You can determine the correct number of results via
> >> "the number of records, related to client_ids having two or more
> >> records in all_client_times, minus one".  It may be true that you
> >> have 750k
> >> records but only 658 rows that satisfy this requirement.
> >>
> >> What do you get for this query?
> >>
> >> select count(*) from ( select client_id, count(*) as rows from
> >> all_client_times group by client_id having count(*) > 1 ) as x
> >>
> >>
> >> Adam
> >>
> >>
> >>
> >>
> >>
> >
> >
> > ---------------------------(end of broadcast)------------------------
> ---
> > TIP 4: Have you searched our list archives?
> >
> >               http://archives.postgresql.org/
> >

Re: Oracle Analytical Functions

От

Willem Buitendyk

Дата:

01 февраля 2008 г., 00:57:26

Here is a little test example.  It seems that the second order by
condition is not working - in this case datetime.

create table arb_test (
client_id integer,
arbnum integer);

insert into arb_test values (2,1);
insert into arb_test values (2,33);
insert into arb_test values (2,6);
insert into arb_test values (2,76);
insert into arb_test values (2,111);
insert into arb_test values (2,10);
insert into arb_test values (2,55);
insert into arb_test values (7,12);
insert into arb_test values (7,6);
insert into arb_test values (7,144);
insert into arb_test values (7,63);
insert into arb_test values (7,87);
insert into arb_test values (7,24);
insert into arb_test values (7,22);
insert into arb_test values (1,14);
insert into arb_test values (1,23);
insert into arb_test values (1,67);
insert into arb_test values (1,90);
insert into arb_test values (1,2);
insert into arb_test values (1,5);
insert into arb_test values (5,8);
insert into arb_test values (5,42);
insert into arb_test values (5,77);
insert into arb_test values (5,9);
insert into arb_test values (5,89);
insert into arb_test values (5,23);
insert into arb_test values (5,11);

DROP SEQUENCE if exists seq1;
DROP SEQUENCE if exists seq2;
CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;

select a.client_id, b.arbnum, a.arbnum as previousarbnum, (b.arbnum -
a.arbnum) as diffarbnum from
(select nextval('seq1') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as a
inner join
(select nextval('seq2') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as b
on a.s=(b.s-1) where a.client_id=b.client_id;

--create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

Here are the results:

 client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
         1 |     23 |             14 |          9
         1 |     67 |             23 |         44
         1 |     90 |             67 |         23
         1 |      2 |             90 |        -88
         1 |      5 |              2 |          3
         2 |     33 |              1 |         32
         2 |      6 |             33 |        -27
         2 |     76 |              6 |         70
         2 |    111 |             76 |         35
         2 |     10 |            111 |       -101
         2 |     55 |             10 |         45
         5 |     42 |              8 |         34
         5 |     77 |             42 |         35
         5 |      9 |             77 |        -68
         5 |     89 |              9 |         80
         5 |     23 |             89 |        -66
         5 |     11 |             23 |        -12
         7 |      6 |             12 |         -6
         7 |    144 |              6 |        138
         7 |     63 |            144 |        -81
         7 |     87 |             63 |         24
         7 |     24 |             87 |        -63

When I used a sorted view:

create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

and redid it the results are:

 client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
         1 |      5 |              2 |          3
         1 |     14 |              5 |          9
         1 |     23 |             14 |          9
         1 |     67 |             23 |         44
         1 |     90 |             67 |         23
         2 |      6 |              1 |          5
         2 |     10 |              6 |          4
         2 |     33 |             10 |         23
         2 |     55 |             33 |         22
         2 |     76 |             55 |         21
         2 |    111 |             76 |         35
         5 |      9 |              8 |          1
         5 |     11 |              9 |          2
         5 |     23 |             11 |         12
         5 |     42 |             23 |         19
         5 |     77 |             42 |         35
         5 |     89 |             77 |         12
         7 |     12 |              6 |          6
         7 |     22 |             12 |         10
         7 |     24 |             22 |          2
         7 |     63 |             24 |         39
         7 |     87 |             63 |         24
         7 |    144 |             87 |         57
(23 rows)


This works the way it should.

--drop table arb_test;
--drop view arb_view;

willem
> The 'all_client_times' table has 753698 rows.  The lagfunc() on the
> sorted view returns 753576 rows and appears to work exactly as
> needed.  Using the function on an unsorted table returns only 686 rows
> and is missing a whole lot of data.  Running the count query returns
> 122 - which is correct as the amount of clients that I have.  Each
> client has between 5 - 7K  records each.
>
> The way I see it is for each client there will be one row, namely, the
> first in the series, that will not be included in the final results as
> it would not have a previous time.  With that in mind, if I take my
> table row count as 753698 and minus the amount of clients I have,
> 122,  then I should get the number of results as 753576 which is
> correct when I use your methods on a sorted table but which is not
> correct when I  use your methods on an unsorted table.
>
> willem
>
> Adam Rich wrote:
>> Hi Willem,
>>
>>
>>> for some reason the order by's aren't working.
>>>
>>
>> Could you provide more details?  Do you get a specific error message?
>>
>>
>>> only returning 658 rows instead of the 750K.
>>>
>>
>> You should not expect the same row count in both source table and
>> result set.  Even in your example -- you provided 8 source rows, and
>> 4 result rows.  You can determine the correct number of results via
>> "the number of records, related to client_ids having two or more
>> records in all_client_times, minus one".  It may be true that you
>> have 750k
>> records but only 658 rows that satisfy this requirement.
>>
>> What do you get for this query?
>>
>> select count(*) from ( select client_id, count(*) as rows from
>> all_client_times group by client_id having count(*) > 1 ) as x
>>
>>
>> Adam
>>
>>
>>
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org/
>

Re: Oracle Analytical Functions

От

"Rodrigo E. De León Plicet"

Дата:

02 февраля 2008 г., 16:33:19

On Jan 31, 2008 8:49 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> I'd create a "previousTime" column and manage it using a trigger.
> Anyway, it depends on the time-dependancy of the table
> Then you can perform "temporal" in a much easier way.
> You could be interested in taking a look at the following link
>
> http://www.cs.arizona.edu/~rts/tdbbook.pdf

+1

That book provides many sane examples for handling temporal data. Recommended.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Oracle Analytical Functions

Вложения