Обсуждение: Window function?

Поиск
Список
Период
Сортировка

Window function?

От
Robert Stanford
Дата:
Hi,

I have time series data from multiple inputs with start and
end timestamps.

Can anyone suggest an elegant way to coalesce consecutive rows so only the 
first start time and last end time for each group of events (by input) is returned.

Sample from and to below where the rows for Input number 4 could be massaged.
(Timestamps selected with timestamp(0) for convenience)

From this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:08:50   2022-06-04 09:09:00
4       2022-06-04 09:08:10   2022-06-04 09:08:50
4       2022-06-04 09:07:47   2022-06-04 09:08:10
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:06:37   2022-06-04 09:06:47
4       2022-06-04 09:06:29   2022-06-04 09:06:37
4       2022-06-04 09:06:17   2022-06-04 09:06:29
4       2022-06-04 09:05:53   2022-06-04 09:06:17
16      2022-06-04 09:04:33   2022-06-04 09:05:53

To this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:07:47   2022-06-04 09:09:00
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:05:53   2022-06-04 09:06:47
16      2022-06-04 09:04:33   2022-06-04 09:05:53 

Thanks in advance to anyone who can help!
Robert

Re: Window function?

От
Thiemo Kellner
Дата:

Hi Robert

Interesting problem. I need to think about it.

You need to figure out when Input changes. You can achieve this by using lead or lag (depending of the sort direction over start) https://www.postgresql.org/docs/current/functions-window.html .

Hope this nudges you to a solution.

Kind regards

Thiemo

Am 04.06.22 um 10:18 schrieb Robert Stanford:
Hi,

I have time series data from multiple inputs with start and
end timestamps.

Can anyone suggest an elegant way to coalesce consecutive rows so only the 
first start time and last end time for each group of events (by input) is returned.

Sample from and to below where the rows for Input number 4 could be massaged.
(Timestamps selected with timestamp(0) for convenience)

From this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:08:50   2022-06-04 09:09:00
4       2022-06-04 09:08:10   2022-06-04 09:08:50
4       2022-06-04 09:07:47   2022-06-04 09:08:10
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:06:37   2022-06-04 09:06:47
4       2022-06-04 09:06:29   2022-06-04 09:06:37
4       2022-06-04 09:06:17   2022-06-04 09:06:29
4       2022-06-04 09:05:53   2022-06-04 09:06:17
16      2022-06-04 09:04:33   2022-06-04 09:05:53

To this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:07:47   2022-06-04 09:09:00
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:05:53   2022-06-04 09:06:47
16      2022-06-04 09:04:33   2022-06-04 09:05:53 

Thanks in advance to anyone who can help!
Robert

Re: Window function?

От
Robert Stanford
Дата:

On Sat, 4 Jun 2022 at 22:56, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:

Hi Robert

Interesting problem. I need to think about it.

You need to figure out when Input changes. You can achieve this by using lead or lag (depending of the sort direction over start) https://www.postgresql.org/docs/current/functions-window.html .

After a bit of poking around this seems to get things going in the right direction.

select row_number() over (order by start)  - row_number() over (partition by input order by start) 

  

Re: Window function?

От
Tom Lane
Дата:
Robert Stanford <rstanford@gmail.com> writes:
> Can anyone suggest an elegant way to coalesce consecutive rows so only the
> first start time and last end time for each group of events (by input) is
> returned.

Window functions don't change the number of query result rows, so that
idea is a dead end.  The standard way to solve this is GROUP BY:

SELECT Input, MIN(Start) AS Start, MAX(End) AS End
FROM ...
GROUP BY Input
ORDER BY Input -- optional

If MIN/MAX don't quite express what you need to happen, then you
might need to write custom aggregates.  Point though is that
you want aggregation within GROUP BY groups, not windowing.

            regards, tom lane



Re: Window function?

От
"David G. Johnston"
Дата:
On Sat, Jun 4, 2022 at 7:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Stanford <rstanford@gmail.com> writes:
> Can anyone suggest an elegant way to coalesce consecutive rows so only the
> first start time and last end time for each group of events (by input) is
> returned.

Window functions don't change the number of query result rows, so that
idea is a dead end.  The standard way to solve this is GROUP BY:

SELECT Input, MIN(Start) AS Start, MAX(End) AS End
FROM ...
GROUP BY Input
ORDER BY Input -- optional

Assuming that input group 4 really is supposed to show up twice in the output then the final solution is going to have to be a combination of this and window functions.  You need the later in order to be able to say "input 4, appearance 1" and "input 4, appearance 2" - which lead/lag tends to help with.  Then the group by becomes {input, appearance}.

David J.

Re: Window function?

От
John W Higgins
Дата:


On Sat, Jun 4, 2022 at 1:18 AM Robert Stanford <rstanford@gmail.com> wrote:
Hi,
 
From this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:08:50   2022-06-04 09:09:00
4       2022-06-04 09:08:10   2022-06-04 09:08:50
4       2022-06-04 09:07:47   2022-06-04 09:08:10
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:06:37   2022-06-04 09:06:47
4       2022-06-04 09:06:29   2022-06-04 09:06:37
4       2022-06-04 09:06:17   2022-06-04 09:06:29
4       2022-06-04 09:05:53   2022-06-04 09:06:17
16      2022-06-04 09:04:33   2022-06-04 09:05:53

To this:
Input   Start                 End                  
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:07:47   2022-06-04 09:09:00
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:05:53   2022-06-04 09:06:47
16      2022-06-04 09:04:33   2022-06-04 09:05:53 

lag is indeed your friend here - assuming times is your table name

with times_cte as (select *, lag(input, 1) over () from times)
select input, start, end from times_cte where input != coalesce(lag, -1);

The coalesce to -1 is needed at the end to get the first row which has null for the lag value because it's the first row.

John