Обсуждение: Finding "most recent" using daterange

Поиск
Список
Период
Сортировка

Finding "most recent" using daterange

От
Rob Foehl
Дата:
Coming back to PostgreSQL after a (decades-)long absence...  If I have
something like:

CREATE TABLE example (
    id integer NOT NULL,
    value text NOT NULL,
    dates daterange NOT NULL
);

INSERT INTO example VALUES
    (1, 'a', '[2010-01-01,2020-01-01)'),
    (1, 'b', '[2010-01-01,)'),
    (1, 'c', '[,2021-01-01)'),
    (2, 'd', '[2010-01-01,2021-01-01)'),
    (2, 'e', '[2015-01-01,2020-01-01)'),
    (3, 'f', '[2014-01-01,2016-01-01)'),
    (3, 'g', '[2013-01-01,)'),
    (3, 'h', '[2012-01-01,)'),
    (3, 'i', '[2013-01-01,2017-01-01)'),
    (4, 'j', '[2010-01-01,2015-01-01)');

and I want to find the "most recent" value out of each group, meaning
that having the greatest upper bound followed by the greatest lower
bound, what I've managed to come up with thus far is:

WITH intermediate AS (
    SELECT e.id, e.value, e.dates
    FROM example AS e
    JOIN (
        SELECT id, max(coalesce(upper(dates), 'infinity')) AS max_date
        FROM example GROUP BY id
    ) AS max_upper ON e.id = max_upper.id
        AND coalesce(upper(dates), 'infinity') = max_upper.max_date
)
SELECT i.id, i.value, i.dates
FROM intermediate AS i
JOIN (
    SELECT id, max(coalesce(lower(dates), '-infinity')) AS max_date
    FROM intermediate GROUP BY id
) AS max_lower ON i.id = max_lower.id
    AND coalesce(lower(dates), '-infinity') = max_lower.max_date;

which produces the desired result for this minimal example:

 id | value |          dates
----+-------+-------------------------
  1 | b     | [2010-01-01,)
  2 | d     | [2010-01-01,2021-01-01)
  3 | g     | [2013-01-01,)
  4 | j     | [2010-01-01,2015-01-01)

I pretty quickly discovered that there's no max(daterange) -- although
it isn't obvious what that would do, anyway -- and the "intermediate"
CTE is what followed.  Is there a better way?

(Note that this doesn't try to handle duplicate ranges -- I haven't
decided whether that'll be necessary in the real case.  Assume it'll
have something beyond daterange NOT NULL and/or some _agg() magic,
eventually.)

-Rob



Re: Finding "most recent" using daterange

От
Greg Sabino Mullane
Дата:
This is a good candidate for a window function. Also note that nulls already get sorted correctly by the DESC so no need to get 'infinity' involved, although you could write 'DESC NULLS FIRST' to be explicit about it.

with x as (select *,  row_number() over (partition by id order by upper(dates) desc, lower(dates) desc) from example)
  select id,value,dates from x where row_number = 1;

 id | value |          dates
----+-------+-------------------------
  1 | b     | [2010-01-01,)
  2 | d     | [2010-01-01,2021-01-01)
  3 | g     | [2013-01-01,)
  4 | j     | [2010-01-01,2015-01-01)
(4 rows)


Cheers,
Greg

Re: Finding "most recent" using daterange

От
Isaac Morland
Дата:
On Wed, 22 May 2024 at 10:15, Greg Sabino Mullane <htamfids@gmail.com> wrote:
This is a good candidate for a window function. Also note that nulls already get sorted correctly by the DESC so no need to get 'infinity' involved, although you could write 'DESC NULLS FIRST' to be explicit about it.

with x as (select *,  row_number() over (partition by id order by upper(dates) desc, lower(dates) desc) from example)
  select id,value,dates from x where row_number = 1;

Don’t you need NULLS LAST for the lower bounds? There NULL means something closer to -infinity and should appear after the non-NULL values in a descending sort.

Actually it strikes me that this sorting issue could be a reason to avoid NULL bounds on ranges and prefer the use of +/-infinity if the underlying data type supports it.

Re: Finding "most recent" using daterange

От
Greg Sabino Mullane
Дата:
Oh, you are right - NULLS LAST, got my -infinity crossed with my infinity. :) 

Cheers,
Greg

Re: Finding "most recent" using daterange

От
Isaac Morland
Дата:
On Wed, 22 May 2024 at 11:36, Greg Sabino Mullane <htamfids@gmail.com> wrote:
Oh, you are right - NULLS LAST, got my -infinity crossed with my infinity. :) 

NULLS LAST for lower bound, NULLS FIRST for upper bound.

The other way around if you were doing an ascending sort.

Re: Finding "most recent" using daterange

От
Alban Hertroys
Дата:
> On 22 May 2024, at 09:58, Rob Foehl <rwf@loonybin.net> wrote:
>
> Coming back to PostgreSQL after a (decades-)long absence...  If I have
> something like:
>
> CREATE TABLE example (
> id integer NOT NULL,
> value text NOT NULL,
> dates daterange NOT NULL
> );
>
> INSERT INTO example VALUES
> (1, 'a', '[2010-01-01,2020-01-01)'),
> (1, 'b', '[2010-01-01,)'),
> (1, 'c', '[,2021-01-01)'),
> (2, 'd', '[2010-01-01,2021-01-01)'),
> (2, 'e', '[2015-01-01,2020-01-01)'),
> (3, 'f', '[2014-01-01,2016-01-01)'),
> (3, 'g', '[2013-01-01,)'),
> (3, 'h', '[2012-01-01,)'),
> (3, 'i', '[2013-01-01,2017-01-01)'),
> (4, 'j', '[2010-01-01,2015-01-01)');
>
> and I want to find the "most recent" value out of each group, meaning
> that having the greatest upper bound followed by the greatest lower
> bound, what I've managed to come up with thus far is:

Sounds like a good candidate for using EXISTS to prove that no more recent value exists for a given id:

SELECT e.id, e.value, e.dates
FROM example AS e
WHERE NOT EXISTS (
        SELECT 1
        FROM example AS i
        WHERE i.id = e.id
        AND (coalesce(upper(i.dates), 'infinity') > coalesce(upper(e.dates), 'infinity')
                OR (coalesce(upper(i.dates), 'infinity') = coalesce(upper(e.dates), 'infinity')
                AND coalesce(lower(i.dates), '-infinity') > coalesce(lower(e.dates), '-infinity'))
        )
);

 id | value |          dates
----+-------+-------------------------
  1 | b     | [2010-01-01,)
  2 | d     | [2010-01-01,2021-01-01)
  3 | g     | [2013-01-01,)
  4 | j     | [2010-01-01,2015-01-01)
(4 rows)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Finding "most recent" using daterange

От
Ken Tanzer
Дата:
On Wed, May 22, 2024 at 11:07 AM Alban Hertroys <haramrae@gmail.com> wrote:

Sounds like a good candidate for using EXISTS to prove that no more recent value exists for a given id:

SELECT e.id, e.value, e.dates
FROM example AS e
WHERE NOT EXISTS (
        SELECT 1
        FROM example AS i
        WHERE i.id = e.id
        AND (coalesce(upper(i.dates), 'infinity') > coalesce(upper(e.dates), 'infinity')
                OR (coalesce(upper(i.dates), 'infinity') = coalesce(upper(e.dates), 'infinity')
                AND coalesce(lower(i.dates), '-infinity') > coalesce(lower(e.dates), '-infinity'))
        )
);


Not sure if I'm missing something, but what about just using DISTINCT?

SELECT DISTINCT ON (id) id,value,dates FROM example ORDER BY id,upper(dates) desc, lower(dates) desc;

 id | value |          dates          
----+-------+-------------------------
  1 | b     | [2010-01-01,)
  2 | d     | [2010-01-01,2021-01-01)
  3 | g     | [2013-01-01,)
  4 | j     | [2010-01-01,2015-01-01)
(4 rows)



Cheers,
Ken

--

AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.