Re: Awkward Join between generate_series and long table

Поиск
Список
Период
Сортировка
От Lincoln Swaine-Moore
Тема Re: Awkward Join between generate_series and long table
Дата
Msg-id CABcidk+1h0RidMehRcDL31XxO+o6e1fJTVyAEQmfcRc-tnoyQw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Awkward Join between generate_series and long table  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-performance
Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Ah, so I've glossed over a detail here which is that I'm relying on some timezone specific behavior and not actually generate_series itself. If you're curious, the details are here: https://www.postgresql.org/message-id/2582288.1696428710%40sss.pgh.pa.us

I think that makes the window function necessary, or at least something a little more sophisticated than addition of a day (though I'd be happy to be wrong about that).

LEFT JOIN LATERAL (SELECT

Oh wow, this seems to get the index used! That's wonderful news--thank you.

I'd be super curious if anyone has any intuition about why the planner is so much more successful there--most of what I see online about LATERAL JOINs is focused as you said on semantics not performance. But in terms of solving my problem, this seems to do the trick.

Thanks again!


On Wed, Nov 8, 2023 at 5:45 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Nov 8, 2023 at 6:26 PM Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
            SELECT
                s at time zone 'utc'  AS period_start,
                LEAD(s) OVER (
                    ORDER BY
                        s
                )   at time zone 'utc' AS period_end

Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Resorting to a window function here is expensive waste, the lead() value can be computed, not queried.
 
SELECT
    p.period_start,
    p.period_end,
    COUNT (distinct d.id)
FROM
    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'

This seems better written (semantically, not sure about execution dynamics) as:

FROM periods AS p
LEFT JOIN LATERAL (SELECT count(distinct? d.id) FROM data AS d WHERE d.timestamp >= p.period_start AND d.timestamp < p.period_end AND d.sn = 'BLAH') AS cnt_d
-- NO grouping required at this query level

David J.



--
Lincoln Swaine-Moore

В списке pgsql-performance по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Awkward Join between generate_series and long table
Следующее
От: Philip Semanchuk
Дата:
Сообщение: Re: Awkward Join between generate_series and long table