Thread: Awkward Join between generate_series and long table

Awkward Join between generate_series and long table

From
Lincoln Swaine-Moore
Date:
Hi all--

I'm having a performance problem on PostgreSQL 12.16 that I'm hoping someone can help with.

I have a table shaped roughly like this:

                                   Table "public.data"

     Column      |            Type             | Collation | Nullable |                    Default
-----------------+-----------------------------+-----------+----------+------------------------------------------------
 id              | integer                     |           | not null | nextval('data_seq'::regclass)
 timestamp       | timestamp without time zone |           | not null |
 sn              | character varying(36)       |           | not null |

Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "data_multicol_sn_and_timestamp_desc_idx" btree (sn, "timestamp" DESC)


with somewhere between 10M and 100M rows.


I have a query more or less like:


WITH periods as (
    SELECT
        *
    FROM
        (
            SELECT
                s at time zone 'utc'  AS period_start,
                LEAD(s) OVER (
                    ORDER BY
                        s
                )   at time zone 'utc' AS period_end
            FROM
                generate_series(
                    ('2023-01-01T00:00:00-08:00') :: timestamptz,
                    ('2023-11-01T00:00:00-07:00') :: timestamptz,
                    ('1 day') :: interval
                ) s
        ) with_junk_period
    WHERE
        period_end IS NOT NULL  
)
SELECT
    p.period_start,
    p.period_end,
    COUNT (distinct d.id)
FROM
    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'
GROUP BY
    p.period_start,
    p.period_end
ORDER BY
    p.period_start ASC
;



This worked fine on a smaller, but same-shaped, set of data on a staging database, where the query plan was:

GroupAggregate  (cost=14843021.48..15549022.41 rows=200 width=24) (actual time=1311.052..1515.344 rows=303 loops=1)
  Group Key: with_junk_period.period_start, with_junk_period.period_end
  ->  Sort  (cost=14843021.48..15019521.21 rows=70599893 width=20) (actual time=1305.969..1384.676 rows=463375 loops=1)
        Sort Key: with_junk_period.period_start, with_junk_period.period_end
        Sort Method: external merge  Disk: 13632kB
        ->  Nested Loop Left Join  (cost=60.26..2329833.01 rows=70599893 width=20) (actual time=355.379..917.049 rows=463375 loops=1)
              ->  Subquery Scan on with_junk_period  (cost=59.83..92.33 rows=995 width=16) (actual time=355.307..358.978 rows=303 loops=1)
                    Filter: (with_junk_period.period_end IS NOT NULL)
                    Rows Removed by Filter: 1
                    ->  WindowAgg  (cost=59.83..82.33 rows=1000 width=24) (actual time=355.302..358.723 rows=304 loops=1)
                          ->  Sort  (cost=59.83..62.33 rows=1000 width=8) (actual time=355.265..355.510 rows=304 loops=1)
                                Sort Key: s.s
                                Sort Method: quicksort  Memory: 39kB
                                ->  Function Scan on generate_series s  (cost=0.00..10.00 rows=1000 width=8) (actual time=355.175..355.215 rows=304 loops=1)
              ->  Index Scan using data_multicol_sn_and_timestamp_desc_idx on data d  (cost=0.43..1631.90 rows=70955 width=12) (actual time=0.042..1.516 rows=1529 loops=303)
"                    Index Cond: (((sn)::text = 'BLAH'::text) AND (""timestamp"" >= with_junk_period.period_start) AND (""timestamp"" < with_junk_period.period_end))"
Planning Time: 0.283 ms
JIT:
  Functions: 20
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.570 ms, Inlining 29.051 ms, Optimization 192.084 ms, Emission 134.022 ms, Total 356.727 ms
Execution Time: 1523.983 ms


But on the production database, the query is no longer runnable for the same time frame, and when we restrict it, we can see that the index is no longer being fully utilized:

GroupAggregate  (cost=56901238.84..58446602.98 rows=200 width=24) (actual time=3652.892..3653.669 rows=4 loops=1)
  Group Key: with_junk_period.period_start, with_junk_period.period_end
  ->  Sort  (cost=56901238.84..57287579.38 rows=154536214 width=20) (actual time=3652.544..3652.765 rows=5740 loops=1)
        Sort Key: with_junk_period.period_start, with_junk_period.period_end
        Sort Method: quicksort  Memory: 641kB
        ->  Nested Loop Left Join  (cost=60.40..32259766.02 rows=154536214 width=20) (actual time=172.908..3651.658 rows=5740 loops=1)
"              Join Filter: ((d.""timestamp"" >= with_junk_period.period_start) AND (d.""timestamp"" < with_junk_period.period_end))"
              Rows Removed by Join Filter: 5310656
              ->  Subquery Scan on with_junk_period  (cost=59.83..92.33 rows=995 width=16) (actual time=152.963..153.014 rows=4 loops=1)
                    Filter: (with_junk_period.period_end IS NOT NULL)
                    Rows Removed by Filter: 1
                    ->  WindowAgg  (cost=59.83..82.33 rows=1000 width=24) (actual time=152.958..153.004 rows=5 loops=1)
                          ->  Sort  (cost=59.83..62.33 rows=1000 width=8) (actual time=152.937..152.945 rows=5 loops=1)
                                Sort Key: s.s
                                Sort Method: quicksort  Memory: 25kB
                                ->  Function Scan on generate_series s  (cost=0.00..10.00 rows=1000 width=8) (actual time=152.928..152.930 rows=5 loops=1)
              ->  Materialize  (cost=0.57..1138670.54 rows=1397815 width=12) (actual time=0.017..811.790 rows=1329099 loops=4)
                    ->  Index Scan using mdata_multicol_sn_and_timestamp_desc_idx on data d  (cost=0.57..1124855.46 rows=1397815 width=12) (actual time=0.051..2577.248 rows=1329099 loops=1)
"                          Index Cond: ((sn)::text = 'BLAH'::text)"
Planning Time: 0.184 ms
JIT:
  Functions: 22
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.429 ms, Inlining 11.254 ms, Optimization 75.746 ms, Emission 65.959 ms, Total 154.388 ms
Execution Time: 3662.526 ms


Instead, the time filtering has been moved out of the index condition and into the join filter, which means that the database needs to look at many, many more rows.

One optimization I was able to make was to manually add the start and end dates of the generate_series as an overall filter for the right side:

    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'
        AND d.timestamp >= ('2023-01-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
        AND d.timestamp <  ('2023-11-01T00:00:00-08:00'::timestamptz) at time zone 'utc'


This improves the query's performance somewhat, because the last two conditions make it into the index condition, but the time-related join conditions remain as part of the join filter, so the upside is limited. It also seems like this is the kind of "hint" that the planner should really be able to infer from the structure of "periods".
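(One way to avoid hard-coding those bounds might be to pull them out of the CTE itself as uncorrelated scalar subqueries, e.g.

        AND d."timestamp" >= (SELECT min(period_start) FROM periods)
        AND d."timestamp" < (SELECT max(period_end) FROM periods)

though I haven't verified that the planner treats these the same way as the literal values.)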

Stepping back a bit, I think there should be a way to write this query so that the planner recognizes that each row of data can fall into only one periods bucket, and exploits the fact that both inputs are easily sortable to perform a more efficient join (a merge join?) than the nested loop. But I'd definitely settle for getting it to use the index like it does on the staging data.

A few things I've tried to no avail:
- moving the non-join parts of the join condition into the WHERE clause, or into a CTE over data
- ordering each of the components descending in various ways to match data's index
- starting with data as the left table and periods on the right (I'd have to do some post-processing here, but if it was fast it would be worth it)

The only remaining option I can think of is to run 12 separate queries--one for each month, which restricts the nested loop just enough that the query can finish--and then concatenate the results after the fact. But that feels like using knowledge about the structure of the tables that I should be able to communicate to the planner, and I'd really love to keep it all in the database!

Thanks for any and all help and suggestions.



--
Lincoln Swaine-Moore

Re: Awkward Join between generate_series and long table

From
"David G. Johnston"
Date:
On Wed, Nov 8, 2023 at 6:26 PM Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
            SELECT
                s at time zone 'utc'  AS period_start,
                LEAD(s) OVER (
                    ORDER BY
                        s
                )   at time zone 'utc' AS period_end

Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Resorting to a window function here is an expensive waste; the lead() value can be computed rather than queried.
 
SELECT
    p.period_start,
    p.period_end,
    COUNT (distinct d.id)
FROM
    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'

This seems better written (semantically, not sure about execution dynamics) as:

FROM periods AS p
LEFT JOIN LATERAL (SELECT count(distinct? d.id) FROM data AS d WHERE d.timestamp >= p.period_start AND d.timestamp < p.period_end AND d.sn = 'BLAH') AS cnt_d
-- NO grouping required at this query level

David J.

Re: Awkward Join between generate_series and long table

From
Lincoln Swaine-Moore
Date:
Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Ah, I've glossed over a detail here, which is that I'm relying on some timezone-specific behavior and not just generate_series itself. If you're curious, the details are here: https://www.postgresql.org/message-id/2582288.1696428710%40sss.pgh.pa.us

I think that makes the window function necessary, or at least requires something a little more sophisticated than adding a day (though I'd be happy to be wrong about that).
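(By way of illustration, and assuming a DST-observing session time zone like America/Los_Angeles, the day-stepped series produces a 25-hour bucket across the November fall-back once the values are converted to UTC:

SET timezone = 'America/Los_Angeles';
SELECT s, s AT TIME ZONE 'utc' AS period_start_utc
FROM generate_series(
    '2023-11-04T00:00:00-07:00'::timestamptz,
    '2023-11-06T00:00:00-08:00'::timestamptz,
    '1 day'::interval
) s;
-- expected output:
-- 2023-11-04 00:00:00-07 | 2023-11-04 07:00:00
-- 2023-11-05 00:00:00-07 | 2023-11-05 07:00:00
-- 2023-11-06 00:00:00-08 | 2023-11-06 08:00:00
-- i.e. the last bucket spans 25 hours of UTC time.

That's the sort of boundary behavior I don't want to lose.)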

LEFT JOIN LATERAL (SELECT

Oh wow, this seems to get the index used! That's wonderful news--thank you.

I'd be super curious if anyone has any intuition about why the planner is so much more successful there--most of what I see online about LATERAL joins focuses, as you said, on semantics rather than performance. But in terms of solving my problem, this seems to do the trick.

Thanks again!


On Wed, Nov 8, 2023 at 5:45 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Nov 8, 2023 at 6:26 PM Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
            SELECT
                s at time zone 'utc'  AS period_start,
                LEAD(s) OVER (
                    ORDER BY
                        s
                )   at time zone 'utc' AS period_end

Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Resorting to a window function here is an expensive waste; the lead() value can be computed rather than queried.
 
SELECT
    p.period_start,
    p.period_end,
    COUNT (distinct d.id)
FROM
    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'

This seems better written (semantically, not sure about execution dynamics) as:

FROM periods AS p
LEFT JOIN LATERAL (SELECT count(distinct? d.id) FROM data AS d WHERE d.timestamp >= p.period_start AND d.timestamp < p.period_end AND d.sn = 'BLAH') AS cnt_d
-- NO grouping required at this query level

David J.



--
Lincoln Swaine-Moore

Re: Awkward Join between generate_series and long table

From
Philip Semanchuk
Date:

> On Nov 8, 2023, at 8:26 PM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
>
> Hi all--
>
> I'm having a performance problem in 12.16 that I'm hoping someone can help with.

<much useful info snipped>

> Thanks for any and all help and suggestions.


Hi Lincoln,
I haven't read your SQL carefully so I may be completely off base, but I wanted to share an experience I had with generate_series() that caused some planner headaches that may be affecting you too.

I was using generate_series() in a CROSS JOIN with a large table. The call to generate_series() only emitted a small number of rows (4 - 24) but the planner estimated it would emit 1000 rows because that's Postgres' default in absence of other info. (See https://www.postgresql.org/docs/current/sql-createfunction.html, "The default assumption is 1000 rows.") I see an estimate for 1000 rows in your EXPLAIN output too, so you're experiencing the same although in your case the estimate of 1000 might be more accurate. The misestimation was causing significant performance problems for me.

My solution was to wrap generate_series() in a custom function that had a ROWS qualifier (documented at the same link as above) to better inform the planner. In my case I used ROWS 16 since that was relatively accurate -- a lot more accurate than 1000, anyway.

Then I found that my pure SQL custom function was getting inlined, which caused the information in the ROWS qualifier to get lost. :-) I rewrote it in PL/pgSQL to prevent the inlining, and that solution worked well for me. (See convo at https://www.postgresql.org/message-id/flat/76B16E5F-59D0-4C97-8DBA-4B3BB21E2009%40americanefficient.com)

On another note, I have also seen unexpected performance gains from introducing LATERAL into a JOIN. My guess is that I got lucky, and that the use of LATERAL sent the planner down a better path.

Hope this is at least a little helpful!

Good luck,
Philip


Re: Awkward Join between generate_series and long table

From
Lincoln Swaine-Moore
Date:
> I see an estimate for 1000 rows in your EXPLAIN output too, so you're experiencing the same 
> although in your case the estimate of 1000 might be more accurate. The misestimation was causing 
> significant performance problems for me.

> My solution was to wrap generate_series() in a custom function that had a ROWS qualifier 

That's interesting! I actually wasn't familiar with the ROWS feature at all, so that's good knowledge to pocket.

In my case, I think the number of rows will vary quite a bit across different time periods and resolutions (and 1000 might not be a bad estimate for some of the workloads). I do wonder whether, if the planner had a sense of how large the series result could get for longer periods or finer resolutions (a bit of information I could actually trivially generate outside the query and encode into it explicitly if need be), it might avoid or at least minimize the nested loop, but I'm not sure how to communicate that information.

Anyway, thank you for sharing! Very helpful to hear what other people have dealt with in similar situations.