why memoize is not used for correlated subquery

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема why memoize is not used for correlated subquery
Дата
Msg-id CAFj8pRBRyML16xf4W-UJGg+O1OaH4WDR9BcwZhVzW0YLCD4XBA@mail.gmail.com
обсуждение исходный текст
Ответы Re: why memoize is not used for correlated subquery
Re: why memoize is not used for correlated subquery
Список pgsql-hackers
Hi

I am playing with examples for P2D2, and I found few issues related to memoize

1. I use dataset https://pgsql.cz/files/obce.sql - it is data about czech population

Dictionary - "obec" -> "village", "pocet_muzu" -> "number_of_men", "pocet_zen" -> "number_of_woman", "okres" -> "district", "nazev" -> "name"

I wrote the query - biggest village per district

select nazev 
  from obce o 
  where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen)
                                    from obce
                                   where o.okres_id = okres_id);



I expected usage of memoize, because in this query, it can be very effective https://explain.depesz.com/s/0ubC

(2024-05-28 09:09:58) postgres=# explain select nazev from obce o where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen) from obce where o.okres_id = okres_id);
                                                QUERY PLAN                                                
══════════════════════════════════════════════════════════════════════════════════════════════════════════
Seq Scan on obce o  (cost=0.00..33588.33 rows=31 width=10)
  Filter: ((pocet_muzu + pocet_zen) = (SubPlan 2))
  SubPlan 2
    ->  Result  (cost=5.34..5.35 rows=1 width=4)
          InitPlan 1
            ->  Limit  (cost=0.28..5.34 rows=1 width=4)
                  ->  Index Scan Backward using obce_expr_idx on obce  (cost=0.28..409.92 rows=81 width=4)
                        Index Cond: ((pocet_muzu + pocet_zen) IS NOT NULL)
                        Filter: ((o.okres_id)::text = (okres_id)::text)
(9 rows)


But it doesn't do. I rewrote this query to lateral join, and memoize was used, but the result was not good, because filter wa pushed to subquery

explain select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) where pocet_zen + pocet_muzu = max;
                                              QUERY PLAN                                              
══════════════════════════════════════════════════════════════════════════════════════════════════════
Nested Loop  (cost=12.83..19089.82 rows=31 width=45)
  ->  Seq Scan on obce o  (cost=0.00..121.50 rows=6250 width=41)
  ->  Memoize  (cost=12.83..12.85 rows=1 width=4)
        Cache Key: (o.pocet_zen + o.pocet_muzu), o.okres_id
        Cache Mode: binary
        ->  Subquery Scan on unnamed_subquery  (cost=12.82..12.84 rows=1 width=4)
              Filter: ((o.pocet_zen + o.pocet_muzu) = unnamed_subquery.max)
              ->  Aggregate  (cost=12.82..12.83 rows=1 width=4)
                    ->  Index Scan using obce_okres_id_idx on obce  (cost=0.28..12.41 rows=81 width=8)
                          Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)

and then the effect of memoize is almost negative https://explain.depesz.com/s/TKLL

When I used optimization fence, then memoize was used effectively https://explain.depesz.com/s/hhgi

explain select * from (select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) offset 0) where pocet_zen + pocet_muzu = max;
                                              QUERY PLAN                                              
══════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on unnamed_subquery  (cost=12.83..1371.93 rows=31 width=45)
  Filter: ((unnamed_subquery.pocet_zen + unnamed_subquery.pocet_muzu) = unnamed_subquery.max)
  ->  Nested Loop  (cost=12.83..1278.18 rows=6250 width=45)
        ->  Seq Scan on obce o  (cost=0.00..121.50 rows=6250 width=41)
        ->  Memoize  (cost=12.83..12.84 rows=1 width=4)
              Cache Key: o.okres_id
              Cache Mode: binary
              ->  Aggregate  (cost=12.82..12.83 rows=1 width=4)
                    ->  Index Scan using obce_okres_id_idx on obce  (cost=0.28..12.41 rows=81 width=8)
                          Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)

My question is - does memoize support subqueries? And can be enhanced to support this exercise without LATERAL and optimization fences?

Regards

Pavel







В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pradeep Kumar
Дата:
Сообщение: Re: Need clarification on compilation errors in PG 16.2
Следующее
От: Tender Wang
Дата:
Сообщение: Re: why memoize is not used for correlated subquery