Обсуждение: strange cost for correlated subquery

Поиск
Список
Период
Сортировка

strange cost for correlated subquery

От
"Pavel Stehule"
Дата:
Hello

I tested speed SELF JOIN and correlated subquery for couting of subtotals:

It's strange, so correlated subqueries is faster, but it has much higher cost:

postgres=# explain analyze select t1.id, t1.sale_date, t1.product,
t1.sale_price, sum(t2.sale_price) from
history t1 inner join history t2 on t1.id >= t2.id and t1.product = t2.product
group by t1.id, t1.sale_date, t1.product, t1.sale_price
order by t1.id
;                                                              QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------Sort
(cost=3678.85..3691.36 rows=5003 width=19) (actual
 
time=1553.575..1560.618 rows=5003 loops=1)  Sort Key: t1.id  Sort Method:  quicksort  Memory: 480kB  ->  HashAggregate
(cost=3308.91..3371.45rows=5003 width=19)
 
(actual time=1530.276..1540.206 rows=5003 loops=1)        ->  Nested Loop  (cost=0.00..1708.29 rows=128050 width=19)
(actual time=0.264..1034.048 rows=198333 loops=1)              ->  Seq Scan on history t1  (cost=0.00..78.03 rows=5003
width=15) (actual time=0.077..8.835 rows=5003 loops=1)              ->  Index Scan using fxxx on history t2
(cost=0.00..0.31 rows=1 width=11) (actual time=0.017..0.087 rows=40
loops=5003)                    Index Cond: (((t2.product)::text =
(t1.product)::text) AND (t1.id >= t2.id))Total runtime: 1567.125 ms
(9 rows)

postgres=# explain analyze SELECT sale_date, product, sale_price,                 COALESCE((SELECT SUM(sale_price)
                       FROM history                             WHERE product = o.product
ANDid <= o.id), 0) AS total             FROM history o;                                                          QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------Seq
Scanon history o  (cost=0.00..41532.29 rows=5003 width=15)
 
(actual time=0.073..825.333 rows=5003 loops=1)  SubPlan    ->  Aggregate  (cost=8.28..8.29 rows=1 width=4) (actual
time=0.158..0.159 rows=1 loops=5003)          ->  Index Scan using fxxx on history  (cost=0.00..8.27
rows=1 width=4) (actual time=0.018..0.086 rows=40 loops=5003)                Index Cond: (((product)::text =
($0)::text)AND (id <= $1))Total runtime: 833.213 ms
 
(6 rows)

postgres=# show effective_cache_size ;effective_cache_size
----------------------600MB
(1 row)

postgres=# SHOW shared_buffers ;shared_buffers
----------------300MB
(1 row)

Maybe I have too big random_page_cost?
postgres=# SHOW random_page_cost ;random_page_cost
------------------4
(1 row)

Time: 0,351 ms
postgres=# set random_page_cost to 2;
SET
Time: 0,330 ms
postgres=# SHOW random_page_cost ;random_page_cost
------------------2
(1 row)

Time: 0,320 ms
postgres=# explain analyze SELECT sale_date, product, sale_price,                 COALESCE((SELECT SUM(sale_price)
                       FROM history                             WHERE product = o.product
ANDid <= o.id), 0) AS total             FROM history o;                                                          QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------Seq
Scanon history o  (cost=0.00..21518.09 rows=5003 width=15)
 
(actual time=0.132..809.701 rows=5003 loops=1)  SubPlan    ->  Aggregate  (cost=4.28..4.29 rows=1 width=4) (actual
time=0.154..0.155 rows=1 loops=5003)          ->  Index Scan using fxxx on history  (cost=0.00..4.27
rows=1 width=4) (actual time=0.020..0.088 rows=40 loops=5003)                Index Cond: (((product)::text =
($0)::text)AND (id <= $1))Total runtime: 817.358 ms
 

Regards
Pavel Stehule


Re: strange cost for correlated subquery

От
Tom Lane
Дата:
"Pavel Stehule" <pavel.stehule@gmail.com> writes:
> It's strange, so correlated subqueries is faster, but it has much higher cost:

In the nestloop plan, the estimated cost for the indexscan is discounted
based on the knowledge that it'll be executed repeatedly:

>                ->  Index Scan using fxxx on history t2
> (cost=0.00..0.31 rows=1 width=11) (actual time=0.017..0.087 rows=40
> loops=5003)

In the subplan case that doesn't happen:

>            ->  Index Scan using fxxx on history  (cost=0.00..8.27
> rows=1 width=4) (actual time=0.018..0.086 rows=40 loops=5003)
>                  Index Cond: (((product)::text = ($0)::text) AND (id <= $1))

Had the same discount been applied then the estimated costs would be
pretty nearly in line with reality, if I did the math right.

It'd be nice to do better but I'm not sure how; at the time that we
create plans for sub-queries we don't really have any way to know how
often they'll be called by the upper query.
        regards, tom lane