Обсуждение: Optimizer regression

Поиск
Список
Период
Сортировка

Optimizer regression

От
Jim Nasby
Дата:
Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I don't
knowif it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this push into
subqueryissue has been fixed or not, so I haven't included full details or a test case. I have a work-around so I don't
careabout this in 8.4, but if this regression still exists it would be nice if it were fixed. 

CREATE VIEW loans.payday AS SELECT * FROM loans WHERE loan_type_cd IN ( 'payday', 'other' );
CREATE VIEW loans.payday_defaulted AS SELECT * FROM loans.payday p WHERE EXISTS( SELECT * FROM loan_statuses ls WHERE
ls.loan_id= p.id AND ls.status_cd = 'in_default' ); 

This query is fast:

SELECT defaulted_then_paid_loans  , ( SELECT count(*)        FROM loans.payday        WHERE ROW( customer_id, status_cd
)= ROW( d.customer_id, d.status_cd )          AND id > coalesce( max_defaulted_loan_id, 0 )     ) AS
number_of_loans_since_last_default
FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans         , max( d.id ) AS
max_defaulted_loan_id        FROM loans.payday p           LEFT JOIN loans.payday_defaulted d USING( id )         WHERE
d.customer_id= ?         GROUP BY p.customer_id, p.status_cd     ) d 
WHERE status_cd = 'paid_off';

This query is not (but was fine on 8.3):
SELECT defaulted_then_paid_loans  , ( SELECT count(*)        FROM loans.payday        WHERE ROW( customer_id, status_cd
)= ROW( d.customer_id, d.status_cd )          AND id > coalesce( max_defaulted_loan_id, 0 )     ) AS
number_of_loans_since_last_default
FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans         , max( d.id ) AS
max_defaulted_loan_id        FROM loans.payday p           LEFT JOIN loans.payday_defaulted d USING( id )         GROUP
BYp.customer_id, p.status_cd     ) d 
WHERE status_cd = 'paid_off' AND customer_id = ?
;

Plan from the "bad" query on 8.3:                                                                           QUERY PLAN
                                                                         

------------------------------------------------------------------------------------------------------------------------------------------------------------------Subquery
Scand  (cost=0.00..438.00 rows=2 width=162) (actual time=4883.286..4883.286 rows=1 loops=1)  ->  GroupAggregate
(cost=0.00..421.91rows=2 width=17) (actual time=4883.181..4883.181 rows=1 loops=1)        ->  Nested Loop Left Join
(cost=0.00..421.75rows=13 width=17) (actual time=314.426..4883.082 rows=31 loops=1)              ->  Index Scan using
loans_m13on loans  (cost=0.00..36.72 rows=13 width=17) (actual time=52.209..561.240 rows=31 loops=1)
IndexCond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))                    Filter:
((loan_type_cd)::text= ANY ('{payday,cso}'::text[]))              ->  Index Scan using loans_pkey on loans
(cost=0.00..29.61rows=1 width=4) (actual time=139.410..139.410 rows=0 loops=31)                    Index Cond:
(cnu.loans.id= cnu.loans.id)                    Filter: (((cnu.loans.loan_type_cd)::text = ANY
('{payday,cso}'::text[]))AND (subplan))                    SubPlan                      ->  Index Scan using
loan_status_u1on loan_statuses ls  (cost=0.00..23.43 rows=1 width=88) (actual time=109.521..109.521 rows=0 loops=31)
                       Index Cond: (loan_id = $3)                            Filter: ((status_cd)::text =
'in_default'::text) SubPlan    ->  Aggregate  (cost=8.03..8.04 rows=1 width=0) (actual time=0.100..0.100 rows=1
loops=1)         ->  Index Scan using loans_m13 on loans  (cost=0.00..8.02 rows=1 width=0) (actual time=0.041..0.084
rows=31loops=1)                Index Cond: ((customer_id = $0) AND ((status_cd)::text = ($1)::text))
Filter:(((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id > COALESCE($2, 0)))Total runtime: 4883.439 ms 
(19 rows)

And from 8.4…                                                     QUERY PLAN
          

----------------------------------------------------------------------------------------------------------------------Subquery
Scand  (cost=3003014.53..3027074.69 rows=2 width=162)  ->  GroupAggregate  (cost=3003014.53..3027059.89 rows=2
width=17)       ->  Hash Left Join  (cost=3003014.53..3027059.73 rows=13 width=17)              Hash Cond:
(cnu.loans.id= cnu.loans.id)              ->  Index Scan using loans_m13 on loans  (cost=0.00..36.01 rows=13 width=17)
                 Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))
Filter:((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))              ->  Hash  (cost=2902187.44..2902187.44
rows=6145607width=4)                    ->  Hash Join  (cost=2027941.10..2902187.44 rows=6145607 width=4)
          Hash Cond: (cnu.loans.id = ls.loan_id)                          ->  Seq Scan on loans  (cost=0.00..688340.03
rows=10783881width=4)                                Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
                   ->  Hash  (cost=2015760.83..2015760.83 rows=974422 width=4)                                ->
HashAggregate (cost=2006016.61..2015760.83 rows=974422 width=4)                                      ->  Seq Scan on
loan_statusesls  (cost=0.00..1984621.11 rows=8558199 width=4)                                            Filter:
((status_cd)::text= 'in_default'::text)  SubPlan 1    ->  Aggregate  (cost=7.38..7.39 rows=1 width=0)          ->
IndexScan using loans_m13 on loans  (cost=0.00..7.37 rows=1 width=0)                Index Cond: ((customer_id = $0) AND
((status_cd)::text= ($1)::text))                Filter: (((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id
>COALESCE($2, 0))) 
(21 rows)

--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: Optimizer regression

От
Tom Lane
Дата:
Jim Nasby <jim@nasby.net> writes:
> Just upgraded to 8.4 (I know, I know�) and ran across this. Unfortunately I have no way to test this on 9.x, so I
don'tknow if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this
pushinto subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around
soI don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed.
 

It's hard to be sure with such an incomplete example, but I think 8.4 is
flattening the EXISTS to a semijoin and then getting trapped by join
order constraints into doing something less than optimal for this
particular use-case.  It was this type of example that motivated the
"parameterized path" stuff I've been working on for the past couple
of years.

In short, 9.2 should produce at least as good a plan as 8.3 for this
example, but 8.4 through 9.1 might not.

BTW, your workaround looks wrong --- you need to constrain the outside
of the left join not the inside, no?
        regards, tom lane



Re: Optimizer regression

От
Jim Nasby
Дата:
On 10/13/12 2:45 PM, Tom Lane wrote:
> BTW, your workaround looks wrong --- you need to constrain the outside
> of the left join not the inside, no?

Ugh, yes, you're correct. :(
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Optimizer regression

От
Jim Nasby
Дата:
On 10/13/12 2:45 PM, Tom Lane wrote:
> Jim Nasby <jim@nasby.net> writes:
>> Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I
don'tknow if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this
pushinto subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around
soI don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed. 
>
> It's hard to be sure with such an incomplete example, but I think 8.4 is
> flattening the EXISTS to a semijoin and then getting trapped by join
> order constraints into doing something less than optimal for this
> particular use-case.  It was this type of example that motivated the
> "parameterized path" stuff I've been working on for the past couple
> of years.
>
> In short, 9.2 should produce at least as good a plan as 8.3 for this
> example, but 8.4 through 9.1 might not.

FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

cnuapp_prod@postgres10.obr=# explain  WITH default_stats AS (select customer_id, status_cd, count(*), max(id)  from
loans.payday_defaultedgroup by customer_id, status_cd) SELECT * FROM default_stats  where customer_id=10287151;
                                       QUERY PLAN 
------------------------------------------------------------------------------------------------------------ CTE Scan
ondefault_stats  (cost=2980046.56..3004313.73 rows=5393 width=162)   Filter: (customer_id = 10287151)   CTE
default_stats    ->  HashAggregate  (cost=2963868.44..2980046.56 rows=1078541 width=17)           ->  Hash Join
(cost=2028045.22..2902409.22rows=6145922 width=17)                 Hash Cond: (loans.id = ls.loan_id)
-> Seq Scan on loans  (cost=0.00..688437.25 rows=10785404 width=17)                       Filter: ((loan_type_cd)::text
=ANY ('{payday,cso}'::text[]))                 ->  Hash  (cost=2015864.33..2015864.33 rows=974471 width=4)
        ->  HashAggregate  (cost=2006119.62..2015864.33 rows=974471 width=4)                             ->  Seq Scan
onloan_statuses ls  (cost=0.00..1984723.02 rows=8558638 width=4)                                   Filter:
((status_cd)::text= 'in_default'::text) 
(12 rows)

cnuapp_prod@postgres10.obr=# explain analyze  select customer_id, status_cd, count(*), max(id)  from
loans.payday_defaultedwhere customer_id=10287151 group by customer_id, status_cd;
                                  QUERY PLAN 

---------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=202.16..202.19 rows=2 width=17) (actual time=0.422..0.422 rows=0 loops=1)   ->  Nested Loop Semi
Join (cost=0.00..202.07 rows=9 width=17) (actual time=0.422..0.422 rows=0 loops=1)         ->  Index Scan using
loans_m12on loans  (cost=0.00..41.48 rows=16 width=17) (actual time=0.028..0.121 rows=31 loops=1)               Index
Cond:(customer_id = 10287151)               Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))         ->
IndexScan using loan_statuses__loan_id__status on loan_statuses ls  (cost=0.00..10.17 rows=1 width=4) (actual
time=0.009..0.009rows=0 loops=31)               Index Cond: ((ls.loan_id = loans.id) AND ((ls.status_cd)::text =
'in_default'::text))Total runtime: 0.510 ms 
(8 rows)

cnuapp_prod@postgres10.obr=#

I hope that we'll have 9.2 stood up before the year is out, so we'll check this then and see if it's fixed.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Optimizer regression

От
Tom Lane
Дата:
Jim Nasby <jim@nasby.net> writes:
> FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

I think it's not that so much as the EXISTS inside a LEFT JOIN.
        regards, tom lane



Re: Optimizer regression

От
Jim Nasby
Дата:
On 10/13/12 3:15 PM, Jim Nasby wrote:
> FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

I take that back... GROUP BY doesn't matter. It's an issue of having the EXISTS in the inner query. I realize the
exampleshave gotten a bit silly, but this seems to break it down to the simplest case of what's happening.
 

FAST:

explain analyze

                                                        SELECT p.customer_id, p.status_cd, EXISTS( SELECT * FROM
loan_statusesls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' )
                                                                                                      FROM loans p

                                                                    LEFT JOIN

                                                                                                 ( SELECT * FROM loans
pWHERE loan_type_cd IN ( 'payday', 'cso' )

       ) d USING( id )

                                                                                                     WHERE
p.customer_id= 10287151

                           AND p.status_cd = 'paid_off'

                                                        AND p.loan_type_cd IN ( 'payday', 'cso' )



                                                ;
QUERY PLAN
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------
NestedLoop Left Join  (cost=0.00..234.87 rows=13 width=17) (actual time=0.085..0.861 rows=31 loops=1)   ->  Index Scan
usingloans_m13 on loans p  (cost=0.00..36.01 rows=13 width=17) (actual time=0.045..0.137 rows=31 loops=1)         Index
Cond:((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))         Filter: ((loan_type_cd)::text = ANY
('{payday,cso}'::text[]))  ->  Index Scan using loans_pkey on loans p  (cost=0.00..5.12 rows=1 width=4) (actual
time=0.011..0.011rows=1 loops=31)         Index Cond: (p.id = p.id)         Filter: ((p.loan_type_cd)::text = ANY
('{payday,cso}'::text[]))  SubPlan 1     ->  Index Scan using loan_statuses__loan_id__status on loan_statuses ls
(cost=0.00..10.17rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=31)           Index Cond: ((loan_id = $0) AND
((status_cd)::text= 'in_default'::text)) Total runtime: 0.950 ms
 
(11 rows)


SLOW:


cnuapp_prod@postgres10.obr=# explain

                                                                                     SELECT p.customer_id, p.status_cd


   FROM loans p

                                   LEFT JOIN

                                                                     ( SELECT * FROM loans p WHERE loan_type_cd IN (
'payday','cso' ) AND EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' )
                                                                                                     ) d USING( id )


    WHERE p.customer_id = 10287151

                                      AND p.status_cd = 'paid_of
 
f'

    AND p.loan_type_cd IN ( 'payday', 'cso' )


                                                                                                                     ;
                                             QUERY PLAN
 
---------------------------------------------------------------------------------------------------------- Hash Left
Join (cost=3003251.16..3027297.36 rows=13 width=13)   Hash Cond: (p.id = p.id)   ->  Index Scan using loans_m13 on
loansp  (cost=0.00..36.01 rows=13 width=17)         Index Cond: ((customer_id = 10287151) AND ((status_cd)::text =
'paid_off'::text))        Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))   ->  Hash
(cost=2902419.07..2902419.07rows=6145927 width=4)         ->  Hash Join  (cost=2028047.07..2902419.07 rows=6145927
width=4)              Hash Cond: (p.id = ls.loan_id)               ->  Seq Scan on loans p  (cost=0.00..688444.00
rows=10785509width=4)                     Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash  (cost=2015866.17..2015866.17 rows=974472 width=4)                     ->  HashAggregate
(cost=2006121.45..2015866.17rows=974472 width=4)                           ->  Seq Scan on loan_statuses ls
(cost=0.00..1984724.84rows=8558646 width=4)                                 Filter: ((status_cd)::text =
'in_default'::text)
(14 rows)

cnuapp_prod@postgres10.obr=#

-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net