Обсуждение: bad selectivity estimates for CASE
While looking at a complex query that is being poorly planned by PostgreSQL 8.2.9, I discovered that any non-trivial CASE...WHEN expression seems to produce a selectivity estimate of 0.005. This also happens on HEAD. psql (8.4devel) Type "help" for help. head=# create table tenk (c) as select generate_series(1,10000); SELECT head=# alter table tenk alter column c set statistics 100; ALTER TABLE head=# analyze tenk; ANALYZE head=# explain select * from tenk where c in (1,2,3,4); QUERY PLAN ------------------------------------------------------ Seq Scan on tenk (cost=0.00..190.00 rows=4 width=4) Filter: (c = ANY ('{1,2,3,4}'::integer[])) (2 rows) head=# explain select * from tenk where case when c in (1,2,3,4) then 1 end = 1; QUERY PLAN -------------------------------------------------------------------------------- ------------ Seq Scan on tenk (cost=0.00..215.00 rows=50 width=4) Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 1 ELSE NULL::integ er END = 1) (2 rows) head=# explain select * from tenk where case when c in (1,2,3,4) then 2 end = 1; QUERY PLAN -------------------------------------------------------------------------------- ------------ Seq Scan on tenk (cost=0.00..215.00 rows=50 width=4) Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 2 ELSE NULL::integ er END = 1) (2 rows) head=# \q The last example is particularly egregious, since it can never return true, but the previous example is not much better, since in my actual query the actual selectivity (against a CASE with multiple WHEN branches) can be as high as ~0.8, so a value of 0.005 isn't close. It ends up causing a very expensive nested loop plan when something else would be better. Any suggestions would be appreciated. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > While looking at a complex query that is being poorly planned by > PostgreSQL 8.2.9, I discovered that any non-trivial CASE...WHEN > expression seems to produce a selectivity estimate of 0.005. If you have an idea for a non-silly estimate, feel free to enlighten us... regards, tom lane
On Mon, Jan 5, 2009 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Robert Haas" <robertmhaas@gmail.com> writes: >> While looking at a complex query that is being poorly planned by >> PostgreSQL 8.2.9, I discovered that any non-trivial CASE...WHEN >> expression seems to produce a selectivity estimate of 0.005. > > If you have an idea for a non-silly estimate, feel free to enlighten > us... Well, presumably CASE WHEN <expr1> THEN <constant1> WHEN <expr2> THEN <constant2> WHEN <expr3> THEN <constant3> ... END = <constantn> could be simplified to <exprn>. But that's not going to happen in time to do me any good on this query, if it ever happens (and might not be sufficient anyway since the selectivity estimates of <expr1> may not be very good either), so I was more looking for suggestions on coping with the situation, since I'm sure that I'm not the first person to have this type of problem. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > Well, presumably CASE WHEN <expr1> THEN <constant1> WHEN <expr2> THEN > <constant2> WHEN <expr3> THEN <constant3> ... END = <constantn> could > be simplified to <exprn>. Not without breaking the order-of-evaluation guarantees. Consider case when x=0 then 0 when 1/x = 42 then 1 end = 1 This expression should not suffer a divide-by-zero failure but your proposal would allow it to do so. regards, tom lane