Обсуждение: Bad plan when null is in an "in" list
Hi all, Jumping in directly to the subject, this is what I get: explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE bigint_col_2 in (12132131, null, null, null, null); QUERY PLAN ------------------------------------------------------------------- Seq Scan on big_table (cost=0.00..2447201.85 rows=448 width=16) Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean) (2 rows) Compared to: explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE bigint_col_2 in (12132131, 123781, 1297839032, 123667123); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2 on big_table (cost=0.00..6427.28 rows=1789 width=16) Index Cond: ((bigint_col_2 = 12132131) OR (bigint_col_2 = 123781) OR (bigint_col_2 = 1297839032) OR (bigint_col_2 = 123667123)) (2 rows) big_table has ~ 100 million rows. Considering that NULL::boolean is always false, I don't quite understand why the first query is going for a sequential scan, instead of just ignoring the nulls from the in clause... We have a lot of queries like this for chunking activities, and they work too on oracle. We've just migrated a bigger data base to postgres from oracle, and this is the first thing to slow down our system to a crawl... we will fix this, but the parser could be smarter I guess. Cheers, Csaba.
On 7/26/05, Csaba Nagy <nagy@ecircle-ag.com> wrote: > Hi all, > > Jumping in directly to the subject, this is what I get: > > explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE > bigint_col_2 in (12132131, null, null, null, > null); > > QUERY PLAN > ------------------------------------------------------------------- > Seq Scan on big_table (cost=0.00..2447201.85 rows=448 width=16) > Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean) > (2 rows) > > this is because null values can't be indexed... or telling other way the planner will never choose an index for comparing to null maybe a partial index can be used? > Compared to: > > > explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE > bigint_col_2 in (12132131, 123781, 1297839032, 123667123); > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using dom_idx_member_bigint_col_2, > dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2, > dom_idx_member_bigint_col_2 on big_table (cost=0.00..6427.28 rows=1789 > width=16) > Index Cond: ((bigint_col_2 = 12132131) OR (bigint_col_2 = 123781) OR > (bigint_col_2 = 1297839032) OR (bigint_col_2 = 123667123)) > (2 rows) > > > big_table has ~ 100 million rows. > there is no nulls here so the index can be used > > Considering that NULL::boolean is always false, > null::boolean is null not false. that is because null means 'unknown value' not false nor true template1=# select null::boolean; bool ------ (1 fila) -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Jaime Casanova <systemguards@gmail.com> writes: > On 7/26/05, Csaba Nagy <nagy@ecircle-ag.com> wrote: >> Seq Scan on big_table (cost=0.00..2447201.85 rows=448 width=16) >> Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean) >> >> Considering that NULL::boolean is always false, > null::boolean is null not false. that is because null means 'unknown > value' not false nor true The reason the planner ends up with this and not just "bigint_col_2 = 12132131::bigint" is that it's using a general-purpose expression simplifier, and in the general case we have to keep the NULL arm of the OR because it can affect the result (the OR output will be NULL not FALSE if the equality is false). In the context of the top level of a WHERE clause, we could discard the NULL, and then the OR, since we do not need to distinguish NULL and FALSE results. However, as far as I can see doing this would require an extra pass over the WHERE clause (it can't readily be folded into any of the existing traversals because those are done by routines that have other uses where dropping NULLs would be wrong). I'm unconvinced that the use-case for this justifies that much overhead ... regards, tom lane