Обсуждение: BUG #18442: Unnecessary Sort operator in indexScan Plan

Поиск
Список
Период
Сортировка

BUG #18442: Unnecessary Sort operator in indexScan Plan

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18442
Logged by:          yajun Hu
Email address:      hu_yajun@qq.com
PostgreSQL version: 14.11
Operating system:   CentOS7 with kernel version 5.10
Description:

I have reproduced this problem in REL_14_11 and the latest master branch
(84db9a0eb10dd1dbee6db509c0e427fa237177dc).
The steps to reproduce are as follows.
1. ./configure  --enable-debug --enable-depend --enable-cassert CFLAGS=-O0
2. make -j; make install -j; initdb -D ./primary; pg_ctl -D ../primary -l
logfile start
3. run SQL:
```
create table t( a int, b int);
insert into t select null,i from generate_series(1,100)i;
insert into t select i,i from generate_series(1,100000)i;
analyze t;
create index on t(a,b);
postgres=# explain select * from t where a is null order by b; -- need
sort
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=9.54..9.80 rows=103 width=8)
   Sort Key: b
   ->  Index Only Scan using t_a_b_idx on t  (cost=0.29..6.10 rows=103
width=8)
         Index Cond: (a IS NULL)
(4 rows)

postgres=# explain select * from t where a is null order by a, b; -- no need
sort
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Only Scan using t_a_b_idx on t  (cost=0.29..6.10 rows=103 width=8)
   Index Cond: (a IS NULL)
(2 rows)

postgres=# explain select * from t where a = 1 order by b; -- no need sort
                               QUERY PLAN
------------------------------------------------------------------------
 Index Only Scan using t_a_b_idx on t  (cost=0.29..4.31 rows=1 width=8)
   Index Cond: (a = 1)
(2 rows)
```

In my understanding, in the first SELECT, because a is always NULL, the
scanned
data access by IndexOnlyScan is sorted according to b, which means that the
upper
Sort operator is unnecessary overhead.The second and third SELECT are both
as
expected. 

I tried to analyze the code and found that the EquivalenceClass of column a
and NULL
was missing, which caused build_index_pathkeys to return NIL. No pathkeys
makes the
optimizer decide that the upper layer needed Sort to ensure that the data
was in order.
I roughly know that it may be because NullTest in the check_mergejoinable
function is
not OpExpr. Is it possible here to generate special EquivalenceClass for
column a and
NULL to solve this problem?

I’m looking forward to someone answering my confusion, thank you very much!


Re: BUG #18442: Unnecessary Sort operator in indexScan Plan

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> create index on t(a,b);
> postgres=# explain select * from t where a is null order by b; -- need
> sort
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=9.54..9.80 rows=103 width=8)
>    Sort Key: b
>    ->  Index Only Scan using t_a_b_idx on t  (cost=0.29..6.10 rows=103
> width=8)
>          Index Cond: (a IS NULL)
> (4 rows)

Postgres doesn't detect that it could do this because "a IS NULL"
is not an equivalence condition.  You're not the first to suggest
that it could be treated as one, but I believe there are semantic
difficulties that would ensue.  One example is that given
"a IS NULL" and "a = b", the EquivalenceClass machinery would
think it can discard "a = b" and instead emit "b IS NULL",
which would not give the same answers.

            regards, tom lane



Re: BUG #18442: Unnecessary Sort operator in indexScan Plan

От
David Rowley
Дата:
On Sat, 20 Apr 2024 at 02:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One example is that given
> "a IS NULL" and "a = b", the EquivalenceClass machinery would
> think it can discard "a = b" and instead emit "b IS NULL",
> which would not give the same answers.

While this is relatively fresh, for the sake of the archives...

Presumably, if a=b is strict then effectively nothing could match as
the strict qual ensures NULLs never match and the IS NULL only allows
NULLs.

Couldn't strict equality conditions be handled using the same method
that we use to handle an Eclass with two distinct Consts. e.g a = 1
and a=b and b=2?

If the equality condition isn't strict then won't "b" be NULL if "a" is?

David



Re: BUG #18442: Unnecessary Sort operator in indexScan Plan

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> Couldn't strict equality conditions be handled using the same method
> that we use to handle an Eclass with two distinct Consts. e.g a = 1
> and a=b and b=2?

Perhaps, but it'd be a bit odd and confusing to treat two NULLs as
distinct rather than equal.

On the whole I'm disinclined to add complexity around this.  My gut
says there are more semantic gotchas than this one, and I think the
use-case is at best pretty hokey.  If your data design requires using
NULL as though it's a normal data value, you're in a state of sin
already, and you're going to find SQL fighting you all the way.

(A question closely related to this is whether IS NOT DISTINCT FROM
could be optimized more like regular equality.  I'm not convinced
about the use-case there either, although perhaps it's worth looking
into.)

            regards, tom lane



Re: BUG #18442: Unnecessary Sort operator in indexScan Plan

От
Richard Guo
Дата:

On Mon, Apr 22, 2024 at 9:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
Presumably, if a=b is strict then effectively nothing could match as
the strict qual ensures NULLs never match and the IS NULL only allows
NULLs.

Yeah, in this case the two restrictions are self-inconsistent and that
makes the relation dummy and need not be scanned.  I think the planner
would figure that out in relation_excluded_by_constraints when the GUC
constraint_exclusion is on, but in a different way than ECs.

Thanks
Richard

Re: BUG #18442: Unnecessary Sort operator in indexScan Plan

От
David Rowley
Дата:
On Mon, 22 Apr 2024 at 14:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (A question closely related to this is whether IS NOT DISTINCT FROM
> could be optimized more like regular equality.  I'm not convinced
> about the use-case there either, although perhaps it's worth looking
> into.)

This came up for me recently when adjusting the UNION planner to allow
the use of Merge Append.  I'd considered taking that work further to
expand it into INTERSECT and EXCEPT to transform those to INNER and
ANTI joins. I quickly realised that we'd need to have better
optimisation of joins with IS NOT DISTINCT FROM to make this a
worthwhile transformation.  The join transformation cannot be done
with simple equality conditions as that would eliminate NULLs. I was
unexcited about only making it work for non-nullable Vars as it seemed
like a backwater within a backwater.

David