Обсуждение: Bug: When user-defined AM is used, the index path cannot be selected correctly

Поиск
Список
Период
Сортировка

Bug: When user-defined AM is used, the index path cannot be selected correctly

От
Quan Zongliang
Дата:
Hi,

1. When using extended PGroonga

CREATE EXTENSION pgroonga;

CREATE TABLE memos (
   id boolean,
   content varchar
);

CREATE INDEX idxA ON memos USING pgroonga (id);

2. Disable bitmapscan and seqscan:

SET enable_seqscan=off;
SET enable_indexscan=on;
SET enable_bitmapscan=off;

3. Neither ID = 'f' nor id= 't' can use the index correctly.

postgres=# explain select * from memos where id='f';
                                 QUERY PLAN
--------------------------------------------------------------------------
  Seq Scan on memos  (cost=10000000000.00..10000000001.06 rows=3 width=33)
    Filter: (NOT id)
(2 rows)

postgres=# explain select * from memos where id='t';
                                 QUERY PLAN
--------------------------------------------------------------------------
  Seq Scan on memos  (cost=10000000000.00..10000000001.06 rows=3 width=33)
    Filter: id
(2 rows)

postgres=# explain select * from memos where id>='t';
                             QUERY PLAN
-------------------------------------------------------------------
  Index Scan using idxa on memos  (cost=0.00..4.01 rows=2 width=33)
    Index Cond: (id >= true)
(2 rows)



The reason is that these expressions are converted to BoolExpr and Var.
match_clause_to_indexcol does not use them to check boolean-index.

patch attached.

--
Quan Zongliang
Beijing Vastdata
Вложения

Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

От
Tom Lane
Дата:
Quan Zongliang <quanzongliang@yeah.net> writes:
> 1. When using extended PGroonga
> ...
> 3. Neither ID = 'f' nor id= 't' can use the index correctly.

This works fine for btree indexes.  I think the actual problem
is that IsBooleanOpfamily only accepts the btree and hash
opclasses, and that's what needs to be improved.  Your proposed
patch fails to do that, which makes it just a crude hack that
solves some aspects of the issue (and probably breaks other
things).

It might work to change IsBooleanOpfamily so that it checks to
see whether BooleanEqualOperator is a member of the opclass.
That's basically what we need to know before we dare generate
substitute index clauses.  It's kind of an expensive test
though, and the existing coding assumes that IsBooleanOpfamily
is cheap ...

            regards, tom lane



Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

От
Quan Zongliang
Дата:

On 2022/8/17 10:03, Tom Lane wrote:
> Quan Zongliang <quanzongliang@yeah.net> writes:
>> 1. When using extended PGroonga
>> ...
>> 3. Neither ID = 'f' nor id= 't' can use the index correctly.
> 
> This works fine for btree indexes.  I think the actual problem
> is that IsBooleanOpfamily only accepts the btree and hash
> opclasses, and that's what needs to be improved.  Your proposed
> patch fails to do that, which makes it just a crude hack that
> solves some aspects of the issue (and probably breaks other
> things).
> 
> It might work to change IsBooleanOpfamily so that it checks to
> see whether BooleanEqualOperator is a member of the opclass.
> That's basically what we need to know before we dare generate
> substitute index clauses.  It's kind of an expensive test
> though, and the existing coding assumes that IsBooleanOpfamily
> is cheap ...
> 
>             regards, tom lane

New patch attached.

It seems that partitions do not use AM other than btree and hash.
Rewrite only indxpath.c and check if it is a custom AM.
Вложения

Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

От
Tom Lane
Дата:
Quan Zongliang <quanzongliang@yeah.net> writes:
> New patch attached.
> It seems that partitions do not use AM other than btree and hash.
> Rewrite only indxpath.c and check if it is a custom AM.

This seems drastically underdocumented, and the test you're using
for extension opclasses is wrong.  What we need to know before
applying match_boolean_index_clause is that a clause using
BooleanEqualOperator will be valid for the index.  That's got next
door to nothing to do with whether the opclass is default for
the index AM.  A non-default opclass might support that operator,
and conversely a default one might not (although I concede it's
not that easy to imagine what other set of operators-on-boolean
an extension opclass might be interested in).

I think we need something more like the attached.

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..63a8eef45c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -153,6 +153,7 @@ static IndexClause *match_clause_to_indexcol(PlannerInfo *root,
                                              RestrictInfo *rinfo,
                                              int indexcol,
                                              IndexOptInfo *index);
+static bool IsBooleanOpfamily(Oid opfamily);
 static IndexClause *match_boolean_index_clause(PlannerInfo *root,
                                                RestrictInfo *rinfo,
                                                int indexcol, IndexOptInfo *index);
@@ -2342,6 +2343,23 @@ match_clause_to_indexcol(PlannerInfo *root,
     return NULL;
 }

+/*
+ * IsBooleanOpfamily
+ *      Detect whether an opfamily supports boolean equality as an operator.
+ *
+ * If the opfamily OID is in the range of built-in objects, we can rely
+ * on hard-wired knowledge of which built-in opfamilies support this.
+ * For extension opfamilies, there's no choice but to do a catcache lookup.
+ */
+static bool
+IsBooleanOpfamily(Oid opfamily)
+{
+    if (opfamily < FirstNormalObjectId)
+        return IsBuiltinBooleanOpfamily(opfamily);
+    else
+        return op_in_opfamily(BooleanEqualOperator, opfamily);
+}
+
 /*
  * match_boolean_index_clause
  *      Recognize restriction clauses that can be matched to a boolean index.
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 1fa7fc99b5..b9c1ce0a64 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1191,8 +1191,13 @@ partkey_is_bool_constant_for_query(RelOptInfo *partrel, int partkeycol)
     PartitionScheme partscheme = partrel->part_scheme;
     ListCell   *lc;

-    /* If the partkey isn't boolean, we can't possibly get a match */
-    if (!IsBooleanOpfamily(partscheme->partopfamily[partkeycol]))
+    /*
+     * If the partkey isn't boolean, we can't possibly get a match.
+     *
+     * Partitioning currently can only use built-in AMs, so checking for
+     * built-in boolean opclasses is good enough.
+     */
+    if (!IsBuiltinBooleanOpfamily(partscheme->partopfamily[partkeycol]))
         return false;

     /* Check each restriction clause for the partitioned rel */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index bf9fe5b7aa..b5403149d7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -3596,7 +3596,11 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,

     *outconst = NULL;

-    if (!IsBooleanOpfamily(partopfamily))
+    /*
+     * Partitioning currently can only use built-in AMs, so checking for
+     * built-in boolean opclasses is good enough.
+     */
+    if (!IsBuiltinBooleanOpfamily(partopfamily))
         return PARTCLAUSE_UNSUPPORTED;

     if (IsA(clause, BooleanTest))
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 8dc9ce01bb..b56d714322 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -55,7 +55,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_opfamily_oid_index, 2755, OpfamilyOidIndexId, on pg

 #ifdef EXPOSE_TO_CLIENT_CODE

-#define IsBooleanOpfamily(opfamily) \
+/* This does not account for non-core opclasses that might accept boolean */
+#define IsBuiltinBooleanOpfamily(opfamily) \
     ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)

 #endif                            /* EXPOSE_TO_CLIENT_CODE */

Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

От
Tom Lane
Дата:
I wrote:
> I think we need something more like the attached.

Meh -- serves me right for not doing check-world before sending.
The patch causes some plans to change in the btree_gin and btree_gist
modules; which is good, because that shows that the patch is actually
doing what it's supposed to.  The fact that your patch didn't make
the cfbot unhappy implies that it wasn't triggering for those modules.
I think the reason is that you did

+        ((amid) >= FirstNormalObjectId && \
+                OidIsValid(GetDefaultOpClass(BOOLOID, (amid)))) \

so that the FirstNormalObjectId cutoff was applied to the AM's OID,
not to the opfamily OID, causing it to do the wrong thing for
extension opclasses over built-in AMs.

The good news is this means we don't need to worry about making
a test case ...

            regards, tom lane

diff --git a/contrib/btree_gin/expected/bool.out b/contrib/btree_gin/expected/bool.out
index efb3e1e327..207a3f2328 100644
--- a/contrib/btree_gin/expected/bool.out
+++ b/contrib/btree_gin/expected/bool.out
@@ -87,13 +87,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM test_bool WHERE i<=true ORDER BY i;
 (6 rows)

 EXPLAIN (COSTS OFF) SELECT * FROM test_bool WHERE i=true ORDER BY i;
-         QUERY PLAN
------------------------------
+                QUERY PLAN
+-------------------------------------------
  Sort
    Sort Key: i
-   ->  Seq Scan on test_bool
+   ->  Bitmap Heap Scan on test_bool
          Filter: i
-(4 rows)
+         ->  Bitmap Index Scan on idx_bool
+               Index Cond: (i = true)
+(6 rows)

 EXPLAIN (COSTS OFF) SELECT * FROM test_bool WHERE i>=true ORDER BY i;
                 QUERY PLAN
diff --git a/contrib/btree_gist/expected/bool.out b/contrib/btree_gist/expected/bool.out
index c67a9685d5..29390f079c 100644
--- a/contrib/btree_gist/expected/bool.out
+++ b/contrib/btree_gist/expected/bool.out
@@ -71,7 +71,7 @@ SELECT * FROM booltmp WHERE a;
                 QUERY PLAN
 ------------------------------------------
  Index Only Scan using boolidx on booltmp
-   Filter: a
+   Index Cond: (a = true)
 (2 rows)

 SELECT * FROM booltmp WHERE a;
@@ -85,7 +85,7 @@ SELECT * FROM booltmp WHERE NOT a;
                 QUERY PLAN
 ------------------------------------------
  Index Only Scan using boolidx on booltmp
-   Filter: (NOT a)
+   Index Cond: (a = false)
 (2 rows)

 SELECT * FROM booltmp WHERE NOT a;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..63a8eef45c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -153,6 +153,7 @@ static IndexClause *match_clause_to_indexcol(PlannerInfo *root,
                                              RestrictInfo *rinfo,
                                              int indexcol,
                                              IndexOptInfo *index);
+static bool IsBooleanOpfamily(Oid opfamily);
 static IndexClause *match_boolean_index_clause(PlannerInfo *root,
                                                RestrictInfo *rinfo,
                                                int indexcol, IndexOptInfo *index);
@@ -2342,6 +2343,23 @@ match_clause_to_indexcol(PlannerInfo *root,
     return NULL;
 }

+/*
+ * IsBooleanOpfamily
+ *      Detect whether an opfamily supports boolean equality as an operator.
+ *
+ * If the opfamily OID is in the range of built-in objects, we can rely
+ * on hard-wired knowledge of which built-in opfamilies support this.
+ * For extension opfamilies, there's no choice but to do a catcache lookup.
+ */
+static bool
+IsBooleanOpfamily(Oid opfamily)
+{
+    if (opfamily < FirstNormalObjectId)
+        return IsBuiltinBooleanOpfamily(opfamily);
+    else
+        return op_in_opfamily(BooleanEqualOperator, opfamily);
+}
+
 /*
  * match_boolean_index_clause
  *      Recognize restriction clauses that can be matched to a boolean index.
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 1fa7fc99b5..b9c1ce0a64 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1191,8 +1191,13 @@ partkey_is_bool_constant_for_query(RelOptInfo *partrel, int partkeycol)
     PartitionScheme partscheme = partrel->part_scheme;
     ListCell   *lc;

-    /* If the partkey isn't boolean, we can't possibly get a match */
-    if (!IsBooleanOpfamily(partscheme->partopfamily[partkeycol]))
+    /*
+     * If the partkey isn't boolean, we can't possibly get a match.
+     *
+     * Partitioning currently can only use built-in AMs, so checking for
+     * built-in boolean opclasses is good enough.
+     */
+    if (!IsBuiltinBooleanOpfamily(partscheme->partopfamily[partkeycol]))
         return false;

     /* Check each restriction clause for the partitioned rel */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index bf9fe5b7aa..b5403149d7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -3596,7 +3596,11 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,

     *outconst = NULL;

-    if (!IsBooleanOpfamily(partopfamily))
+    /*
+     * Partitioning currently can only use built-in AMs, so checking for
+     * built-in boolean opclasses is good enough.
+     */
+    if (!IsBuiltinBooleanOpfamily(partopfamily))
         return PARTCLAUSE_UNSUPPORTED;

     if (IsA(clause, BooleanTest))
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 8dc9ce01bb..b56d714322 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -55,7 +55,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_opfamily_oid_index, 2755, OpfamilyOidIndexId, on pg

 #ifdef EXPOSE_TO_CLIENT_CODE

-#define IsBooleanOpfamily(opfamily) \
+/* This does not account for non-core opclasses that might accept boolean */
+#define IsBuiltinBooleanOpfamily(opfamily) \
     ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)

 #endif                            /* EXPOSE_TO_CLIENT_CODE */