Обсуждение: bug with expression index on partition

Поиск
Список
Период
Сортировка

bug with expression index on partition

От
Amit Langote
Дата:
Hi.

I noticed a bug with creating an expression index on a partitioned table.
The fact that a partition may have differing attribute numbers is not
accounted for when recursively *creating* the index on partition.  The
partition index gets created but is unusable.

create table p (a int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

-- make p3 have different attribute number for column a
alter table p detach partition p3;
alter table p3 drop a, add a int;
alter table p attach partition p3 for values with (modulus 3, remainder 2);

create index on p ((a+1));
\d p3
ERROR:  invalid attnum 1 for relation "p3"

That's because the IndexStmt that's passed for creating the index on p3
contains expressions whose Vars are exact copies of the parent's Vars,
which in this case is wrong.  We must have converted them to p3's
attribute numbers before calling DefineIndex on p3.

Note that if the same index already exists in a partition before creating
the parent's index, then that index already contains the right Vars, no
conversion is needed in that case.

alter table p detach partition p3;
alter table p3 drop a, add a int;
create index on p3 ((a+1));
alter table p attach partition p3 for values with (modulus 3, remainder 2);
\d p3
                 Table "public.p3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p FOR VALUES WITH (modulus 3, remainder 2)
Indexes:
    "p3_expr_idx" btree ((a + 1))

create index on p ((a+1));
\d p3
                 Table "public.p3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p FOR VALUES WITH (modulus 3, remainder 2)
Indexes:
    "p3_expr_idx" btree ((a + 1))

In other words, the code that checks whether an index with matching
definition exists in the partition correctly converts Vars before the
actual matching, so is able to conclude that, despite different Vars in
p3's index, it is same index as the one being created in the parent.  So,
the index is not created again, which actually if it were, it would be hit
by the very same bug I'm reporting.

Also is right the case where the partition being attached doesn't have the
index but the parent has and it is correctly *cloned* to the attached
partition.

alter table p detach partition p3;
alter table p3 drop a, add a int;
create index on p ((a+1));
alter table p attach partition p3 for values with (modulus 3, remainder 2);
\d p3
                 Table "public.p3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p FOR VALUES WITH (modulus 3, remainder 2)
Indexes:
    "p3_expr_idx" btree ((a + 1))


So, CompareIndexInfo and generateClonedIndexStmt are both doing the right
thing, but DefineIndex is not.  Attached is a patch to fix DefineIndex so
that it converts indexParams before recursing to create the index on a
partition.

By the way, do we want to do anything about the fact that we don't check
if a matching inherited index exists when creating an index directly on
the partition.  I wondered this because we do check the other way around.

\d p3
                 Table "public.p3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p FOR VALUES WITH (modulus 3, remainder 2)
Indexes:
    "p3_expr_idx" btree ((a + 1))

create index on p3 ((a+1));
\d p3
                 Table "public.p3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p FOR VALUES WITH (modulus 3, remainder 2)
Indexes:
    "p3_expr_idx" btree ((a + 1))
    "p3_expr_idx1" btree ((a + 1))

Thanks,
Amit

Вложения

Re: bug with expression index on partition

От
Amit Langote
Дата:
On 2018/06/21 15:35, Amit Langote wrote:
> So, CompareIndexInfo and generateClonedIndexStmt are both doing the right
> thing, but DefineIndex is not.  Attached is a patch to fix DefineIndex so
> that it converts indexParams before recursing to create the index on a
> partition.

I noticed that while CompareIndexInfo and generateClonedIndexStmt would
reject the case where index expressions contain a whole-row Var, my patch
didn't teach to do the same to DefineIndex, causing asymmetric behavior.
So, whereas ATTACH PARTITION would error out when trying to clone a
parent's index that contains a whole-row Var, recursively creating an
index on partition won't.

I updated the patch so that even DefineIndex will check if any whole-row
Vars were encountered during conversion and error out if so.

Thanks,
Amit

Вложения

Re: bug with expression index on partition

От
Amit Langote
Дата:
On 2018/06/21 16:19, Amit Langote wrote:
> I updated the patch so that even DefineIndex will check if any whole-row
> Vars were encountered during conversion and error out if so.

I first thought of starting a new thread for this, but thought I'd just
reply here because the affected code is nearby.

I was wondering if it wouldn't hurt to allow whole-row vars to be present
in the expressions of inherited indexes.  If we did allow it, the query
shown in the example below is able to use the indexes on partitions.

create table p (a int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);
create index on p ((p));

explain (costs off) select p from p order by p;
              QUERY PLAN
---------------------------------------
 Merge Append
   Sort Key: ((p1.*)::p)
   ->  Index Scan using p1_p_idx on p1
   ->  Index Scan using p2_p_idx on p2
   ->  Index Scan using p3_p_idx on p3
(5 rows)

After applying the patch in my last email, each of
generateClonedIndexStmt, CompareIndexInfo, and DefineIndex reject
inheriting an index if its expressions are found to contain whole-row
vars.  Now, one can create those indexes on partitions individually, but
they cannot be matched to an ORDER BY clause of a query accessing those
partitions via the parent table.

drop index p_p_idx;
create index on p1 ((p1));
create index on p2 ((p2));
create index on p3 ((p3));

explain (costs off) select p from p order by p;
         QUERY PLAN
----------------------------
 Sort
   Sort Key: ((p1.*)::p)
   ->  Append
         ->  Seq Scan on p1
         ->  Seq Scan on p2
         ->  Seq Scan on p3
(6 rows)

It is of course usable if partition's accessed directly.

explain (costs off) select p1 from p1 order by p1;
            QUERY PLAN
----------------------------------
 Index Scan using p1_p1_idx on p1
(1 row)

OTOH, an inherited index with whole-row vars (if we decide to start
allowing them as I'm proposing) cannot be used if partition's accessed
directly.

drop index p1_p1_idx;
create index on p ((p));
explain (costs off) select p1 from p1 order by p1;
      QUERY PLAN
----------------------
 Sort
   Sort Key: p1.*
   ->  Seq Scan on p1
(3 rows)

but maybe that's tolerable.

Thoughts?

Thanks,
Amit



Re: bug with expression index on partition

От
Alvaro Herrera
Дата:
On 2018-Jun-21, Amit Langote wrote:

> On 2018/06/21 15:35, Amit Langote wrote:
> > So, CompareIndexInfo and generateClonedIndexStmt are both doing the right
> > thing, but DefineIndex is not.  Attached is a patch to fix DefineIndex so
> > that it converts indexParams before recursing to create the index on a
> > partition.
> 
> I noticed that while CompareIndexInfo and generateClonedIndexStmt would
> reject the case where index expressions contain a whole-row Var, my patch
> didn't teach to do the same to DefineIndex, causing asymmetric behavior.
> So, whereas ATTACH PARTITION would error out when trying to clone a
> parent's index that contains a whole-row Var, recursively creating an
> index on partition won't.
> 
> I updated the patch so that even DefineIndex will check if any whole-row
> Vars were encountered during conversion and error out if so.

Thanks.  I pushed this version, although I tweaked the test so that this
condition is verified in the test that is supposed to do so, rather than
creating a whole new set of tables for this purpose.  The way it would
fail with unpatched code is perhaps less noisy that what you proposed,
but I think it's quite enough.

I think the whole-row vars issue is worthy of some more thinking.  I'm
letting that one go for now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: bug with expression index on partition

От
Alvaro Herrera
Дата:
On 2018-Jun-21, Amit Langote wrote:

> explain (costs off) select p from p order by p;
>               QUERY PLAN
> ---------------------------------------
>  Merge Append
>    Sort Key: ((p1.*)::p)
>    ->  Index Scan using p1_p_idx on p1
>    ->  Index Scan using p2_p_idx on p2
>    ->  Index Scan using p3_p_idx on p3
> (5 rows)

Nice, but try adding a row > operator in the where clause.

I think it's clearly desirable to allow this row-based search to use indexes;
as I recall, we mostly enable pagination of results via this kind of
constructs.  However, we're lacking planner or executor features apparently,
because a query using a row > operator does not use indexes:

create table partp (a int, b int) partition by range (a);
create table partp1 partition of partp for values from (0) to (35);
create table partp2 partition of partp for values from (35) to (100);
create index on partp1 ((partp1.*));
create index on partp2 ((partp2.*));
explain select * from partp where partp > row(0,0) order by partp limit 25 ;
                                QUERY PLAN                                
──────────────────────────────────────────────────────────────────────────
 Limit  (cost=6.69..6.75 rows=25 width=40)
   ->  Sort  (cost=6.69..6.86 rows=66 width=40)
         Sort Key: ((partp1.*)::partp)
         ->  Append  (cost=0.00..4.83 rows=66 width=40)
               ->  Seq Scan on partp1  (cost=0.00..1.88 rows=23 width=40)
                     Filter: ((partp1.*)::partp > '(0,0)'::record)
               ->  Seq Scan on partp2  (cost=0.00..2.62 rows=43 width=40)
                     Filter: ((partp2.*)::partp > '(0,0)'::record)
(8 filas)

Note the indexes are ignored, as opposed to what it does in a non-partitioned
table:

create table p (a int, b int);
create index on p((p.*));
explain select * from p where p > row(0,0) order by p limit 25 ;
                                QUERY PLAN                                 
───────────────────────────────────────────────────────────────────────────
 Limit  (cost=0.15..2.05 rows=25 width=40)
   ->  Index Scan using p_p_idx on p  (cost=0.15..57.33 rows=753 width=40)
         Index Cond: (p.* > '(0,0)'::record)
(3 filas)

So it would be good to fix this, but there are more pieces missing.  Or
there is some trick to enable the indexes to be used in that example --
if so I'm all ears.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: bug with expression index on partition

От
Amit Langote
Дата:
On 2018/06/23 5:54, Alvaro Herrera wrote:
> On 2018-Jun-21, Amit Langote wrote:
> 
>> On 2018/06/21 15:35, Amit Langote wrote:
>>> So, CompareIndexInfo and generateClonedIndexStmt are both doing the right
>>> thing, but DefineIndex is not.  Attached is a patch to fix DefineIndex so
>>> that it converts indexParams before recursing to create the index on a
>>> partition.
>>
>> I noticed that while CompareIndexInfo and generateClonedIndexStmt would
>> reject the case where index expressions contain a whole-row Var, my patch
>> didn't teach to do the same to DefineIndex, causing asymmetric behavior.
>> So, whereas ATTACH PARTITION would error out when trying to clone a
>> parent's index that contains a whole-row Var, recursively creating an
>> index on partition won't.
>>
>> I updated the patch so that even DefineIndex will check if any whole-row
>> Vars were encountered during conversion and error out if so.
> 
> Thanks.  I pushed this version, although I tweaked the test so that this
> condition is verified in the test that is supposed to do so, rather than
> creating a whole new set of tables for this purpose.  The way it would
> fail with unpatched code is perhaps less noisy that what you proposed,
> but I think it's quite enough.

Thanks.  The test changes you committed seem fine.

Regards,
Amit



Re: bug with expression index on partition

От
Amit Langote
Дата:
On 2018/06/23 6:51, Alvaro Herrera wrote:
> On 2018-Jun-21, Amit Langote wrote:
> 
>> explain (costs off) select p from p order by p;
>>               QUERY PLAN
>> ---------------------------------------
>>  Merge Append
>>    Sort Key: ((p1.*)::p)
>>    ->  Index Scan using p1_p_idx on p1
>>    ->  Index Scan using p2_p_idx on p2
>>    ->  Index Scan using p3_p_idx on p3
>> (5 rows)
> 
> Nice, but try adding a row > operator in the where clause.
> 
> I think it's clearly desirable to allow this row-based search to use indexes;
> as I recall, we mostly enable pagination of results via this kind of
> constructs.  However, we're lacking planner or executor features apparently,
> because a query using a row > operator does not use indexes:
> 
> create table partp (a int, b int) partition by range (a);
> create table partp1 partition of partp for values from (0) to (35);
> create table partp2 partition of partp for values from (35) to (100);
> create index on partp1 ((partp1.*));
> create index on partp2 ((partp2.*));
> explain select * from partp where partp > row(0,0) order by partp limit 25 ;
>                                 QUERY PLAN                                
> ──────────────────────────────────────────────────────────────────────────
>  Limit  (cost=6.69..6.75 rows=25 width=40)
>    ->  Sort  (cost=6.69..6.86 rows=66 width=40)
>          Sort Key: ((partp1.*)::partp)
>          ->  Append  (cost=0.00..4.83 rows=66 width=40)
>                ->  Seq Scan on partp1  (cost=0.00..1.88 rows=23 width=40)
>                      Filter: ((partp1.*)::partp > '(0,0)'::record)
>                ->  Seq Scan on partp2  (cost=0.00..2.62 rows=43 width=40)
>                      Filter: ((partp2.*)::partp > '(0,0)'::record)
> (8 filas)
> 
> Note the indexes are ignored, as opposed to what it does in a non-partitioned
> table:

Ah, yes.

IIUC, that happens because any whole-row Vars in WHERE quals and
EquivalenceClass expressions corresponding to child relations each has a
ConvertRowtypeExpr on top, whereas, a child index's expressions read off
pg_index doesn't contain ConvertRowtypeExpr expressions.  So, WHERE quals
and/or ORDER BY expressions containing references to the parent's
whole-row Vars cannot be matched to a child index containing same
whole-row Vars.

It's a bit unfortunate that the WHERE quals and EC expressions are
transformed such that they contain ConvertRowtypeExpr nodes at a point
where they're perhaps not necessary (such as the point when a WHERE clause
or EC expression is matched with an index expression).  A related
discussion is underway on a nearby thread titled "Expression errors with
"FOR UPDATE" and postgres_fdw with partition wise join enabled", so it'd
be nice if that thread concludes such that whole-row child indexes start
becoming useful.

Thanks,
Amit