Обсуждение: bug with expression index on partition
Hi. I noticed a bug with creating an expression index on a partitioned table. The fact that a partition may have differing attribute numbers is not accounted for when recursively *creating* the index on partition. The partition index gets created but is unusable. create table p (a int) partition by hash (a); create table p1 partition of p for values with (modulus 3, remainder 0); create table p2 partition of p for values with (modulus 3, remainder 1); create table p3 partition of p for values with (modulus 3, remainder 2); -- make p3 have different attribute number for column a alter table p detach partition p3; alter table p3 drop a, add a int; alter table p attach partition p3 for values with (modulus 3, remainder 2); create index on p ((a+1)); \d p3 ERROR: invalid attnum 1 for relation "p3" That's because the IndexStmt that's passed for creating the index on p3 contains expressions whose Vars are exact copies of the parent's Vars, which in this case is wrong. We must have converted them to p3's attribute numbers before calling DefineIndex on p3. Note that if the same index already exists in a partition before creating the parent's index, then that index already contains the right Vars, no conversion is needed in that case. alter table p detach partition p3; alter table p3 drop a, add a int; create index on p3 ((a+1)); alter table p attach partition p3 for values with (modulus 3, remainder 2); \d p3 Table "public.p3" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | Partition of: p FOR VALUES WITH (modulus 3, remainder 2) Indexes: "p3_expr_idx" btree ((a + 1)) create index on p ((a+1)); \d p3 Table "public.p3" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | Partition of: p FOR VALUES WITH (modulus 3, remainder 2) Indexes: "p3_expr_idx" btree ((a + 1)) In other words, the code that checks whether an index with matching definition exists in the partition correctly converts Vars before the actual matching, so is able to conclude that, despite different Vars in p3's index, it is same index as the one being created in the parent. So, the index is not created again, which actually if it were, it would be hit by the very same bug I'm reporting. Also is right the case where the partition being attached doesn't have the index but the parent has and it is correctly *cloned* to the attached partition. alter table p detach partition p3; alter table p3 drop a, add a int; create index on p ((a+1)); alter table p attach partition p3 for values with (modulus 3, remainder 2); \d p3 Table "public.p3" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | Partition of: p FOR VALUES WITH (modulus 3, remainder 2) Indexes: "p3_expr_idx" btree ((a + 1)) So, CompareIndexInfo and generateClonedIndexStmt are both doing the right thing, but DefineIndex is not. Attached is a patch to fix DefineIndex so that it converts indexParams before recursing to create the index on a partition. By the way, do we want to do anything about the fact that we don't check if a matching inherited index exists when creating an index directly on the partition. I wondered this because we do check the other way around. \d p3 Table "public.p3" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | Partition of: p FOR VALUES WITH (modulus 3, remainder 2) Indexes: "p3_expr_idx" btree ((a + 1)) create index on p3 ((a+1)); \d p3 Table "public.p3" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | Partition of: p FOR VALUES WITH (modulus 3, remainder 2) Indexes: "p3_expr_idx" btree ((a + 1)) "p3_expr_idx1" btree ((a + 1)) Thanks, Amit
Вложения
On 2018/06/21 15:35, Amit Langote wrote: > So, CompareIndexInfo and generateClonedIndexStmt are both doing the right > thing, but DefineIndex is not. Attached is a patch to fix DefineIndex so > that it converts indexParams before recursing to create the index on a > partition. I noticed that while CompareIndexInfo and generateClonedIndexStmt would reject the case where index expressions contain a whole-row Var, my patch didn't teach to do the same to DefineIndex, causing asymmetric behavior. So, whereas ATTACH PARTITION would error out when trying to clone a parent's index that contains a whole-row Var, recursively creating an index on partition won't. I updated the patch so that even DefineIndex will check if any whole-row Vars were encountered during conversion and error out if so. Thanks, Amit
Вложения
On 2018/06/21 16:19, Amit Langote wrote: > I updated the patch so that even DefineIndex will check if any whole-row > Vars were encountered during conversion and error out if so. I first thought of starting a new thread for this, but thought I'd just reply here because the affected code is nearby. I was wondering if it wouldn't hurt to allow whole-row vars to be present in the expressions of inherited indexes. If we did allow it, the query shown in the example below is able to use the indexes on partitions. create table p (a int) partition by hash (a); create table p1 partition of p for values with (modulus 3, remainder 0); create table p2 partition of p for values with (modulus 3, remainder 1); create table p3 partition of p for values with (modulus 3, remainder 2); create index on p ((p)); explain (costs off) select p from p order by p; QUERY PLAN --------------------------------------- Merge Append Sort Key: ((p1.*)::p) -> Index Scan using p1_p_idx on p1 -> Index Scan using p2_p_idx on p2 -> Index Scan using p3_p_idx on p3 (5 rows) After applying the patch in my last email, each of generateClonedIndexStmt, CompareIndexInfo, and DefineIndex reject inheriting an index if its expressions are found to contain whole-row vars. Now, one can create those indexes on partitions individually, but they cannot be matched to an ORDER BY clause of a query accessing those partitions via the parent table. drop index p_p_idx; create index on p1 ((p1)); create index on p2 ((p2)); create index on p3 ((p3)); explain (costs off) select p from p order by p; QUERY PLAN ---------------------------- Sort Sort Key: ((p1.*)::p) -> Append -> Seq Scan on p1 -> Seq Scan on p2 -> Seq Scan on p3 (6 rows) It is of course usable if partition's accessed directly. explain (costs off) select p1 from p1 order by p1; QUERY PLAN ---------------------------------- Index Scan using p1_p1_idx on p1 (1 row) OTOH, an inherited index with whole-row vars (if we decide to start allowing them as I'm proposing) cannot be used if partition's accessed directly. drop index p1_p1_idx; create index on p ((p)); explain (costs off) select p1 from p1 order by p1; QUERY PLAN ---------------------- Sort Sort Key: p1.* -> Seq Scan on p1 (3 rows) but maybe that's tolerable. Thoughts? Thanks, Amit
On 2018-Jun-21, Amit Langote wrote: > On 2018/06/21 15:35, Amit Langote wrote: > > So, CompareIndexInfo and generateClonedIndexStmt are both doing the right > > thing, but DefineIndex is not. Attached is a patch to fix DefineIndex so > > that it converts indexParams before recursing to create the index on a > > partition. > > I noticed that while CompareIndexInfo and generateClonedIndexStmt would > reject the case where index expressions contain a whole-row Var, my patch > didn't teach to do the same to DefineIndex, causing asymmetric behavior. > So, whereas ATTACH PARTITION would error out when trying to clone a > parent's index that contains a whole-row Var, recursively creating an > index on partition won't. > > I updated the patch so that even DefineIndex will check if any whole-row > Vars were encountered during conversion and error out if so. Thanks. I pushed this version, although I tweaked the test so that this condition is verified in the test that is supposed to do so, rather than creating a whole new set of tables for this purpose. The way it would fail with unpatched code is perhaps less noisy that what you proposed, but I think it's quite enough. I think the whole-row vars issue is worthy of some more thinking. I'm letting that one go for now. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Jun-21, Amit Langote wrote: > explain (costs off) select p from p order by p; > QUERY PLAN > --------------------------------------- > Merge Append > Sort Key: ((p1.*)::p) > -> Index Scan using p1_p_idx on p1 > -> Index Scan using p2_p_idx on p2 > -> Index Scan using p3_p_idx on p3 > (5 rows) Nice, but try adding a row > operator in the where clause. I think it's clearly desirable to allow this row-based search to use indexes; as I recall, we mostly enable pagination of results via this kind of constructs. However, we're lacking planner or executor features apparently, because a query using a row > operator does not use indexes: create table partp (a int, b int) partition by range (a); create table partp1 partition of partp for values from (0) to (35); create table partp2 partition of partp for values from (35) to (100); create index on partp1 ((partp1.*)); create index on partp2 ((partp2.*)); explain select * from partp where partp > row(0,0) order by partp limit 25 ; QUERY PLAN ────────────────────────────────────────────────────────────────────────── Limit (cost=6.69..6.75 rows=25 width=40) -> Sort (cost=6.69..6.86 rows=66 width=40) Sort Key: ((partp1.*)::partp) -> Append (cost=0.00..4.83 rows=66 width=40) -> Seq Scan on partp1 (cost=0.00..1.88 rows=23 width=40) Filter: ((partp1.*)::partp > '(0,0)'::record) -> Seq Scan on partp2 (cost=0.00..2.62 rows=43 width=40) Filter: ((partp2.*)::partp > '(0,0)'::record) (8 filas) Note the indexes are ignored, as opposed to what it does in a non-partitioned table: create table p (a int, b int); create index on p((p.*)); explain select * from p where p > row(0,0) order by p limit 25 ; QUERY PLAN ─────────────────────────────────────────────────────────────────────────── Limit (cost=0.15..2.05 rows=25 width=40) -> Index Scan using p_p_idx on p (cost=0.15..57.33 rows=753 width=40) Index Cond: (p.* > '(0,0)'::record) (3 filas) So it would be good to fix this, but there are more pieces missing. Or there is some trick to enable the indexes to be used in that example -- if so I'm all ears. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/06/23 5:54, Alvaro Herrera wrote: > On 2018-Jun-21, Amit Langote wrote: > >> On 2018/06/21 15:35, Amit Langote wrote: >>> So, CompareIndexInfo and generateClonedIndexStmt are both doing the right >>> thing, but DefineIndex is not. Attached is a patch to fix DefineIndex so >>> that it converts indexParams before recursing to create the index on a >>> partition. >> >> I noticed that while CompareIndexInfo and generateClonedIndexStmt would >> reject the case where index expressions contain a whole-row Var, my patch >> didn't teach to do the same to DefineIndex, causing asymmetric behavior. >> So, whereas ATTACH PARTITION would error out when trying to clone a >> parent's index that contains a whole-row Var, recursively creating an >> index on partition won't. >> >> I updated the patch so that even DefineIndex will check if any whole-row >> Vars were encountered during conversion and error out if so. > > Thanks. I pushed this version, although I tweaked the test so that this > condition is verified in the test that is supposed to do so, rather than > creating a whole new set of tables for this purpose. The way it would > fail with unpatched code is perhaps less noisy that what you proposed, > but I think it's quite enough. Thanks. The test changes you committed seem fine. Regards, Amit
On 2018/06/23 6:51, Alvaro Herrera wrote: > On 2018-Jun-21, Amit Langote wrote: > >> explain (costs off) select p from p order by p; >> QUERY PLAN >> --------------------------------------- >> Merge Append >> Sort Key: ((p1.*)::p) >> -> Index Scan using p1_p_idx on p1 >> -> Index Scan using p2_p_idx on p2 >> -> Index Scan using p3_p_idx on p3 >> (5 rows) > > Nice, but try adding a row > operator in the where clause. > > I think it's clearly desirable to allow this row-based search to use indexes; > as I recall, we mostly enable pagination of results via this kind of > constructs. However, we're lacking planner or executor features apparently, > because a query using a row > operator does not use indexes: > > create table partp (a int, b int) partition by range (a); > create table partp1 partition of partp for values from (0) to (35); > create table partp2 partition of partp for values from (35) to (100); > create index on partp1 ((partp1.*)); > create index on partp2 ((partp2.*)); > explain select * from partp where partp > row(0,0) order by partp limit 25 ; > QUERY PLAN > ────────────────────────────────────────────────────────────────────────── > Limit (cost=6.69..6.75 rows=25 width=40) > -> Sort (cost=6.69..6.86 rows=66 width=40) > Sort Key: ((partp1.*)::partp) > -> Append (cost=0.00..4.83 rows=66 width=40) > -> Seq Scan on partp1 (cost=0.00..1.88 rows=23 width=40) > Filter: ((partp1.*)::partp > '(0,0)'::record) > -> Seq Scan on partp2 (cost=0.00..2.62 rows=43 width=40) > Filter: ((partp2.*)::partp > '(0,0)'::record) > (8 filas) > > Note the indexes are ignored, as opposed to what it does in a non-partitioned > table: Ah, yes. IIUC, that happens because any whole-row Vars in WHERE quals and EquivalenceClass expressions corresponding to child relations each has a ConvertRowtypeExpr on top, whereas, a child index's expressions read off pg_index doesn't contain ConvertRowtypeExpr expressions. So, WHERE quals and/or ORDER BY expressions containing references to the parent's whole-row Vars cannot be matched to a child index containing same whole-row Vars. It's a bit unfortunate that the WHERE quals and EC expressions are transformed such that they contain ConvertRowtypeExpr nodes at a point where they're perhaps not necessary (such as the point when a WHERE clause or EC expression is matched with an index expression). A related discussion is underway on a nearby thread titled "Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled", so it'd be nice if that thread concludes such that whole-row child indexes start becoming useful. Thanks, Amit