Обсуждение: should check collations when creating partitioned index
When creating a partitioned index, the partition key must be a subset of the index's columns. DefineIndex() explains: * If this table is partitioned and we're creating a unique index, primary * key, or exclusion constraint, make sure that the partition key is a * subset of the index's columns. Otherwise it would be possible to * violate uniqueness by putting values that ought to be unique in * different partitions. But this currently doesn't check that the collations between the partition key and the index definition match. So you can construct a unique index that fails to enforce uniqueness. Here is a non-partitioned case for reference: create collation case_insensitive (provider=icu, locale='und-u-ks-level2', deterministic=false); create table t0 (a int, b text); create unique index i0 on t0 (b collate case_insensitive); insert into t0 values (1, 'a'), (2, 'A'); -- violates unique constraint Here is a partitioned case that doesn't work correctly: create table t1 (a int, b text) partition by hash (b); create table t1a partition of t1 for values with (modulus 2, remainder 0); create table t1b partition of t1 for values with (modulus 2, remainder 1); create unique index i1 on t1 (b collate case_insensitive); insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds The attached patch adds the required collation check. In the example, it would not allow the index i1 to be created.
Вложения
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote: > * If this table is partitioned and we're creating a unique index, primary > * key, or exclusion constraint, make sure that the partition key is a > * subset of the index's columns. Otherwise it would be possible to > * violate uniqueness by putting values that ought to be unique in > * different partitions. > > But this currently doesn't check that the collations between the > partition key and the index definition match. So you can construct a > unique index that fails to enforce uniqueness. > > Here is a partitioned case that doesn't work correctly: > > create collation case_insensitive (provider=icu, > locale='und-u-ks-level2', deterministic=false); > create table t1 (a int, b text) partition by hash (b); > create table t1a partition of t1 for values with (modulus 2, remainder 0); > create table t1b partition of t1 for values with (modulus 2, remainder 1); > create unique index i1 on t1 (b collate case_insensitive); > insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds > > The attached patch adds the required collation check. In the example, > it would not allow the index i1 to be created. The patch looks good, but I think the error message needs love: > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("collation of column \"%s\" does not match between partition key and index definition", > + NameStr(att->attname))); "does not match between" sounds weird. How about collation of index column \"%s\" must match collation of the partitioning key column This will be backpatched, right? What if somebody already created an index like that? Does this warrant an entry in the "however" for the release notes, or is the case exotic enough that we can assume that nobody is affected? Yours, Laurenz Albe
On 13.11.23 21:04, Laurenz Albe wrote: > On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote: >> * If this table is partitioned and we're creating a unique index, primary >> * key, or exclusion constraint, make sure that the partition key is a >> * subset of the index's columns. Otherwise it would be possible to >> * violate uniqueness by putting values that ought to be unique in >> * different partitions. >> >> But this currently doesn't check that the collations between the >> partition key and the index definition match. So you can construct a >> unique index that fails to enforce uniqueness. >> >> Here is a partitioned case that doesn't work correctly: >> >> create collation case_insensitive (provider=icu, >> locale='und-u-ks-level2', deterministic=false); >> create table t1 (a int, b text) partition by hash (b); >> create table t1a partition of t1 for values with (modulus 2, remainder 0); >> create table t1b partition of t1 for values with (modulus 2, remainder 1); >> create unique index i1 on t1 (b collate case_insensitive); >> insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds >> >> The attached patch adds the required collation check. In the example, >> it would not allow the index i1 to be created. > > The patch looks good, but I think the error message needs love: > >> + ereport(ERROR, >> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> + errmsg("collation of column \"%s\" does not match between partition key and index definition", >> + NameStr(att->attname))); > > "does not match between" sounds weird. How about > > collation of index column \"%s\" must match collation of the partitioning key column > > This will be backpatched, right? What if somebody already created an index like that? > Does this warrant an entry in the "however" for the release notes, or is the case > exotic enough that we can assume that nobody is affected? I think it's exotic enough that I wouldn't bother backpatching it. But I welcome input on this.
Peter Eisentraut <peter@eisentraut.org> writes: > On 13.11.23 21:04, Laurenz Albe wrote: >> This will be backpatched, right? What if somebody already created an index like that? >> Does this warrant an entry in the "however" for the release notes, or is the case >> exotic enough that we can assume that nobody is affected? > I think it's exotic enough that I wouldn't bother backpatching it. But > I welcome input on this. I think it should be back-patched. I don't love the patch details though. It seems entirely wrong to check this before we check the opclass match. Also, in at least some cases the code presses on looking for another match if the current opclass doesn't match; you've broken such cases. regards, tom lane
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote: > create table t1 (a int, b text) partition by hash (b); > create table t1a partition of t1 for values with (modulus 2, > remainder 0); > create table t1b partition of t1 for values with (modulus 2, > remainder 1); > create unique index i1 on t1 (b collate case_insensitive); > insert into t1 values (1, 'a'), (2, 'A'); -- this succeeds > > The attached patch adds the required collation check. In the > example, > it would not allow the index i1 to be created. In the patch, you check for an exact collation match. Considering this case only depends on equality, I think it would be correct if the requirement was that (a) both collations are deterministic; or (b) the collations match exactly. This is related to the discussion here: https://postgr.es/m/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > In the patch, you check for an exact collation match. Considering this > case only depends on equality, I think it would be correct if the > requirement was that (a) both collations are deterministic; or (b) the > collations match exactly. You keep harping on this idea that we are only concerned with equality, but I think you are wrong. We expect a btree index to provide ordering not only equality, and this example definitely is a btree index. Possibly, with a great deal more specificity added to the check, we could distinguish the cases where ordering can't matter and allow collation variance then. I do not see the value of that, especially not when measured against the risk of introducing subtle bugs. regards, tom lane
On Fri, 2023-11-17 at 15:18 -0500, Tom Lane wrote: > You keep harping on this idea that we are only concerned with > equality, > but I think you are wrong. We expect a btree index to provide > ordering > not only equality, and this example definitely is a btree index. > > Possibly, with a great deal more specificity added to the check, we > could distinguish the cases where ordering can't matter and allow > collation variance then. I do not see the value of that, especially > not when measured against the risk of introducing subtle bugs. Fair point. As background, I don't see a complete solution to our collation problems and on the horizon. You've probably noticed that I'm looking for various ways to mitigate the problem, and this thread was about reducing the number of situations in which we rely on collation. I'll focus on other potential improvements/mitigations and see if I can make progress somewhere else. Regards, Jeff Davis
On 14.11.23 17:15, Tom Lane wrote: > I don't love the patch details though. It seems entirely wrong to check > this before we check the opclass match. Not sure why? The order doesn't seem to matter? > Also, in at least some cases > the code presses on looking for another match if the current opclass > doesn't match; you've broken such cases. I see. That means we shouldn't raise an error on a mismatch but just do if (key->partcollation[i] != collationIds[j]) continue; and then let the existing error under if (!found) complain. I suppose we could move that into the if (get_opclass_opfamily_and_input_type(...)) block. I'm not sure I see the difference.
Peter Eisentraut <peter@eisentraut.org> writes: > On 14.11.23 17:15, Tom Lane wrote: >> I don't love the patch details though. It seems entirely wrong to check >> this before we check the opclass match. > Not sure why? The order doesn't seem to matter? The case that was bothering me was if we had a non-collated type versus a collated type. That would result in throwing an error about collation mismatch, when complaining about the opclass seems more apropos. However, if we do this: > I see. That means we shouldn't raise an error on a mismatch but just do > if (key->partcollation[i] != collationIds[j]) > continue; it might not matter much. regards, tom lane
On 20.11.23 17:25, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> On 14.11.23 17:15, Tom Lane wrote: >>> I don't love the patch details though. It seems entirely wrong to check >>> this before we check the opclass match. > >> Not sure why? The order doesn't seem to matter? > > The case that was bothering me was if we had a non-collated type > versus a collated type. That would result in throwing an error > about collation mismatch, when complaining about the opclass seems > more apropos. However, if we do this: > >> I see. That means we shouldn't raise an error on a mismatch but just do >> if (key->partcollation[i] != collationIds[j]) >> continue; > > it might not matter much. Here is an updated patch that works as indicated above. The behavior if you try to create an index with mismatching collations now is that it will skip over the column and complain at the end with something like ERROR: 0A000: unique constraint on partitioned table must include all partitioning columns DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part of the partition key. which perhaps isn't intuitive, but I think it would be the same if you somehow tried to build an index with different operator classes than the partitioning. I think these less-specific error messages are ok in such edge cases.
Вложения
On 23.11.23 11:01, Peter Eisentraut wrote: > On 20.11.23 17:25, Tom Lane wrote: >> Peter Eisentraut <peter@eisentraut.org> writes: >>> On 14.11.23 17:15, Tom Lane wrote: >>>> I don't love the patch details though. It seems entirely wrong to >>>> check >>>> this before we check the opclass match. >> >>> Not sure why? The order doesn't seem to matter? >> >> The case that was bothering me was if we had a non-collated type >> versus a collated type. That would result in throwing an error >> about collation mismatch, when complaining about the opclass seems >> more apropos. However, if we do this: >> >>> I see. That means we shouldn't raise an error on a mismatch but just do >>> if (key->partcollation[i] != collationIds[j]) >>> continue; >> >> it might not matter much. > > Here is an updated patch that works as indicated above. > > The behavior if you try to create an index with mismatching collations > now is that it will skip over the column and complain at the end with > something like > > ERROR: 0A000: unique constraint on partitioned table must include all > partitioning columns > DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part > of the partition key. > > which perhaps isn't intuitive, but I think it would be the same if you > somehow tried to build an index with different operator classes than the > partitioning. I think these less-specific error messages are ok in such > edge cases. If there are no further comments on this patch version, I plan to go ahead and commit it soon.
Peter Eisentraut <peter@eisentraut.org> writes: >> Here is an updated patch that works as indicated above. >> >> The behavior if you try to create an index with mismatching collations >> now is that it will skip over the column and complain at the end with >> something like >> >> ERROR: 0A000: unique constraint on partitioned table must include all >> partitioning columns >> DETAIL: UNIQUE constraint on table "t1" lacks column "b" which is part >> of the partition key. >> >> which perhaps isn't intuitive, but I think it would be the same if you >> somehow tried to build an index with different operator classes than the >> partitioning. I think these less-specific error messages are ok in such >> edge cases. > If there are no further comments on this patch version, I plan to go > ahead and commit it soon. Sorry for slow response --- I've been dealing with a little too much $REAL_LIFE lately. Anyway, I'm content with the v2 patch. I see that the existing code works a little harder than this to produce an on-point error message for mismatching operator, but after studying that I'm not totally convinced that it's ideal behavior either. I think we can wait for some field complaints to see if we need a better error message for mismatching collation, and if so what the shape of the bad input is exactly. regards, tom lane