Обсуждение: should check collations when creating partitioned index

Поиск
Список
Период
Сортировка

should check collations when creating partitioned index

От
Peter Eisentraut
Дата:
When creating a partitioned index, the partition key must be a subset of 
the index's columns.  DefineIndex() explains:

      * If this table is partitioned and we're creating a unique index, 
primary
      * key, or exclusion constraint, make sure that the partition key is a
      * subset of the index's columns.  Otherwise it would be possible to
      * violate uniqueness by putting values that ought to be unique in
      * different partitions.

But this currently doesn't check that the collations between the 
partition key and the index definition match.  So you can construct a 
unique index that fails to enforce uniqueness.

Here is a non-partitioned case for reference:

create collation case_insensitive (provider=icu, 
locale='und-u-ks-level2', deterministic=false);
create table t0 (a int, b text);
create unique index i0 on t0 (b collate case_insensitive);
insert into t0 values (1, 'a'), (2, 'A');  -- violates unique constraint

Here is a partitioned case that doesn't work correctly:

create table t1 (a int, b text) partition by hash (b);
create table t1a partition of t1 for values with (modulus 2, remainder 0);
create table t1b partition of t1 for values with (modulus 2, remainder 1);
create unique index i1 on t1 (b collate case_insensitive);
insert into t1 values (1, 'a'), (2, 'A');  -- this succeeds

The attached patch adds the required collation check.  In the example, 
it would not allow the index i1 to be created.
Вложения

Re: should check collations when creating partitioned index

От
Laurenz Albe
Дата:
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
>       * If this table is partitioned and we're creating a unique index, primary
>       * key, or exclusion constraint, make sure that the partition key is a
>       * subset of the index's columns.  Otherwise it would be possible to
>       * violate uniqueness by putting values that ought to be unique in
>       * different partitions.
>
> But this currently doesn't check that the collations between the
> partition key and the index definition match.  So you can construct a
> unique index that fails to enforce uniqueness.
>
> Here is a partitioned case that doesn't work correctly:
>
> create collation case_insensitive (provider=icu,
> locale='und-u-ks-level2', deterministic=false);
> create table t1 (a int, b text) partition by hash (b);
> create table t1a partition of t1 for values with (modulus 2, remainder 0);
> create table t1b partition of t1 for values with (modulus 2, remainder 1);
> create unique index i1 on t1 (b collate case_insensitive);
> insert into t1 values (1, 'a'), (2, 'A');  -- this succeeds
>
> The attached patch adds the required collation check.  In the example,
> it would not allow the index i1 to be created.

The patch looks good, but I think the error message needs love:

> +        ereport(ERROR,
> +            errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +            errmsg("collation of column \"%s\" does not match between partition key and index definition",
> +                   NameStr(att->attname)));

"does not match between" sounds weird.  How about

  collation of index column \"%s\" must match collation of the partitioning key column

This will be backpatched, right?  What if somebody already created an index like that?
Does this warrant an entry in the "however" for the release notes, or is the case
exotic enough that we can assume that nobody is affected?

Yours,
Laurenz Albe



Re: should check collations when creating partitioned index

От
Peter Eisentraut
Дата:
On 13.11.23 21:04, Laurenz Albe wrote:
> On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
>>        * If this table is partitioned and we're creating a unique index, primary
>>        * key, or exclusion constraint, make sure that the partition key is a
>>        * subset of the index's columns.  Otherwise it would be possible to
>>        * violate uniqueness by putting values that ought to be unique in
>>        * different partitions.
>>
>> But this currently doesn't check that the collations between the
>> partition key and the index definition match.  So you can construct a
>> unique index that fails to enforce uniqueness.
>>
>> Here is a partitioned case that doesn't work correctly:
>>
>> create collation case_insensitive (provider=icu,
>> locale='und-u-ks-level2', deterministic=false);
>> create table t1 (a int, b text) partition by hash (b);
>> create table t1a partition of t1 for values with (modulus 2, remainder 0);
>> create table t1b partition of t1 for values with (modulus 2, remainder 1);
>> create unique index i1 on t1 (b collate case_insensitive);
>> insert into t1 values (1, 'a'), (2, 'A');  -- this succeeds
>>
>> The attached patch adds the required collation check.  In the example,
>> it would not allow the index i1 to be created.
> 
> The patch looks good, but I think the error message needs love:
> 
>> +        ereport(ERROR,
>> +            errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +            errmsg("collation of column \"%s\" does not match between partition key and index definition",
>> +                   NameStr(att->attname)));
> 
> "does not match between" sounds weird.  How about
> 
>    collation of index column \"%s\" must match collation of the partitioning key column
> 
> This will be backpatched, right?  What if somebody already created an index like that?
> Does this warrant an entry in the "however" for the release notes, or is the case
> exotic enough that we can assume that nobody is affected?

I think it's exotic enough that I wouldn't bother backpatching it.  But 
I welcome input on this.




Re: should check collations when creating partitioned index

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 13.11.23 21:04, Laurenz Albe wrote:
>> This will be backpatched, right?  What if somebody already created an index like that?
>> Does this warrant an entry in the "however" for the release notes, or is the case
>> exotic enough that we can assume that nobody is affected?

> I think it's exotic enough that I wouldn't bother backpatching it.  But
> I welcome input on this.

I think it should be back-patched.

I don't love the patch details though.  It seems entirely wrong to check
this before we check the opclass match.  Also, in at least some cases
the code presses on looking for another match if the current opclass
doesn't match; you've broken such cases.

            regards, tom lane



Re: should check collations when creating partitioned index

От
Jeff Davis
Дата:
On Mon, 2023-11-13 at 10:24 +0100, Peter Eisentraut wrote:
> create table t1 (a int, b text) partition by hash (b);
> create table t1a partition of t1 for values with (modulus 2,
> remainder 0);
> create table t1b partition of t1 for values with (modulus 2,
> remainder 1);
> create unique index i1 on t1 (b collate case_insensitive);
> insert into t1 values (1, 'a'), (2, 'A');  -- this succeeds
>
> The attached patch adds the required collation check.  In the
> example,
> it would not allow the index i1 to be created.

In the patch, you check for an exact collation match. Considering this
case only depends on equality, I think it would be correct if the
requirement was that (a) both collations are deterministic; or (b) the
collations match exactly.

This is related to the discussion here:

https://postgr.es/m/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com

Regards,
    Jeff Davis




Re: should check collations when creating partitioned index

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> In the patch, you check for an exact collation match. Considering this
> case only depends on equality, I think it would be correct if the
> requirement was that (a) both collations are deterministic; or (b) the
> collations match exactly.

You keep harping on this idea that we are only concerned with equality,
but I think you are wrong.  We expect a btree index to provide ordering
not only equality, and this example definitely is a btree index.

Possibly, with a great deal more specificity added to the check, we
could distinguish the cases where ordering can't matter and allow
collation variance then.  I do not see the value of that, especially
not when measured against the risk of introducing subtle bugs.

            regards, tom lane



Re: should check collations when creating partitioned index

От
Jeff Davis
Дата:
On Fri, 2023-11-17 at 15:18 -0500, Tom Lane wrote:
> You keep harping on this idea that we are only concerned with
> equality,
> but I think you are wrong.  We expect a btree index to provide
> ordering
> not only equality, and this example definitely is a btree index.
>
> Possibly, with a great deal more specificity added to the check, we
> could distinguish the cases where ordering can't matter and allow
> collation variance then.  I do not see the value of that, especially
> not when measured against the risk of introducing subtle bugs.

Fair point.

As background, I don't see a complete solution to our collation
problems and on the horizon. You've probably noticed that I'm looking
for various ways to mitigate the problem, and this thread was about
reducing the number of situations in which we rely on collation.

I'll focus on other potential improvements/mitigations and see if I can
make progress somewhere else.

Regards,
    Jeff Davis




Re: should check collations when creating partitioned index

От
Peter Eisentraut
Дата:
On 14.11.23 17:15, Tom Lane wrote:
> I don't love the patch details though.  It seems entirely wrong to check
> this before we check the opclass match.

Not sure why?  The order doesn't seem to matter?

> Also, in at least some cases
> the code presses on looking for another match if the current opclass
> doesn't match; you've broken such cases.

I see.  That means we shouldn't raise an error on a mismatch but just do

     if (key->partcollation[i] != collationIds[j])
         continue;

and then let the existing error under if (!found) complain.

I suppose we could move that into the

     if (get_opclass_opfamily_and_input_type(...))

block.  I'm not sure I see the difference.




Re: should check collations when creating partitioned index

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 14.11.23 17:15, Tom Lane wrote:
>> I don't love the patch details though.  It seems entirely wrong to check
>> this before we check the opclass match.

> Not sure why?  The order doesn't seem to matter?

The case that was bothering me was if we had a non-collated type
versus a collated type.  That would result in throwing an error
about collation mismatch, when complaining about the opclass seems
more apropos.  However, if we do this:

> I see.  That means we shouldn't raise an error on a mismatch but just do
>      if (key->partcollation[i] != collationIds[j])
>          continue;

it might not matter much.

            regards, tom lane



Re: should check collations when creating partitioned index

От
Peter Eisentraut
Дата:
On 20.11.23 17:25, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>> On 14.11.23 17:15, Tom Lane wrote:
>>> I don't love the patch details though.  It seems entirely wrong to check
>>> this before we check the opclass match.
> 
>> Not sure why?  The order doesn't seem to matter?
> 
> The case that was bothering me was if we had a non-collated type
> versus a collated type.  That would result in throwing an error
> about collation mismatch, when complaining about the opclass seems
> more apropos.  However, if we do this:
> 
>> I see.  That means we shouldn't raise an error on a mismatch but just do
>>       if (key->partcollation[i] != collationIds[j])
>>           continue;
> 
> it might not matter much.

Here is an updated patch that works as indicated above.

The behavior if you try to create an index with mismatching collations 
now is that it will skip over the column and complain at the end with 
something like

ERROR:  0A000: unique constraint on partitioned table must include all 
partitioning columns
DETAIL:  UNIQUE constraint on table "t1" lacks column "b" which is part 
of the partition key.

which perhaps isn't intuitive, but I think it would be the same if you 
somehow tried to build an index with different operator classes than the 
partitioning.  I think these less-specific error messages are ok in such 
edge cases.
Вложения

Re: should check collations when creating partitioned index

От
Peter Eisentraut
Дата:
On 23.11.23 11:01, Peter Eisentraut wrote:
> On 20.11.23 17:25, Tom Lane wrote:
>> Peter Eisentraut <peter@eisentraut.org> writes:
>>> On 14.11.23 17:15, Tom Lane wrote:
>>>> I don't love the patch details though.  It seems entirely wrong to 
>>>> check
>>>> this before we check the opclass match.
>>
>>> Not sure why?  The order doesn't seem to matter?
>>
>> The case that was bothering me was if we had a non-collated type
>> versus a collated type.  That would result in throwing an error
>> about collation mismatch, when complaining about the opclass seems
>> more apropos.  However, if we do this:
>>
>>> I see.  That means we shouldn't raise an error on a mismatch but just do
>>>       if (key->partcollation[i] != collationIds[j])
>>>           continue;
>>
>> it might not matter much.
> 
> Here is an updated patch that works as indicated above.
> 
> The behavior if you try to create an index with mismatching collations 
> now is that it will skip over the column and complain at the end with 
> something like
> 
> ERROR:  0A000: unique constraint on partitioned table must include all 
> partitioning columns
> DETAIL:  UNIQUE constraint on table "t1" lacks column "b" which is part 
> of the partition key.
> 
> which perhaps isn't intuitive, but I think it would be the same if you 
> somehow tried to build an index with different operator classes than the 
> partitioning.  I think these less-specific error messages are ok in such 
> edge cases.

If there are no further comments on this patch version, I plan to go 
ahead and commit it soon.




Re: should check collations when creating partitioned index

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
>> Here is an updated patch that works as indicated above.
>> 
>> The behavior if you try to create an index with mismatching collations 
>> now is that it will skip over the column and complain at the end with 
>> something like
>> 
>> ERROR:  0A000: unique constraint on partitioned table must include all 
>> partitioning columns
>> DETAIL:  UNIQUE constraint on table "t1" lacks column "b" which is part 
>> of the partition key.
>> 
>> which perhaps isn't intuitive, but I think it would be the same if you 
>> somehow tried to build an index with different operator classes than the 
>> partitioning.  I think these less-specific error messages are ok in such 
>> edge cases.

> If there are no further comments on this patch version, I plan to go 
> ahead and commit it soon.

Sorry for slow response --- I've been dealing with a little too much
$REAL_LIFE lately.  Anyway, I'm content with the v2 patch.  I see
that the existing code works a little harder than this to produce
an on-point error message for mismatching operator, but after
studying that I'm not totally convinced that it's ideal behavior
either.  I think we can wait for some field complaints to see if
we need a better error message for mismatching collation, and
if so what the shape of the bad input is exactly.

            regards, tom lane