Re: Documenting more pitfalls of non-default collations?

Поиск

Список

Период

Сортировка

От	Jeff Davis
Тема	Re: Documenting more pitfalls of non-default collations?
Дата	11 июня 20:58:54
Msg-id	23342b9d538adeab472217fd6a96c75b48a659fb.camel@j-davis.com обсуждение исходный текст
Ответ на	Documenting more pitfalls of non-default collations? (Will Mortensen <will@extrahop.com>)
Ответы	Re: Documenting more pitfalls of non-default collations? (Will Mortensen <will@extrahop.com>)
Список	pgsql-docs

Дерево обсуждения

On Mon, 2024-06-10 at 23:55 -0700, Will Mortensen wrote:
> I mentioned to Jeremy at pgConf.dev that using non-default collations
> in some SQL idioms can produce undesired results, and he asked me to
> send an email. An example idiom is the way Django implements
> case-insensitive comparisons using "upper(x) = upper(y)" [1][2][3] ,
> which returns false if x = y but they have different collations that
> produce different uppercase.

Hi,

Thank you for the examples.

There are quite a few subtleties to getting case-insensitive
comparisons right, and neither LOWER() nor UPPER() get everything quite
right even if the collation is the same.

For instance (for almost any locale other than "C"):

  UPPER(LOWER(U&'\1E9E')) != UPPER(U&'\1E9E')

And:

  LOWER(UPPER(U&'\03C2')) != LOWER(U&'\03C2')

The results of UPPER() and LOWER() can also change if some language
adds a new case variant in the future, which could be a problem if the
results are stored somewhere.

How should we document all of that? If we include too many caveats,
it's just frustrating.

Instead, I propose that we implement Unicode "case folding" in PG18,
which solves these issues by transforming the string to a canonical
form suitable for case-insensitive comparison. (In most cases, the
results are the same as LOWER(), but there are exceptions specifically
to avoid the problems above.)

Then, we can just have a section in the docs on "case folding" to
describe the right way to use it. That still leaves one caveat: the
handling of dotted- and dotless-i. But one caveat is a lot easier to
keep track of.

Regards,
    Jeff Davis

В списке pgsql-docs по дате отправления:

Предыдущее

От: PG Doc comments form
Дата: 11 июня, 12:00:31
Сообщение: Bibliography reference redirects to a Forbiden page.

Следующее

От: Will Mortensen
Дата: 12 июня, 04:20:42
Сообщение: Re: Documenting more pitfalls of non-default collations?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Documenting more pitfalls of non-default collations?

Предыдущее

Следующее