Re: Built-in CTYPE provider

From Jeff Davis
Subject Re: Built-in CTYPE provider
Date
Msg-id 6b1370d5eaba5e8c42f54c05f7bc2b8e27b8db12.camel@j-davis.com
In response to Re: Built-in CTYPE provider  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, 2023-12-20 at 13:49 +0100, Daniel Verite wrote:
>
> But C.UTF-8 is not available everywhere, and there's still the
> problem that Unicode updates through libc are not aligned
> with Postgres releases.

Attached is an implementation of a built-in provider for the "C.UTF-8"
locale. That way applications (and tests!) can count on C.UTF-8 always
being available on any platform; and it also aligns with the Postgres
Unicode updates. Documentation is sparse and the patch is a bit rough,
but feedback is welcome -- it does have some basic tests which can be
used as a guide.

The C.UTF-8 locale, briefly, is a UTF-8 locale that provides simple
collation semantics (code point order) but rich ctype semantics
(lower/upper/initcap and regexes). This locale is for users who want
proper Unicode semantics for character operations (upper/lower,
regexes), but don't need a specific natural-language string sort order
to apply to all queries and indexes in their system. One might use it
as the database default collation, and use COLLATE clauses (e.g.
COLLATE UNICODE) where more specific behavior is needed.
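
As a rough illustration of that split between simple collation and rich ctype behavior, here is a small Python sketch. It is not code from the patch: Python's own string comparison and Unicode tables stand in for the built-in provider, and Python's Unicode version may differ from the one Postgres uses.

```python
# Illustration only: Python strings compare by code point, which mirrors
# the "simple collation" half of C.UTF-8; str.upper() applies Unicode case
# mappings, which mirrors the "rich ctype" half.
words = ["zebra", "Apple", "éclair", "apple", "Zebra"]

# Code point order: ASCII uppercase letters sort before ASCII lowercase
# letters, and non-ASCII letters such as "é" (U+00E9) sort after all of
# ASCII. This is not a natural-language order.
by_code_point = sorted(words)

# Unicode-aware case mapping still handles non-ASCII letters.
uppercased = [w.upper() for w in words]

print(by_code_point)
print(uppercased)
```

The sort result is what a code point collation gives regardless of language rules; a query that needs a linguistic order would be the place for an explicit COLLATE clause.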

The builtin C.UTF-8 locale has the following advantages over using the
libc C.UTF-8 locale:

  * Collation performance: the builtin provider uses memcmp and
abbreviated keys. In libc, these advantages are only available for the
C locale.

  * Unicode version is aligned with other parts of Postgres, like
normalization.

  * Available on all platforms with exactly the same semantics.

  * Testable and documentable.

  * Avoids index corruption risks. In theory libc C.UTF-8 should also
have stable collation, but that is not 100% true. In the builtin
provider it is 100% stable.
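
The memcmp point in the first item relies on a property of UTF-8: comparing the encoded bytes gives the same order as comparing code points. The Python sketch below checks that on a few samples. The helper names, the 8-byte width, and the zero-padded prefix used as an "abbreviated key" are assumptions made for illustration, not details taken from the patch.

```python
from itertools import combinations


def utf8_key(s: str) -> bytes:
    """Bytes that a memcmp-style comparison would see for a UTF-8 string."""
    return s.encode("utf-8")


def abbreviated_key(s: str, width: int = 8) -> bytes:
    """Hypothetical abbreviated key: the first `width` bytes, zero-padded."""
    return utf8_key(s)[:width].ljust(width, b"\x00")


samples = ["", "a", "ab", "abc", "Z", "é", "中", "😀", "abcdefghi", "abcdefghj"]

# Python compares str by code point and bytes lexicographically as unsigned
# values, so for these samples the two sorts agree only if UTF-8 byte order
# matches code point order.
same_order = sorted(samples) == sorted(samples, key=utf8_key)

# Whenever two abbreviated keys differ, they decide the comparison the same
# way the full keys do; equal abbreviated keys still need a full compare.
abbrev_consistent = all(
    (abbreviated_key(a) < abbreviated_key(b)) == (utf8_key(a) < utf8_key(b))
    for a, b in combinations(samples, 2)
    if abbreviated_key(a) != abbreviated_key(b)
)

print(same_order, abbrev_consistent)
```

A locale-aware comparison generally has no such shortcut, since the sort order is not a function of the raw bytes alone.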

Regards,
    Jeff Davis

