Re: Order changes in PG16 since ICU introduction

From Jeff Davis
Subject Re: Order changes in PG16 since ICU introduction
Date
Msg-id 73f97db9a37fd281bd5077ce16383cd8006b1f59.camel@j-davis.com
In response to Re: Order changes in PG16 since ICU introduction  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Order changes in PG16 since ICU introduction  ("Daniel Verite" <daniel@manitou-mail.org>)
Re: Order changes in PG16 since ICU introduction  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Tue, 2023-06-06 at 21:37 +0100, Andrew Gierth wrote:
> I like either "internal" or "builtin" because they correctly identify
> that no external resources are used. I'm not keen on "postgresql".

"builtin" seems to be the winner. New patch series attached with doc
and test updates.

This has been a long discussion (it's a messy problem), but I think
I've addressed the most important concerns raised. If you disagree with
something, please indicate whether it's an objection, or a more minor
difference of opinion that I can weigh against other opinions. Also
please indicate if you think something is out of scope for 16.

Patches 0001, 0002:

These patches implement the built-in provider and automatically change
provider=icu to provider=builtin when the locale is C. Other approaches
were considered:
 * Pretend that ICU can support the C locale, and use checks throughout
the code similar to those the libc provider uses: this was somewhat of a
hack, had potential issues with upgraded clusters, and several people
seemed to reject it.
 * Switch to the libc provider for the C locale: this would make the libc
provider even more complicated, had some potential for confusion, and
has catalog representation problems when --locale is specified along
with --lc-ctype.

Ultimately we need to choose one approach, and the built-in provider
seems the nicest (though most invasive). It reflects the reality that
we don't actually use libc or icu for the C locale, and it's nicer to
document. The builtin provider seemed to get the most support.
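
To illustrate why no external library is needed for the C locale, here is
a rough Python sketch (not taken from the patches) of memcmp()-style
ordering: strings are compared by their encoded bytes only. The helper
name c_locale_sort and the sample words are hypothetical, and UTF-8 is
assumed as the encoding.

```python
def c_locale_sort(strings, encoding="utf-8"):
    """Sort strings by their raw encoded bytes.

    Python compares bytes objects lexicographically, with a shorter
    prefix sorting first, which mirrors a memcmp() comparison followed
    by a length tie-break. No locale data is consulted.
    """
    return sorted(strings, key=lambda s: s.encode(encoding))


if __name__ == "__main__":
    # Hypothetical sample data: uppercase ASCII sorts before lowercase,
    # and non-ASCII characters sort after all ASCII ones.
    words = ["apple", "Zebra", "éclair", "Banana"]
    print(c_locale_sort(words))  # ['Banana', 'Zebra', 'apple', 'éclair']
```

The result depends only on byte values, so it is the same on every
platform and with every libc or ICU version.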


Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case of
the libc provider. I believe most people favor this patch and I haven't
seen recent objections.


I didn't find any surprising behaviors, but there are a few that I'd
like to draw attention to:

0. If you initdb with --locale-provider=libc, and don't specify ICU at
any later point, then none of these changes should affect you and
you'll remain on libc. If someone notices otherwise, please let me
know.

1. If you specify --locale-provider=builtin at initdb time, you *must*
specify --locale=C/POSIX, otherwise you get an error.

2. Patch 0004 is possibly out of scope for 16, but it felt consistent
with the other UI changes and low risk. Please try with/without before
objecting.

3. Daniel Verite felt that we should only change the provider from ICU
to "builtin" for the C locale if the provider is defaulting to ICU; not
if it's specified as ICU. I did not differentiate between specifying
ICU and defaulting to ICU because:
  a. "libc" unconditionally uses the built-in memcmp() logic for C; it
never actually uses libc
  b. If a user really wants the root locale or the en-US-u-va-posix
locale, they can specify those directly
  c. I don't see any plausible case where it helps a user to keep
provider=icu when locale=C.

4. Joe Conway and Peter Eisentraut both felt that C.UTF-8 with
provider=icu should not be changed to use the builtin provider, and
instead passed on to ICU. I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider; but CREATE DATABASE/COLLATION
will pass it along to ICU (which may support it as en-US-u-va-posix in
some versions, or may throw an error in other versions). My reasoning
is that initdb is pulling from the environment, and we should try
harder to succeed on any reasonable environmental settings (otherwise
initdb with default settings could fail); whereas we can be more strict
with CREATE DATABASE/COLLATION.

5. For the built-in provider, initdb defaults to UTF-8 rather than
SQL_ASCII. Otherwise, you would be unable to use ICU at all later,
because ICU doesn't support SQL_ASCII.
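
The rules in items 0, 1, 3, 4 and 5 can be summarized as a small decision
function. This is only a sketch in Python of the behavior described
above, not the actual patch code: the function names, the from_initdb
flag, and the string constants are all hypothetical, and what the locale
is recorded as after the switch to "builtin" is deliberately left out.

```python
from typing import Optional

C_LOCALES = {"C", "POSIX"}


def resolve_provider(provider: str, locale: str, *, from_initdb: bool) -> str:
    """Return the provider that would end up being used (illustrative only)."""
    if provider == "libc":
        # Item 0: libc stays libc.
        return "libc"
    if provider == "builtin":
        # Item 1: the built-in provider only accepts C/POSIX.
        if locale not in C_LOCALES:
            raise ValueError("builtin provider requires locale C or POSIX")
        return "builtin"
    if provider == "icu":
        # Item 3: C/POSIX always moves to builtin, whether ICU was
        # specified explicitly or was only the default.
        if locale in C_LOCALES:
            return "builtin"
        # Item 4: C.UTF-8 moves to builtin only in initdb; CREATE
        # DATABASE/COLLATION pass it along to ICU, which may accept it
        # or throw an error depending on the ICU version.
        if locale == "C.UTF-8" and from_initdb:
            return "builtin"
        return "icu"
    raise ValueError(f"unknown provider: {provider}")


def default_initdb_encoding(provider: str) -> Optional[str]:
    """Item 5: initdb defaults to UTF-8 for builtin; other cases not modeled."""
    return "UTF8" if provider == "builtin" else None


if __name__ == "__main__":
    print(resolve_provider("icu", "C", from_initdb=False))
    print(resolve_provider("icu", "C.UTF-8", from_initdb=True))
    print(resolve_provider("icu", "C.UTF-8", from_initdb=False))
    print(default_initdb_encoding("builtin"))
```

The from_initdb flag is where the compromise in item 4 shows up: the same
provider/locale pair resolves differently depending on whether the
settings came from the environment or from an explicit SQL command.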


--
Jeff Davis
PostgreSQL Contributor Team - AWS


