Re: pg_collation.collversion for C.UTF-8

From: Daniel Verite
Subject: Re: pg_collation.collversion for C.UTF-8
Msg-id 5ad8d2f8-c11f-46d6-aab5-ed529d8e958a@manitou-mail.org
In reply to: Re: pg_collation.collversion for C.UTF-8  (Jeff Davis <pgsql@j-davis.com>)
Replies: Re: pg_collation.collversion for C.UTF-8  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
    Jeff Davis wrote:

> >   For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long time with the
> > assumption that C.UTF-8 has the same behavior as C, and this only
> > recently came up.

Currently, neither lc_collate_is_c() nor lookup_collation_cache()
considers C.UTF-8 to be a C collation, since both use this kind of test:

        if (strcmp(localeptr, "C") == 0)
            result = true;
        else if (strcmp(localeptr, "POSIX") == 0)
            result = true;
        else
            result = false;
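Wrapping the quoted test in a standalone helper makes the consequence easy to check. This is a minimal sketch: the name `locale_name_is_c()` is hypothetical and not an actual PostgreSQL function. Because the comparison is an exact string match, "C.UTF-8" falls through to the `false` branch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the quoted test: only the exact
 * names "C" and "POSIX" are treated as the C locale.
 */
static bool
locale_name_is_c(const char *localeptr)
{
    bool result;

    if (strcmp(localeptr, "C") == 0)
        result = true;
    else if (strcmp(localeptr, "POSIX") == 0)
        result = true;
    else
        result = false;     /* e.g. "C.UTF-8", "en_US.UTF-8" */

    return result;
}
```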

What is relatively new (v15) is that we compute a version for libc
collations in get_collation_actual_version(), with code that assumes
that C.* does not need a version, implying that it's immune to
Unicode changes. What came up in this thread is that this assumption
is not true for at least one major platform: Debian/Ubuntu for
releases occurring before 2022 (glibc < 2.35).
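For contrast, the version-skipping rule described above can be sketched as follows. This is an approximation, not the actual v15 code: the helper name `libc_collation_needs_version()` is hypothetical, and it uses case-sensitive comparisons for simplicity, which may differ from the real implementation. Note that it exempts "C.UTF-8" from versioning, while the exact-match test quoted earlier does not treat that name as C.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: returns false for locale names assumed never
 * to need a recorded collation version ("C", "POSIX" and "C.*").
 */
static bool
libc_collation_needs_version(const char *collcollate)
{
    if (strcmp(collcollate, "C") == 0 ||
        strcmp(collcollate, "POSIX") == 0 ||
        strncmp(collcollate, "C.", 2) == 0)
        return false;   /* assumed immune to Unicode changes */

    return true;        /* e.g. "en_US.UTF-8": version would come from libc */
}
```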


> We can avoid this risk by converting C.anything or POSIX.anything to
> plain "C" or "POSIX", respectively, for new collations before storing
> the string in the catalog. For upgraded collations, we can preserve the
> existing locale name. When opening the locale, we would still only
> recognize plain "C" and "POSIX" as the C locale.
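The proposed normalization for newly created collations can be sketched roughly like this. The function name `normalize_c_locale_name()` is hypothetical; per the proposal, it would apply only when a new collation is stored, while upgraded collations keep their existing locale name.

```c
#include <string.h>

/*
 * Hypothetical sketch of the proposal: "C.anything" -> "C" and
 * "POSIX.anything" -> "POSIX". Any other name is returned unchanged.
 */
static const char *
normalize_c_locale_name(const char *name)
{
    if (strcmp(name, "C") == 0 || strncmp(name, "C.", 2) == 0)
        return "C";
    if (strcmp(name, "POSIX") == 0 || strncmp(name, "POSIX.", 6) == 0)
        return "POSIX";
    return name;
}
```

Matching on the "C." prefix rather than a leading "C" keeps names such as "Czech" untouched.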


Then Postgres would not sort the same way as the operating system for
the same locale, at least on some operating systems. As for glibc, in
a few years glibc < 2.35 will be obsolete, and C.UTF-8 will sort like
C on its own.
But in the meantime, I personally don't see why Postgres should start
forcing C.UTF-8 to sort differently in the database than in the OS.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
