Re: pg_collation.collversion for C.UTF-8

From Daniel Verite
Subject Re: pg_collation.collversion for C.UTF-8
Date
Msg-id ac61fb5a-461a-4bdf-9201-68fa67b6242b@manitou-mail.org
In reply to Re: pg_collation.collversion for C.UTF-8  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: pg_collation.collversion for C.UTF-8  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
    Thomas Munro wrote:

> It looks like for technical reasons
> inside glibc, that couldn't be done before 2.35:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318
>
> That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied
> by the glibc project) isn't supposed to be versioned, but it's
> extremely unfortunate that a bunch of OSes (Debian and maybe more)
> have been sorting text in some other order under that name for
> years.

Yes. This is consistent with the Debian/Ubuntu patches to
glibc/localedata/locales/C.

glibc-2.35 is not patched, and upstream has this:
  LC_COLLATE
  % The keyword 'codepoint_collation' in any part of any LC_COLLATE
  % immediately discards all collation information and causes the
  % locale to use strcmp/wcscmp for collation comparison.  This is
  % exactly what is needed for C (ASCII) or C.UTF-8.
  codepoint_collation
  END LC_COLLATE

But in older versions, glibc doesn't have the locales/C data file.
Debian adds it in debian/patches/localedata/C with this kind of
content:

* glibc 2.31  Debian 11
  LC_COLLATE
  order_start forward
  <U0000>
  ..
  <U007F>
  <U0080>
  ..
  <U00FF>
  etc...

But as explained in the above-linked bugzilla entry, that did not
result in true byte-comparison semantics, for several reasons
that got fixed in 2.35.
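
To make "true byte-comparison semantics" concrete, here is a minimal
sketch (not taken from glibc or PostgreSQL) showing that sorting UTF-8
encoded strings byte by byte, as strcmp would, gives the same order as
sorting by code point. The function names and sample strings are
illustrative assumptions only.

```python
# Sketch: byte-wise comparison of UTF-8 text agrees with code point order.
# utf8_byte_key and codepoint_key are hypothetical helper names.

def utf8_byte_key(s: str) -> bytes:
    """Sort key that mimics strcmp() on the UTF-8 encoding of s."""
    return s.encode("utf-8")


def codepoint_key(s: str) -> tuple:
    """Sort key that compares strings code point by code point."""
    return tuple(ord(ch) for ch in s)


# Arbitrary sample: ASCII, Latin-1 range, BMP, and a non-BMP character.
samples = ["b", "a", "B", "\u00e9", "z", "\u20ac", "\U0001F600", "ab", ""]

by_bytes = sorted(samples, key=utf8_byte_key)
by_codepoints = sorted(samples, key=codepoint_key)

# Both orderings are identical; uppercase ASCII sorts before lowercase.
print(by_bytes == by_codepoints)
```

A locale whose LC_COLLATE rules deviate from this order anywhere would
sort some strings differently from strcmp, which is the kind of
discrepancy that matters for existing indexes.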

So this looks like a solved problem for anyone starting to use these
collations with glibc 2.35 or newer (or on other OSes that don't have a
compatibility issue with them in the first place).
But Debian/Ubuntu users upgrading from the older C.* to 2.35+ will not
get the usual warning about the need to reindex.

I understand that my proposal to version C.* like any other collation
might be erring on the side of caution, but ignoring these collation
changes on at least one major OS does not feel right either.
Maybe we should consider doing platform-dependent checks?
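
As a rough illustration of what a platform-dependent check could look
like, here is a purely hypothetical sketch. It is not how PostgreSQL
behaves, and every name in it (c_utf8_collversion, distro_patched,
needs_reindex_warning) is an assumption made up for this example. The
only fact taken from the discussion above is the 2.35 threshold. How to
detect a distro-patched C locale is deliberately left open.

```python
# Hypothetical sketch of a platform-dependent version check for C.UTF-8.
from typing import Optional, Tuple

# Per the message above: upstream glibc has codepoint_collation from 2.35.
FIXED_FROM = (2, 35)


def parse_glibc_version(text: str) -> Tuple[int, int]:
    """Turn a string such as '2.31' into (2, 31)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def c_utf8_collversion(glibc_version: str, distro_patched: bool) -> Optional[str]:
    """Version to record for C.UTF-8, or None when it is treated as unversioned.

    distro_patched is an assumed input: True when the OS ships its own
    locales/C file, as described for Debian 11 above.
    """
    if distro_patched and parse_glibc_version(glibc_version) < FIXED_FROM:
        return glibc_version
    return None


def needs_reindex_warning(recorded: Optional[str], current: Optional[str]) -> bool:
    """Warn whenever the recorded version differs from the current one."""
    return recorded != current


# Upgrade from a patched glibc 2.31 to glibc 2.36: a warning is raised.
old = c_utf8_collversion("2.31", distro_patched=True)
new = c_utf8_collversion("2.36", distro_patched=False)
print(needs_reindex_warning(old, new))
```

Under these assumptions, a database created on unpatched 2.35 or newer
never sees a warning for C.UTF-8, while one created under the older
patched locale does after the upgrade.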



Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


