Re: ICU locale validation / canonicalization

From Peter Eisentraut
Subject Re: ICU locale validation / canonicalization
Date
Msg-id 899ab44a-4307-064f-0945-412723d57c02@enterprisedb.com
In response to Re: ICU locale validation / canonicalization  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: ICU locale validation / canonicalization  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 28.02.23 06:57, Jeff Davis wrote:
> On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
>>
>> New patch attached. The new patch also includes a GUC that (when
>> enabled) validates that the collator is actually found.
> 
> New patch attached.
> 
> Now it always preserves the exact locale string during pg_upgrade, and
> does not attempt to canonicalize it. Before it was trying to be clever
> by determining if the language tag was finding the same collator as the
> original string -- I didn't find a problem with that, but it just
> seemed a bit too clever. So, only newly-created locales and databases
> have the ICU locale string canonicalized to a language tag.
> 
> Also, I added a SQL function pg_icu_language_tag() that can convert
> locale strings to language tags, and check whether they exist or not.

This patch appears to do about three things at once, and it's not clear 
exactly where the boundaries are between them and which ones we might 
actually want.  And I think the terminology also gets mixed up a bit, 
which makes following this harder.

1. Canonicalizing the locale string.  This is presumably what 
uloc_canonicalize() does, which the patch doesn't actually use.  What 
are examples of what this does?  Does the patch actually do this?

2. Converting the locale string to BCP 47 format.  This converts 
'de@collation=phonebook' to 'de-u-co-phonebk'.  This is what 
uloc_toLanguageTag() does.

3. Validating the locale string, to reject faulty input.

What are the relationships between these?

I don't understand how the validation actually happens in your patch. 
Does uloc_toLanguageTag() do the validation also?

Can you do canonicalization without converting to language tag?

Can you do validation of un-canonicalized locale names?

What is the guidance for the use of the icu_locale_validation GUC?

The description throws in yet another term: "validates that ICU locale 
strings are well-formed".  What is "well-formed"?  How does that relate 
to the other concepts?
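
For reference, BCP 47 (RFC 5646) itself distinguishes a "well-formed" tag, which only has to match the syntax, from a "valid" tag, whose subtags must also be registered. Whether the GUC description uses "well-formed" in that sense is not stated in this thread. The C sketch below illustrates a purely syntactic check under a simplifying assumption: it only requires hyphen-separated subtags of 1 to 8 ASCII letters or digits with a letters-only first subtag, which is looser than the full RFC 5646 grammar. The function name langtag_looks_well_formed() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool
is_ascii_alpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool
is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Simplified syntax check (an assumption, not the full RFC 5646 ABNF):
 * subtags are separated by single hyphens, each subtag has 1 to 8 ASCII
 * letters or digits, and the first subtag contains letters only.
 * No registry lookup is done, so unknown subtags are accepted.
 */
bool
langtag_looks_well_formed(const char *tag)
{
    size_t      len = 0;        /* length of the current subtag */
    bool        first = true;   /* still inside the first subtag? */

    assert(tag != NULL);
    for (const char *p = tag;; p++)
    {
        unsigned char c = (unsigned char) *p;

        if (c == '-' || c == '\0')
        {
            /* a subtag just ended: reject empty or overlong subtags */
            if (len == 0 || len > 8)
                return false;
            if (c == '\0')
                return true;
            len = 0;
            first = false;
        }
        else if (is_ascii_alpha(c) || (!first && is_ascii_digit(c)))
            len++;
        else
            return false;       /* characters such as '@' or '=' */
    }
}
```

Under this check 'de-u-co-phonebk' passes and 'de@collation=phonebook' does not, while a made-up tag such as 'de-u-co-bogus' also passes. That last case is the gap between checking syntax and checking that a collator is actually found.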

Personally, I'm not on board with this behavior:

=> CREATE COLLATION test (provider = icu, locale = 'de@collation=phonebook');
NOTICE:  00000: using language tag "de-u-co-phonebk" for locale "de@collation=phonebook"

I mean, maybe that is a thing we want to do somehow sometime, to migrate 
people to the "new" spellings, but the old ones aren't wrong.  So this 
should be a separate consideration, with an option, and it would require 
various updates in the documentation.  It also doesn't appear to address 
how to handle ICU before version 54.

But, per my earlier questions: are these three things all connected somehow?



