RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From: Haifang Wang (Centific Technologies Inc)
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Msg-id: PH8PR21MB3902402C3C3C20DD8CB40AFFE5EC2@PH8PR21MB3902.namprd21.prod.outlook.com
In reply to: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607  (Thomas Munro <thomas.munro@gmail.com>)
Responses: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
List: pgsql-bugs
Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions.

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Monday, May 13, 2024 11:38 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get out 
> of these situations with the existing identifiers, so I like your idea 
> of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define a 
> fixed name "$PGDATA/locale_map.txt" or such.  If that file exists, 
> just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes
CI and that the code works the way I intended on my local Unix system if extracted into a little harness).  With this,
you could in theory create a file PGDATA/win32setlocale.map containing:
 

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254
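For illustration, a lookup against such a file might look roughly like the C sketch below.  It is not the code from the attached patch: the name `lookup_locale_map()` is hypothetical, and because the message does not explain the leading `c` on the example lines, the sketch simply assumes plain `from=to` lines.  Skipping `#` comment lines and tolerating CRLF endings are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the patch's code): look up NAME in an in-memory
 * copy of a mapping file whose lines have the form "from=to".  Lines that are
 * empty, start with '#' (an assumed comment syntax) or contain no '=' are
 * ignored.  On a match the replacement is copied into OUT and 1 is returned;
 * otherwise 0 is returned and the caller should keep NAME unchanged.
 */
static int
lookup_locale_map(const char *map_text, const char *name,
                  char *out, size_t outlen)
{
    const char *line = map_text;
    size_t      name_len;

    assert(map_text != NULL && name != NULL && out != NULL);
    name_len = strlen(name);

    while (*line != '\0')
    {
        const char *eol = strchr(line, '\n');
        size_t      len = eol ? (size_t) (eol - line) : strlen(line);
        const char *eq;

        /* Tolerate CRLF line endings, as a Windows editor may produce. */
        if (len > 0 && line[len - 1] == '\r')
            len--;

        eq = memchr(line, '=', len);
        if (len > 0 && line[0] != '#' && eq != NULL &&
            (size_t) (eq - line) == name_len &&
            memcmp(line, name, name_len) == 0)
        {
            size_t to_len = len - name_len - 1;

            /* Replacement does not fit: report "no mapping" rather than truncate. */
            if (to_len >= outlen)
                return 0;
            memcpy(out, eq + 1, to_len);
            out[to_len] = '\0';
            return 1;
        }

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Matching here is byte-for-byte, so the file's bytes would have to be in the same encoding as the names that setlocale() hands back.  A caller would presumably try the lookup on the pg_database value before passing it to setlocale(), keeping the stored name when the function returns 0.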

I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I
invented some wildcards).  Some problems came up while wondering how to fit Türkiye into the defaults, and how to
back-patch:

1.  In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which
obviously aren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there, and generally on anything
older than Windows 10.  And even if you exclude the extremities of our support window somehow (how?), modern systems might not have
applied the update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of
version skew these days).
 

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", a system-wide
changeable setting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and of the built-in
default table) would need to match it to work, as coded.  Perhaps it would be possible to make the mapping file UTF-8
and transform that to ACP!  But that feels a bit too loopy to me, and on the PostgreSQL side the encoding is undefined/illegal
whatever you choose, because the file is accessed from different databases that may use different
encodings, which are only required to be supersets of ASCII.  Avoid.
 

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string.  But what is the oldest Windows
release that can understand a BCP47 code like that?
 

4.  Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as
"Turkish-XXX.1254" and that it's OK to put ".1254" on the end like that.  Is it, and is it?  I don't mean just "does it
mean Turkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with
strcoll_l(), and likewise for the ctype-based functions like towlower() et al".
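Point 4 can at least be probed empirically by comparing two locale names over a sample of inputs.  The C sketch below is a hypothetical test helper, not anything in the patch; it uses the POSIX newlocale(), strcoll_l() and towlower_l() calls (the MSVC CRT has analogous _create_locale(), _strcoll_l() and _towlower_l()).  Agreement on a sample can only reveal differences, it cannot prove that two names are equivalent "for every conceivable pair of strings".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wctype.h>

/* Reduce strcoll's arbitrary-magnitude result to -1, 0 or 1. */
static int
cmp_sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Hypothetical helper: returns 1 if locales A and B order every ordered pair
 * of SAMPLES the same way and lower-case every wide character in [0, max_wc)
 * identically, 0 if a difference was found, and -1 if either locale name
 * could not be loaded.
 */
static int
locales_agree(const char *a, const char *b,
              const char *const *samples, size_t n, wint_t max_wc)
{
    locale_t la = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, a, (locale_t) 0);
    locale_t lb = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, b, (locale_t) 0);
    int      result = 1;

    assert(samples != NULL || n == 0);

    if (la == (locale_t) 0 || lb == (locale_t) 0)
        result = -1;
    else
    {
        /* Collation: only the sign of the comparison matters. */
        for (size_t i = 0; i < n && result == 1; i++)
            for (size_t j = 0; j < n && result == 1; j++)
                if (cmp_sign(strcoll_l(samples[i], samples[j], la)) !=
                    cmp_sign(strcoll_l(samples[i], samples[j], lb)))
                    result = 0;

        /* Ctype: compare case mapping code point by code point. */
        for (wint_t wc = 0; wc < max_wc && result == 1; wc++)
            if (towlower_l(wc, la) != towlower_l(wc, lb))
                result = 0;
    }

    if (la != (locale_t) 0)
        freelocale(la);
    if (lb != (locale_t) 0)
        freelocale(lb);
    return result;
}
```

On Windows, a port to the CRT functions would be fed the two candidate names (say the stored "Turkish_..." name and "tr-TR.1254") plus a corpus of Turkish text, with max_wc covering the whole 16-bit wchar_t range.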

If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a
text file as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping
hypothetical users of museum-grade systems by leaving real users' systems broken).
 

If the answers to 4 are yes and yes, then we could also push ahead with the plan to make initdb pick BCP47 names by default
in PG18 (or even 17).
 

> Maybe this shouldn't even be Windows-specific?  Are there any cases 
> where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO codes go away or countries split etc, so it's not like POSIX locale names are set in stone
under all circumstances.  But on Unixen it's all just files in practice: you can always symlink them, move them
around, compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in
win32*.c.
