Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Дата
Msg-id CA+hUKGJJgyc9xg+g_8_jPUV08Kr0NFdgq2_H9mm7-eZd7YVRQg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Список pgsql-bugs
On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get
> out of these situations with the existing identifiers, so I like
> your idea of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define
> a fixed name "$PGDATA/locale_map.txt" or such.  If that file
> exists, just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that
I don't do Windows, I just know that this passes CI and that the code
works the way I intended on my local Unix system if extracted into a
little harness).  With this, you could in theory create a file
PGDATA/win32setlocale.map containing:

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by
default (though they got a bit shorter 'cause I invented some
wildcards).  Some problems came up while wondering how to fit Türkiye
into the defaults, and how to back-patch:

1.  In the back-branches, we claim to support ancient Windows releases
as far back as "Windows 2000 SP4" (!), which obviously aren't getting
the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there
and generally before Windows 10.  And even if you exclude the
extremities of our support window somehow (how?), modern systems might
not have applied the update yet (IIUC they *have* to at some point
under the new world order, so there is a defined window of version
skew these days).

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded
in the current ACP ("active codepage", system-wide changeable setting
that controls char↔wchar_t conversion in system APIs), so the encoding
of that file (and the built-in default table) would need to match that
to work, as coded.  Perhaps it would be possible to make the mapping
file UTF-8 and transform that to ACP!  But it feels a bit too loopy
for me, and on the PostgreSQL side it is undefined/illegal whatever
you choose in PostgreSQL due to being accessed from different
databases which are using potentially different encodings that are
only required to be a superset of ASCII.  Avoid.

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the
replacement string.  But what is the oldest Windows release that can
understand a BCP47 code like that?

4.  Conversely, on modern systems, I'm still not entirely sure that
"tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254" and that
it's OK to put ".1254" on the end like that.  Is it, and is it?  I
don't mean just "does it mean Turkish?", I mean "does it give exactly
the same answer for every conceivable pair of strings when compared
with strcoll_l(), and likewise for the ctype-based functions like
towlower() et al".

If the answers are not in our favour, I guess we could leave the
default behaviour unchanged, and let people set up a text file as
shown above to fix their database if they want, but that's also not
very nice and kinda weird (helping hypothetical users of museum-grade
systems by leaving real users' systems broken).

If the answer to 4 is yes, yes then we could also push ahead with the
plan to make initdb pick BCP47 names by default in PG18 (or even 17).

> Maybe this shouldn't even be Windows-specific?  Are there any
> cases where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO code go away or countries split etc, so
it's no like POSIX locale names are set in stone under all
circumstances.  But on Unixen it's all just files in practice, you can
always just symlink them, move them around, compile them yourself from
sources, etc, if you really have to, so I think I'd rather contain the
crazy in win32*.c.

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Jan Wieck
Дата:
Сообщение: Re: numeric calculation bug as of 16.2-2
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18463: Possible bug in stored procedures with polymorphic OUT parameters