Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

From	Evan Jones
Subject	Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific
Date	10 October 2023 17:51:10
Msg-id	CA+HWA9aN-M1O-9Ma=_Pqz-uwzDA07DEk+pui7Zy7K7-Y1PjpUg@mail.gmail.com
In reply to	Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific (Thomas Munro <thomas.munro@gmail.com>)
Replies	Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific (Michael Paquier <michael@paquier.xyz>)
List	pgsql-hackers

Thanks for bringing this up! I just looked at the uses of isspace() in that file (rowtypes.c). It is the usual pattern: isspace() is used to allow leading or trailing whitespace when parsing values, and in the "needs quoting" logic on output. The fix would be the same as for hstore: this *should* be using scanner_isspace(). It also has the same disadvantage: it would change Postgres's results for some inputs that contain these non-ASCII "space" characters.

Here is a quick demonstration of this issue, showing that the quoting behavior differs between the two configurations. On Mac OS X with the "default" locale, the output includes quotes, because the UTF-8 encoding of ą (C4 85) contains the byte 0x85:

postgres=# SELECT ROW('keyą');
row
----------
("keyą")
(1 row)

On Mac OS X with the LANG=C environment variable set, it does not include quotes:

postgres=# SELECT ROW('keyą');
row
--------
(keyą)
(1 row)

On Mon, Oct 9, 2023 at 11:18 PM Thomas Munro <thomas.munro@gmail.com> wrote:

FTR I ran into a benign case of the phenomenon in this thread when
dealing with row types. In rowtypes.c, we double-quote stuff
containing spaces, but we detect them by passing individual bytes of
UTF-8 sequences to isspace(). Like macOS, Windows thinks that 0xa0 is
a space when you do that, so for example the Korean character '점'
(code point C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
not on Linux. That confused a migration/diff tool while comparing
Windows and Linux database servers using that representation. Not a
big deal, I guess no one ever promised that the format was stable
across platforms, and I don't immediately see a way for anything more
serious to go wrong (though I may lack imagination). It does seem a
bit weird to be using locale-aware tokenising for a machine-readable
format, and then making sure its behaviour is undefined by feeding it
chopped up bytes.