Re: PostgreSQL fails to convert decomposed utf-8 to other encodings

From: Craig Ringer
Subject: Re: PostgreSQL fails to convert decomposed utf-8 to other encodings
Msg-id: 53E1A6F8.1000104@2ndquadrant.com
In reply to: Re: PostgreSQL fails to convert decomposed utf-8 to other encodings  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: PostgreSQL fails to convert decomposed utf-8 to other encodings  (Craig Ringer <craig@2ndquadrant.com>)
 Re: PostgreSQL fails to convert decomposed utf-8 to other encodings  (Tatsuo Ishii <ishii@postgresql.org>)
List: pgsql-bugs
On 08/06/2014 09:14 AM, Tom Lane wrote:
> We don't actually support "decomposed" utf8; if there is any bug here,
> it's that the input you show isn't rejected.  But I think there was
> some intentional choice to not check \u escapes fully.

Combining characters (i.e. decomposed utf-8 form, for chars where there
is a combined equivalent) are part of utf-8. They're not an optional add-on.
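To make that concrete, here is a small sketch in Python (used purely for illustration; it is not PostgreSQL code). It shows that the precomposed and decomposed spellings of "á" are both ordinary, valid UTF-8 byte sequences, and that Unicode treats them as canonically equivalent. The variable names are hypothetical.

```python
import unicodedata

# Precomposed form: a single code point, U+00E1 LATIN SMALL LETTER A WITH ACUTE.
precomposed = "\u00e1"
# Decomposed form: U+0061 LATIN SMALL LETTER A followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "a\u0301"

# Both encode to valid UTF-8, just with different bytes.
pre_bytes = precomposed.encode("utf-8")   # b'\xc3\xa1'
dec_bytes = decomposed.encode("utf-8")    # b'a\xcc\x81'

# Both decode back without error, so a strict UTF-8 validator accepts either one.
assert pre_bytes.decode("utf-8") == precomposed
assert dec_bytes.decode("utf-8") == decomposed

# Unicode defines them as canonically equivalent: NFD of the precomposed
# character is exactly the decomposed sequence.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Rejecting the second form would therefore mean rejecting well-formed UTF-8 input.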

So if Pg doesn't support them, it doesn't fully support utf-8. Which is
fine as far as it goes, but it must be documented as a limitation at a
minimum. (I'll deal with that).

It also means that you get fun anomalies like this, where one 'á' is
precomposed and the other is decomposed:

regress=> SELECT 'á' = 'á';
 ?column?
----------
 f
(1 row)

which is IMO insane.
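Why the query returns false can be sketched in Python, again only as an illustration and not as PostgreSQL's implementation. It assumes, as the example implies, that one literal was typed in precomposed form and the other in decomposed form. `naive_equal` (a hypothetical helper) compares the strings as raw code-point sequences, which is roughly what an exact, byte-wise text comparison amounts to; `nfc_equal` (also hypothetical) normalizes both sides to NFC before comparing.

```python
import unicodedata

def naive_equal(a: str, b: str) -> bool:
    # Exact code-point-by-code-point comparison; canonically equivalent
    # strings with different spellings compare as unequal.
    return a == b

def nfc_equal(a: str, b: str) -> bool:
    # Normalize both sides to NFC (composed form) first, so canonically
    # equivalent strings compare as equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e1"   # 'á' as one code point
decomposed = "a\u0301"   # 'a' + COMBINING ACUTE ACCENT

print(naive_equal(precomposed, decomposed))  # False
print(nfc_equal(precomposed, decomposed))    # True
```

Normalizing on comparison is only one possible design; normalizing once on input instead would let plain exact comparison work, at the cost of altering the bytes the user stored.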

Not only that, but we can't reject decomposed forms, because they will
already exist in live installs. That'd break dump and reload of such
installs and cause exciting problems with pg_upgrade.

The "we'll just reject part of utf-8" opportunity has flown. It needs to
be documented as a bug in existing versions, and I guess given that I'm
the one complaining, I get to see if I can find a sane fix for 9.5...

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
