Re: Bug with UTF-8 character

From: Tom Lane
Subject: Re: Bug with UTF-8 character
Msg-id: 25791.1148654039@sss.pgh.pa.us
In reply to: Bug with UTF-8 character (Hans-Jürgen Schönig <postgres@cybertec.at>)
List: pgsql-hackers

Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> But the code does a check where the second character should not be 
> greater than 0x9F, when first character is 0xED. This is not according 
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says
  UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                ------------

The reason for the prohibition is explained as
  The definition of UTF-8 prohibits encoding character numbers between
  U+D800 and U+DFFF, which are reserved for use with the UTF-16
  encoding form (as surrogate pairs) and do not directly represent
  characters.

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do.  If they
say it's invalid, it's invalid.
        regards, tom lane
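
For readers following along, the UTF8-3 rule quoted above can be sketched as a small check. This is only an illustration of the RFC 3629 grammar, not the PostgreSQL source; the function name is_valid_utf8_3 is made up for this example.

```python
def is_valid_utf8_3(seq: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 rule of RFC 3629.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if len(seq) != 3:
        return False
    b0, b1, b2 = seq

    # The third byte is always a plain UTF8-tail (%x80-BF).
    if not 0x80 <= b2 <= 0xBF:
        return False

    if b0 == 0xE0:
        # %xE0 %xA0-BF: a lower second byte would be an overlong encoding.
        return 0xA0 <= b1 <= 0xBF
    if b0 == 0xED:
        # %xED %x80-9F: a second byte of A0-BF would encode U+D800..U+DFFF,
        # the surrogate range that the RFC prohibits.
        return 0x80 <= b1 <= 0x9F
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        # All other 3-byte lead bytes take two plain tails.
        return 0x80 <= b1 <= 0xBF
    return False
```

With this sketch, ED 9F BF (U+D7FF, the last code point before the surrogate range) is accepted, while ED A0 80 (which would decode to U+D800) is rejected, matching the 0x9F upper bound discussed in the thread.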

