Re: lower and upper not UTF-8 safe

From: Karel Zak
Subject: Re: lower and upper not UTF-8 safe
Msg-id: 20030805065850.GA12563@zf.jcu.cz
In reply to: Re: lower and upper not UTF-8 safe  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: lower and upper not UTF-8 safe  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Mon, Aug 04, 2003 at 05:03:02PM -0400, Tom Lane wrote:
> Julian Satchell <j.satchell@eris.qinetiq.com> writes:
> > The implementations of lower and upper in
> > src/backend/utils/adt/oracle_compat.c use the single byte macros from
> > ctype.h to alter individual bytes in the text string. 
> 
> > If the text is UTF-8 encoded this is totally wrong, and will result in
> > an invalid string that is no longer UTF-8.
> 
> Only if you use a locale that is assuming a character set that is not
> UTF8 but does have characters with the high bit set.  I'm not sure that
> we can do anything to defend against locale/charset mismatch.

We can try to detect the locale's charset, compare it with the charset
actually used in the DB, and send a NOTICE to the FE if they do not
match. The problem is the portability of the charset detection code,
because there are differences between OSes. The best case is when libc
supports the nl_langinfo(CODESET) call. The complete charset detection
code can be found in libcharset or glib (I use a simplification of that
code, and it's 300 lines :-).
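
A minimal sketch of the check described above, assuming a POSIX libc
that provides nl_langinfo(CODESET). The function names and the
db_encoding parameter are hypothetical and are not taken from the
PostgreSQL source or from libcharset; the name comparison is
deliberately naive and does not know about aliases such as LATIN1
versus ISO-8859-1, which a real implementation would need a table for.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Normalize a charset name: keep only ASCII alphanumerics, upper-cased,
 * so that "utf8", "UTF-8" and "utf_8" all become "UTF8". */
static void normalize_charset(const char *in, char *out, size_t outlen)
{
    size_t j = 0;

    if (outlen == 0)
        return;
    for (size_t i = 0; in[i] != '\0' && j + 1 < outlen; i++) {
        unsigned char c = (unsigned char) in[i];

        if (isalnum(c))
            out[j++] = (char) toupper(c);
    }
    out[j] = '\0';
}

/* Return 1 if the two names are equal after normalization, else 0.
 * No alias handling: "LATIN1" and "ISO-8859-1" compare as different. */
static int charset_names_match(const char *a, const char *b)
{
    char na[64], nb[64];

    normalize_charset(a, na, sizeof na);
    normalize_charset(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}

/* Hypothetical check: compare the charset of the current LC_CTYPE locale
 * with db_encoding (the name of the database encoding).
 * Returns 1 on match, 0 on mismatch, -1 if the locale charset is unknown.
 * The caller is assumed to have called setlocale(LC_CTYPE, ...) already. */
static int locale_matches_db_encoding(const char *db_encoding)
{
    const char *codeset = nl_langinfo(CODESET);

    if (codeset == NULL || codeset[0] == '\0')
        return -1;
    if (!charset_names_match(codeset, db_encoding)) {
        /* A backend would send a NOTICE to the frontend here;
         * this sketch only prints to stderr. */
        fprintf(stderr, "NOTICE: locale charset \"%s\" does not match "
                        "database encoding \"%s\"\n", codeset, db_encoding);
        return 0;
    }
    return 1;
}
```

A caller would first run setlocale(LC_CTYPE, "") and then call
locale_matches_db_encoding() with the database encoding name. On systems
without nl_langinfo(CODESET) this does not compile, which is exactly the
portability problem mentioned above.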
 
   Karel


-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

