Re: [HACKERS] UTF-8 safe ascii() function

From: Patrice Hédé
Subject: Re: [HACKERS] UTF-8 safe ascii() function
Date:
Message-ID: 20020519114413.2265b70e.phede-ml@islande.org
In reply to: UTF-8 safe ascii() function  (Jean-Michel POURE <jm.poure@freesurf.fr>)
Responses: Re: [HACKERS] UTF-8 safe ascii() function  (Jean-Michel POURE <jm.poure@freesurf.fr>)
Re: [HACKERS] UTF-8 safe ascii() function  (Jean-Michel POURE <jm.poure@freesurf.fr>)
List: pgsql-general
Hi Jean-Michel,

Jean-Michel POURE <jm.poure@freesurf.fr> wrote:
> Dear all,
>
> I would like to transform UTF-8 strings into Java-Unicode. Example :
> - Latin1 : 'é'
> - UTF-8 : 'é'
> - Java Unicode = '\u00233'
>
> Basically, a Unicode compatible ascii() function would be fine.
> ascii('é') should return 233.
>
> 1) Has anyone written an ascii UTF-8 safe wrapper to ascii() function?
> If yes, would you be so kind to publish this function on the list.

OK, I just gave it a try, see the attachment.

The function takes the first character of a TEXT value and returns its
UCS-2 value. I have only done some basic tests (i.e. I have not tried it
with 3- or 4-byte UTF-8 characters). The function follows the
Unicode 3.2 spec.

SELECT utf8toucs2('a'), utf8toucs2('é');
 utf8toucs2 | utf8toucs2
------------+------------
         97 |        233
(1 row)

The function returns -1 on error.
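
For readers without the attachment, here is a minimal sketch of the decoding step in Python. It is not the attached function: the name `utf8_first_codepoint` is made up for illustration, and the overlong and surrogate checks are my assumptions about what "following the spec" implies. It returns the code point of the first UTF-8 sequence, or -1 on error, as described above. Note that 4-byte sequences decode to values above 0xFFFF, which do not fit in UCS-2.

```python
def utf8_first_codepoint(data: bytes) -> int:
    """Return the code point of the first UTF-8 sequence in data, or -1 on error.

    Hypothetical helper for illustration; not the function from the attachment.
    """
    if not data:
        return -1
    b0 = data[0]
    if b0 < 0x80:                  # 0xxxxxxx: single byte (ASCII)
        return b0
    if 0xC2 <= b0 <= 0xDF:         # 110xxxxx: two bytes (0xC0/0xC1 are always overlong)
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:       # 1110xxxx: three bytes
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:       # 11110xxx: four bytes
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        return -1                  # stray continuation byte or invalid lead byte
    if len(data) < length:
        return -1                  # truncated sequence
    for b in data[1:length]:
        if b & 0xC0 != 0x80:       # continuation bytes must look like 10xxxxxx
            return -1
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong encodings, surrogates, and values beyond U+10FFFF.
    if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return -1
    return cp
```

With this sketch, `utf8_first_codepoint('é'.encode('utf-8'))` gives 233, matching the query result above, while a lone Latin1 byte such as `b'\xe9'` gives -1.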

> 2) Are there plans to add an ascii() UTF-8 safe function to
> PostgreSQL?

I don't think the function I wrote is useful as such. It would be better
to write a function that converts the whole string.

By the way, what is the encoding for Java Unicode? Is it always "\u"
followed by 5 hex digits (in which case your example is wrong)? If so,
it shouldn't be too difficult to write the relevant function, though I
wonder whether the Java program would convert an incoming '\' 'u' '0'
'0' '2' '3' '3' to the corresponding UCS-2/UTF-16 character.
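
For reference, the Java Language Specification defines a Unicode escape as "\u" followed by exactly four hex digits, so 'é' (decimal 233, hex E9) would be written \u00E9, and characters above U+FFFF become a UTF-16 surrogate pair. A rough sketch of a whole-string conversion in Python follows. The name `to_java_escapes` is hypothetical, and leaving ASCII (including a literal backslash) untouched is my assumption. Whether the receiving Java program decodes such escapes depends on how it reads its input.

```python
def to_java_escapes(text: str) -> str:
    """Rewrite non-ASCII characters as Java-style \\uXXXX escapes.

    Hypothetical helper for illustration. ASCII characters, including a
    literal backslash, are passed through unchanged (an assumption).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                       # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)           # one UTF-16 code unit
        else:
            cp -= 0x10000                        # split into a surrogate pair
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)
```

For example, `to_java_escapes('café')` yields the seven ASCII characters `caf\u00E9`.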

Maybe we should have some similar input (and output?) functionality in
psql, but then I would much prefer the Perl way, \x{hex_digits}, which
is unambiguous.
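
To illustrate why the braces remove the ambiguity, here is a small Python sketch that decodes Perl-style \x{hex_digits} escapes. Nothing like this exists in psql as far as this message goes; the name `decode_hex_escapes` is hypothetical, and leaving out-of-range values untouched is an arbitrary choice. Because the closing brace ends the escape, a digit that follows it is never mistaken for part of the code point.

```python
import re

# Matches \x{...} with one or more hex digits inside the braces.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\x{hex_digits} escape with the character it names.

    Hypothetical helper for illustration only.
    """
    def replace(match):
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF:
            return match.group(0)   # not a valid code point: leave it unchanged
        return chr(cp)

    return _HEX_ESCAPE.sub(replace, text)
```

Here `decode_hex_escapes(r"\x{20AC}33")` gives '€33': the escape stops at the brace, so the trailing "33" stays literal.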

Regards,

Patrice

--
Patrice Hédé
email: patrice hede(à)islande org
www  : http://www.islande.org/

