Hi Jean-Michel,
Jean-Michel POURE <jm.poure@freesurf.fr> wrote:
> Dear all,
>
> I would like to transform UTF-8 strings into Java-Unicode. Example :
> - Latin1 : 'é'
> - UTF-8 : 'é'
> - Java Unicode = '\u00233'
>
> Basically, a Unicode compatible ascii() function would be fine.
> ascii('é') should return 233.
>
> 1) Has anyone written an ascii UTF-8 safe wrapper to ascii() function?
> If yes, would you be so kind to publish this function on the list.
OK, I just gave it a try; see the attachment.
The function takes the first character of a TEXT value and returns
its UCS2 value. I have only done some basic tests (in particular, I
have not tried it with 3- or 4-byte UTF-8 characters). The function
follows the Unicode 3.2 spec.
SELECT utf8toucs2('a'), utf8toucs2('é');
 utf8toucs2 | utf8toucs2
------------+------------
         97 |        233
(1 row)
The function returns -1 on error.
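
For anyone who wants to follow the logic without opening the attachment, here is a minimal sketch in Python of the kind of decoding described above. It is not the attached function: the name utf8_to_ucs2, the byte-string argument, and the choice to reject 4-byte sequences (which do not fit in UCS2) are assumptions made for illustration.

```python
def utf8_to_ucs2(data: bytes) -> int:
    """Return the code point of the first UTF-8 character in data, or -1 on error.

    Illustrative sketch only; the name and behaviour are assumptions,
    not the attached PostgreSQL function.
    """
    if not data:
        return -1
    b0 = data[0]
    if b0 < 0x80:
        # Single-byte (ASCII) character.
        return b0
    if 0xC2 <= b0 <= 0xDF:
        # Two-byte sequence; 0xC0 and 0xC1 would be overlong encodings.
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        # Three-byte sequence.
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    else:
        # Stray continuation byte, invalid lead byte, or a 4-byte
        # sequence whose value would not fit in UCS2.
        return -1
    if len(data) < length:
        return -1
    for b in data[1:length]:
        if b & 0xC0 != 0x80:
            # Not a continuation byte (10xxxxxx).
            return -1
        cp = (cp << 6) | (b & 0x3F)
    if cp < minimum or 0xD800 <= cp <= 0xDFFF:
        # Overlong encoding or a surrogate code point.
        return -1
    return cp


print(utf8_to_ucs2("a".encode("utf-8")), utf8_to_ucs2("é".encode("utf-8")))
```

With UTF-8 input this prints 97 and 233, matching the query result above. A lone Latin1 byte such as 0xE9 gives -1, because it looks like the start of a 3-byte sequence with nothing following it.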
> 2) Are there plans to add an ascii() UTF-8 safe function to
> PostgreSQL?
I don't think the function I wrote is useful as such. It would be
better to write a function that converts the whole string.
By the way, what is the encoding for Java Unicode? Is it always "\u"
followed by 5 hex digits (in which case your example is wrong)? If so,
it shouldn't be too difficult to write the relevant function, though I
wonder whether the Java programme would convert an incoming '\' 'u' '0'
'0' '2' '3' '3' to the corresponding UCS2/UTF16 character.
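
As far as I can tell from the Java Language Specification, a Unicode escape is "\u" followed by exactly four hex digits, so 'é' would be written \u00e9, and characters beyond U+FFFF are written as a UTF-16 surrogate pair. Under that assumption, a whole-string conversion could look like the Python sketch below. The function name to_java_escapes is hypothetical, and leaving ASCII (including backslashes) untouched is a simplification.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with Java-style \\uXXXX escapes.

    Illustrative sketch only: the name is hypothetical, and ASCII
    characters (including '\\') are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is kept as-is.
            out.append(ch)
        elif cp <= 0xFFFF:
            # One UTF-16 code unit, written as four hex digits.
            out.append("\\u%04x" % cp)
        else:
            # Above U+FFFF: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


print(to_java_escapes("café"))
```

Note that the input here is an already decoded string, so the UTF-8 decoding step has to happen first.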
Maybe we should have some similar input (and output?) functionality in
psql, but then I would much prefer the Perl way, \x{hex_digits}, which
is unambiguous.
Regards,
Patrice
--
Patrice Hédé
email: patrice hede(à)islande org
www : http://www.islande.org/