Discussion: Automatic code conversion between UNICODE and other encodings


Automatic code conversion between UNICODE and other encodings

From: Tatsuo Ishii
Date:
Hi,

I have committed the first implementation of an automatic code
conversion between UNICODE and other encodings. Currently
ISO8859-[1-5] and EUC_JP are supported. Support for other encodings is
coming soon. Testing of ISO8859 is welcome, since I have almost no
knowledge of European languages and no idea how to test with them.

How to use:

1. configure and install PostgreSQL with --enable-multibyte option

2. create database with UNICODE encoding
$ createdb -E UNICODE unicode

3. create a table and fill it with UNICODE (UTF-8) data. You could
even create a table where each column holds a different language.
create table t1(latin1 text, latin2 text);

4. set your terminal setting to (for example) ISO8859-2 or whatever

5. start psql

6. set client encoding to ISO8859-2
\encoding LATIN2

7. extract ISO8859-2 data from the UNICODE encoded table
select latin2 from t1;

P.S. I have used bsearch() to search code spaces. Is bsearch()
portable enough?
--
Tatsuo Ishii


Re: Automatic code conversion between UNICODE and other encodings

From: Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> P.S. I have used bsearch() to search code spaces. Is bsearch()
> portable enough?

According to my references, bsearch() was originally a SysV localism
but is a required library function in ANSI C.  So in theory it should
be portable enough ... but I notice we have implementations in
backend/ports for strtol() and strtoul() which are also required by
ANSI C, so apparently some people are or were running Postgres on
machines that are a few bricks shy of a full ANSI library.

I suggest waiting to see if anyone complains.  If so, we should be
able to write up a substitute bsearch() and add it to ports/.
        regards, tom lane