problem comparing strings when different cluster / database encoding

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема problem comparing strings when different cluster / database encoding
Дата
Msg-id 443444B5.1010205@fuzzy.cz
обсуждение исходный текст
Ответы Re: problem comparing strings when different cluster / database encoding  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: problem comparing strings when different cluster / database encoding  (Alvaro Herrera <alvherre@commandprompt.com>)
Список pgsql-sql
Greetings,

I've encountered a strange problem. We have a PG 8.0.x database cluster
(in the sense used in initdb, i.e. bunch of databases) created with
UNICODE encoding, namely cs_CZ.UTF-8 locale.

When a database is created with a different encoding (in our case it's
LATIN2) the string comparison doesn't work correctly. For example the query
  SELECT 'ě' = 'é';

returns 'true' which is obviously incorrect, as those two letters have
different accents (I hope you can see that). And of course, it's not
possible to create an unique index (or primary key) over a column of
words (for example in a dictionnary), because false collisions are
found, and the sorting works in a really strange way too.

If the both cluster and database are in the same encoding (UNICODE or
LATIN2), everything works fine.

Below is a short description how the database cluster and the databases
have been created.

----------------------------------------------------------------------
$ export LANG="cs_CZ.UTF-8"
$ initdb ... (cluster created with UNICODE encoding, cs_CZ.UTF-8 locale)
$ ... (postgres started, users created, etc.)
$ createdb -E LATIN2 my_database;
$ psql my_database;
> SELECT 'ě' = 'é'; (returns 'true', which is incorrect)
----------------------------------------------------------------------

If we create the cluster with LATIN2 encoding (or on the contrary the
database is created with UNICODE encoding), everything works fine.
For example the following works as expected.

----------------------------------------------------------------------
$ export LANG="cs_CZ" (thus the ISO-8859-2 encoding is used)
$ initdb ... (cluster created with LATIN2 encoding, cs_CZ locale)
$ ... (postgres started, users created, etc.)
$ createdb -E LATIN2 my_database;
$ psql my_database;
> SELECT 'ě' = 'é'; (returns 'false', which is correct)
----------------------------------------------------------------------

I'm trying to solve this for several days, but unsuccesfully. Is there
something I've missed? Some obvious solution I don't see?

The queston is why we need different encodings for cluster / databases.
(a) Until recently we've used LATIN2 cluster and LATIN2 databases (and    applications expecting LATIN2 encoding) -
that'sthe reason why we    need LATIN2 databases.
 
(c) On the other way some of the new clients want to 'internationalize'    their applications, so we need UNICODE
infrastructuretoo - that's    the reason why we use UNICODE cluster and databases.
 

I've came accross the nls_string function - with it it works fine, but
that's not an option for us, as it would require rewriting all the SQL
queries in the applications (and that's something we don't want).

Thanks for your suggestions
Tomas


В списке pgsql-sql по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: SELECT composite type
Следующее
От: Tom Lane
Дата:
Сообщение: Re: problem comparing strings when different cluster / database encoding