Discussion: Unicode support problem
Hi all,

I'm having a problem with Unicode support in postgres under linux. I am copying lots of data from an MS SQL Server database, via java/jdbc running on Windows XP, over to a postgres database running on linux.

I've set up the postgres database as follows:

LANG=C
initdb -E UNICODE
createdb -E UNICODE

Then, when my program tries to send a string containing the character 0xF6 (Latin small letter o with diaeresis), I get a JDBC exception and an error log on the server as follows:

ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds of things, and nothing seems to fix it.

If I set up the database as follows:

LANG=C
initdb -E iso8859_1
createdb -E iso8859_1

then it appears to work OK, but I then get an error with character 0xE2 (Latin small letter a with circumflex):

ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1

Does anyone know how to do this correctly?

This is my environment:

LINUX: DEBIAN 3.0, KERNEL 2.4 running on a 2CPU PC.
Postgres: 8.0.1 built from source, no changes to anything, running on the linux box.
JDBC driver: postgresql-8.0-310.jdbc3.jar
Java JVM (Sun) 1.4.2_02 on Windows XP SP2.

If I run postgres on the Windows XP machine (configured for UNICODE as above), then I don't have any errors at all. This only happens on the linux box.

Any help in fixing this would be greatly appreciated.

Thanks,
--Jatinder Sangha
Coalition Development
"Jatinder Sangha" <js@coalitiondev.com> writes:
> I've setup the postgres database as follows:
> LANG=C
> initdb -E UNICODE
> createdb -E UNICODE

> I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds
> of things, and nothing seems to fix it.

You can't just pick random combinations of locale and database encoding. Any given locale setting implies a character set encoding, and you have to use that same encoding as the database encoding; at least if you want encoding-dependent operations such as upper()/lower() to work.

The locale you want for Unicode (UTF8) may be named something like "en_US.utf8". Try "locale -a" to get a list of supported locales.

regards, tom lane
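For anyone following along, here is a minimal sketch of the advice above: look through the output of "locale -a" for a UTF-8 locale and pass it to initdb together with the matching encoding. The helper name first_utf8_locale is made up for this illustration, and the script only prints the initdb/createdb commands instead of running them. It assumes an initdb that accepts --locale; check "initdb --help" on your build.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): read locale names on stdin
# and print the first one whose name ends in ".utf8" or ".UTF-8".
first_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

# Ask the system which locales are installed and pick a UTF-8 one.
loc=$(locale -a 2>/dev/null | first_utf8_locale)

if [ -n "$loc" ]; then
    # Printed rather than executed, so the commands can be reviewed first.
    # The locale and the -E encoding must describe the same character set.
    echo "initdb --locale=$loc -E UNICODE"
    echo "createdb -E UNICODE"
else
    # Assumption: the locale has to be generated first, e.g. with localedef
    # or the distribution's own locale tool.
    echo "no UTF-8 locale found in 'locale -a' output" >&2
fi
```

If no UTF-8 locale is listed, it has to be generated on the server before initdb can use it.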
> If I setup the database as follows:
> LANG=C
> initdb -E iso8859_1
> createdb -E iso8859_1
>
> Then it appears to work OK - but I then get an error with character 0xE2
> (Latin small letter a with circumflex):
> ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1

The error message says it all. You are trying to convert a UTF-8 character starting with 0x00e2 to ISO-8859-1, and no such character exists in ISO-8859-1. All ISO-8859-1 characters, when encoded in UTF-8, start with a byte below 0xe0. Perhaps your data contains ISO-8859-2 characters, or characters from some other set that is not ISO-8859-1?
--
Tatsuo Ishii
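To make the byte-level point concrete, here is a small sketch that dumps the UTF-8 bytes of three characters. The helper name hex_bytes is made up for this illustration, and U+2019 is only an example of a character whose UTF-8 form starts with 0xe2; the thread does not say which character was actually in the data.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string, given as printf octal
# escapes, in hex separated by single spaces.
hex_bytes() {
    # od pads its output with spaces; the unquoted expansion collapses them.
    echo $(printf "$1" | od -An -tx1)
}

# U+00F6 (o with diaeresis): two bytes in UTF-8, lead byte 0xc3.
hex_bytes '\303\266'

# U+00E2 (a with circumflex): also two bytes, lead byte 0xc3, not 0xe2.
hex_bytes '\303\242'

# U+2019 (right single quotation mark): three bytes, lead byte 0xe2.
# This code point is outside ISO-8859-1, so it cannot be converted.
hex_bytes '\342\200\231'
```

So "a with circumflex" (code point 0xE2) never appears as a UTF-8 byte 0xe2; a UTF-8 sequence that starts with 0xe2 encodes a code point from U+2000 upwards, which ISO-8859-1 does not contain.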
Hi Tom,

Thanks for the reply -- yes, creating the en_US.utf8 locale and using that fixed all of my problems.

Thanks,
--Jatinder

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 24 February 2005 17:11
To: Jatinder Sangha
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unicode support problem