Re: UTF-8 -> ISO8859-1 conversion problem
От | J. Michael Crawford |
---|---|
Тема | Re: UTF-8 -> ISO8859-1 conversion problem |
Дата | |
Msg-id | 6.1.1.1.2.20041029134638.0326bc38@pop.suscom-maine.net обсуждение исходный текст |
Ответ на | Re: UTF-8 -> ISO8859-1 conversion problem ("J. Michael Crawford" <jmichael@gwi.net>) |
Список | pgsql-general |
Correction: Four things that need to be done, THREE if you're not serving up html. Sorry for the editing error. - Mike At 01:19 PM 10/29/2004, J. Michael Crawford wrote: > In my experience, there are just some characters that don't want to be > converted, even if they appear to be part of the normal 8-bit character > system. We went to Unicode databases to hold our Latin1 characters > because of this. There was even a case where the client was cutting and > pasting ascii text into our database, and it just wouldn't take some of > the letters, giving the same error you reported. > > I'm going to send a more detailed post on the topic, but in general, > we've found that there are four things that need to be done (four, if > you're not serving up web pages) for Latin1 characters to work on > multiple platforms. > > 1. Create the database in Unicode so that it will hold anything you > throw at it. > > 2. When importing data, set the encoding in the script that loads the > data, or if there's no script, use the "SET CLIENT_ENCODING TO > (encoding)" command. Setting the encoding in a tool like pgManager is > not always enough. Use this to be sure. > > 3. When retrieving data in a java application, the JVM encoding will > vary from JVM to JVM, and no attempt on our part to change the JVM > encoding or translate the encoding of the database strings has worked, > either to or from the database. We spent weeks going through every > permutation getBytes("ISO-8859-1") and related calls we could find, but > to no avail. The JVM will tell you it has a new encoding, but postgres > will return gibberish. You can translate the bytes, or get a translated > string, but it's all the same garbage. The solution: set the client > encoding manually through a jdbc prepared statement. Once you set the > client encoding properly, all seems to be fine: > >String DBEncoding = "anEncoding" //use a real encoding, either returned >from the jvm or explicitly stated >PreparedStatement statement = dbCon.prepareStatement("SET CLIENT_ENCODING >TO '" + DBEncoding + "'"); >statement.execute(); > > 4. If writing html for a web page, make sure the encoding of the web > page matches the encoding of the strings you're throwing at it. So if > you have a Linux JVM that has a "UTF-8" encoding, the web page will need > the html equivalent: > ><meta http-equiv="Content-Type" content="text/html; charset=utf-8"> > >--- > > This is likely far more information than you require, but I thought I'd > add it anyway so that the information is in the archives. It took us > months to solve our problem, even with help from the postgres community, > so I at least want the basics to be posted while I get my act together > and write something with more detail. > > - Mike > > >At 12:12 PM 10/29/2004, Cott Lang wrote: > >ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 > > > >Running 7.4.5, I frequently get this error, and ONLY on this particular > >character despite seeing quite a bit of 8 bit. I don't really follow why > >it can't be converted, it's the same character (239) in both character > >sets. Databases are in ISO8859-1, JDBC driver is defaulting to UTF-8. > > > >Am I flubbing something up? I'm probably going to (reluctantly) convert > >to UTF-8 in the database at some point, but it'd sure be nice if this > >worked without that. :) > > > >thanks! > > > > > > > > > > > > > > > >---------------------------(end of broadcast)--------------------------- > >TIP 8: explain analyze is your friend > > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html
В списке pgsql-general по дате отправления: