Re: new FAQ entry

From: Tino Wildenhain
Subject: Re: new FAQ entry
Msg-id: 4487CD78.4020805@wildenhain.de
In reply to: new FAQ entry (was:Re: UTF8 problem)  (Tim Allen <tim@proximity.com.au>)
Replies: Re: new FAQ entry  (Tim Allen <tim@proximity.com.au>)
List: pgsql-general
Tim Allen wrote:
> Matthew T. O'Connor wrote:
>
>> Well, to answer my own question, I hacked the source code of DBMail
>> and had it set the client encoding to LATIN1 immediately after
>> database connect, this seems to have fixed the problem.
>>
>> Sorry for the noise,
>>
>> Matt
>
>
> I've seen this sort of problem asked about in the mailing lists often
> enough to think it merits a FAQ entry, so how about this text:
>
> <entry>
> Q. Why do I have problems inserting text into my database, with error
> messages like
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe1202c ?
>
> A. Almost certainly that byte sequence really is invalid in that
> encoding. You are most likely seeing the error because you are
> providing text in some other encoding. You and the database need to
> agree on which encoding you're using.
> PostgreSQL is fairly good at working with you, converting to and from
> whatever encoding you want to use, but you need to tell it what that
> encoding is, and then stick to that encoding consistently.
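To make the error message concrete, here is one plausible way to end up with exactly the bytes 0xe1202c: Latin1 text sent to a UTF8 database without declaring it. This minimal Python sketch assumes the client's text was "á ," (a-acute, space, comma); it illustrates the mechanism and is not a trace of any particular application.

```python
# Hypothetical reproduction of: invalid byte sequence for encoding "UTF8": 0xe1202c
# Assumption: the client really holds the text "á ," in Latin1 (ISO-8859-1).
text = "á ,"

# The bytes a Latin1 client actually sends.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes.hex())  # e1202c

# 0xe1 starts a 3-byte UTF-8 sequence, but 0x20 (space) is not a
# continuation byte (0x80-0xBF), so a UTF-8 decoder must reject it.
try:
    latin1_bytes.decode("utf-8")
    rejected_as_utf8 = False
except UnicodeDecodeError as err:
    rejected_as_utf8 = True
    print("rejected as UTF-8:", err.reason)

# Interpreted in the encoding the client really used, the bytes are fine.
decoded = latin1_bytes.decode("latin-1")
print(decoded)
```

The bytes themselves are perfectly good Latin1; they only become "invalid" because they were labelled as UTF8.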
>
> If you don't set the client encoding, then PostgreSQL will use the
> default encoding for the database, which in modern times is often UTF8
> (aka UNICODE), and is set at database creation time. However, many
> client apps still use other encodings (e.g. Latin1, aka ISO-8859-1), so
> you need to either educate the client app to use UTF8, or get it to
> tell PostgreSQL which other encoding it is using.
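Conceptually, declaring the client encoding lets the server reinterpret incoming bytes before storing them in the database encoding. The Python sketch below models only that idea; the function `to_server_utf8` and its small name-mapping table are hypothetical illustrations, not PostgreSQL internals, although LATIN1, WIN1251 and UTF8 are real PostgreSQL encoding names.

```python
# Hypothetical model of client-to-server conversion; not PostgreSQL's code.
# Maps a few PostgreSQL encoding names to Python codec names (assumed subset).
PG_TO_PYTHON = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "WIN1251": "cp1251",
}


def to_server_utf8(raw: bytes, client_encoding: str) -> bytes:
    """Interpret raw client bytes in the declared encoding, re-encode as UTF8."""
    codec = PG_TO_PYTHON[client_encoding]
    # Raises UnicodeDecodeError if raw is not valid in the declared encoding.
    return raw.decode(codec).encode("utf-8")


raw = b"\xe9"  # a single byte from the client; its meaning depends on the label
print(to_server_utf8(raw, "LATIN1").decode("utf-8"))   # é
print(to_server_utf8(raw, "WIN1251").decode("utf-8"))  # й
```

The same byte becomes a different character depending on the declared encoding, which is why the server cannot guess and has to be told.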
>
> You tell PostgreSQL which encoding you are using by setting the
> client_encoding GUC variable, e.g.
>
> set client_encoding to 'LATIN1';

If you can't get your client application to set this option when it
connects, you can set it per user instead:

ALTER USER clientappuser SET client_encoding to 'what your app uses';
>
> One reason you may be seeing this problem now, after upgrading your
> version of PostgreSQL, is that recent versions have tighter validation
> of encoded text. Previously you may not have been conscious of what
> encoding you were actually using, especially if you're a speaker of a
> Western European language, and may have gotten away with writing
> incorrectly-encoded text without the database complaining. Now is the
> time to start getting it right.
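When upgrading, it can help to locate the text that older versions accepted. A minimal Python sketch, assuming you have exported the suspect values (or a plain-text dump) as raw byte lines; `find_invalid_utf8` is a hypothetical helper, not part of any PostgreSQL tool.

```python
from typing import Iterable, List, Tuple


def find_invalid_utf8(lines: Iterable[bytes]) -> List[Tuple[int, int]]:
    """Return (line_number, byte_offset) for each line that is not valid UTF-8.

    Line numbers start at 1; the offset is where decoding first failed.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((lineno, err.start))
    return problems


sample = [
    b"plain ASCII is always valid UTF-8",
    "café".encode("utf-8"),    # correctly encoded
    "café".encode("latin-1"),  # the kind of text older versions let through
]
print(find_invalid_utf8(sample))  # [(3, 3)]
```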
>
> One thing to be wary of is the "SQL_ASCII" encoding. It appears to be
> commonly and incorrectly believed that this represents either some
> variant on latin1, or pure 7-bit ASCII. It is neither of those, but a
> completely unchecked encoding that really means whatever you want it to
> mean. This makes it a poor choice in practice, because it allows a
> mixture of different encodings to end up in the same set of data, which
> will cause you headaches when you try to convert the whole lot to some
> consistent encoding in the future.
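A small Python illustration of that headache, assuming a SQL_ASCII database that received the same word once from a UTF8 client and once from a Latin1 client. Because SQL_ASCII stores the bytes unchecked, both versions end up side by side, and no single assumed source encoding converts them all correctly. The helper `convert_all` is hypothetical.

```python
# Two clients stored the same word into an unchecked SQL_ASCII column.
from_utf8_client = "café".encode("utf-8")      # b'caf\xc3\xa9'
from_latin1_client = "café".encode("latin-1")  # b'caf\xe9'
stored = [from_utf8_client, from_latin1_client]


def convert_all(values, assumed_encoding):
    """Reinterpret every stored value using one assumed source encoding."""
    return [v.decode(assumed_encoding) for v in values]


# Assuming UTF-8 fails outright on the Latin1 value...
try:
    convert_all(stored, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# ...while assuming Latin1 "succeeds" but silently garbles the UTF-8 value.
as_latin1 = convert_all(stored, "latin-1")
print(as_latin1)  # ['cafÃ©', 'café']
```

The Latin1 pass is arguably worse than the failure, since nothing complains about the mangled row.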
>
> See section 21.2 of the documentation for more complete information.
> </entry>
>
> Tim
>

