Discussion: BUG #4098: Encoding problems

BUG #4098: Encoding problems

From:
"Jan-Peter Seifert"
Date:
The following bug has been logged online:

Bug reference:      4098
Logged by:          Jan-Peter Seifert
Email address:      Jan-Peter.Seifert@gmx.de
PostgreSQL version: 8.2
Operating system:   Windows xp
Description:        Encoding problems
Details:

The encoding of the source db/server is LATIN1. The data type of the field
is text (The storage mode is extended). It's/was possible to add characters
available in CP1252 but not in LATIN1, like the Euro character (code 0x80).
When exporting to UTF8 via "pg_dump -o -U postgres -E UTF-8 ..." (iconv?) it
just adds the character with the code "C2" before the Euro character in the
dump. When importing the dump to a server/db with CP1252 it just throws an
error because of the inconvertible character (0xc280) and seems to stop with
no further error messages (at least the whole table in question is empty
although filled in the source db). It wasn't an issue before the
restrictions were put in place, because the C locale is insufficient.

Re: BUG #4098: Encoding problems

From:
Heikki Linnakangas
Date:
Jan-Peter Seifert wrote:
> The following bug has been logged online:
>
> Bug reference:      4098
> Logged by:          Jan-Peter Seifert
> Email address:      Jan-Peter.Seifert@gmx.de
> PostgreSQL version: 8.2
> Operating system:   Windows xp
> Description:        Encoding problems
> Details:
>
> The encoding of the source db/server is LATIN1. The data type of the field
> is text (The storage mode is extended). It's/was possible to add characters
> available in CP1252 but not in LATIN1, like the Euro character (code 0x80).
> When exporting to UTF8 via "pg_dump -o -U postgres -E UTF-8 ..." (iconv?) it
> just adds the character with the code "C2" before the Euro character in the
> dump.

Yes, but if you do that, PostgreSQL doesn't know that code 0x80 actually
means the Euro character. That's why the conversion to UTF-8 doesn't
work the way you expected.
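
The byte-level effect described above can be reproduced outside the database. The following is a minimal sketch that uses Python's built-in codecs as a stand-in for the server's conversion routines (an assumption: PostgreSQL's own LATIN1 and WIN1252 conversion tables are not called here, only their Python equivalents). It shows that a byte 0x80 read as LATIN1 becomes the two-byte UTF-8 sequence C2 80, while the same byte read as WIN1252 becomes the Euro sign, and that the character produced by the LATIN1 reading has no WIN1252 representation.

```python
# Byte 0x80 is a C1 control character in LATIN1 (ISO 8859-1),
# but the Euro sign in WIN1252 (CP1252).
raw = b"\x80"

# Interpreting the byte as LATIN1 yields U+0080, which UTF-8 encodes as C2 80.
as_latin1 = raw.decode("latin-1").encode("utf-8")

# Interpreting the same byte as WIN1252 yields U+20AC (the Euro sign).
as_cp1252 = raw.decode("cp1252").encode("utf-8")

print(as_latin1.hex())  # c280
print(as_cp1252.hex())  # e282ac

# U+0080 cannot be converted to WIN1252, which mirrors the import error
# reported for the sequence 0xc280.
try:
    "\u0080".encode("cp1252")
    convertible = True
except UnicodeEncodeError:
    convertible = False

print(convertible)  # False
```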

You should have created the database with WIN1252 encoding to
begin with.

I think you can fix that by dumping the database in LATIN1 encoding,
modifying the "client_encoding" line in the dump file to 'WIN1252', and
importing it back.
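
The edit to the dump file suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes a plain-text dump that contains a line of the form `SET client_encoding = '...';`, and the function name `retag_client_encoding` is hypothetical. The dump is handled as raw bytes so that the data itself, including any 0x80 bytes, passes through unchanged; only the encoding label is rewritten.

```python
import re

# Matches the first "SET client_encoding = '...';" line of a plain-text dump.
_CLIENT_ENCODING = re.compile(rb"^(SET client_encoding = ')[^']*(';)", re.MULTILINE)


def retag_client_encoding(dump: bytes, new_encoding: str = "WIN1252") -> bytes:
    """Return the dump with its client_encoding label replaced.

    Hypothetical helper: the data bytes are left untouched, only the
    declared encoding changes.
    """
    replacement = rb"\g<1>" + new_encoding.encode("ascii") + rb"\g<2>"
    return _CLIENT_ENCODING.sub(replacement, dump, count=1)


# Illustrative dump fragment (invented for this example, not real pg_dump output).
sample = b"SET client_encoding = 'LATIN1';\nINSERT INTO t VALUES ('\x80');\n"
fixed = retag_client_encoding(sample)
print(fixed)
```

In practice the bytes would be read from and written back to the dump file in binary mode, so that no text decoding is applied along the way.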

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com