Обсуждение: [GENERAL] loading file with en dash character into postgres 9.6.1 database

Поиск
Список
Период
Сортировка

[GENERAL] loading file with en dash character into postgres 9.6.1 database

От
"Hu, Patricia"
Дата:

The server and client encoding are both set to UTF8, and according to this http://www.fileformat.info/info/unicode/char/2013/index.htm en dash is a valid UTF8 character, but when running a script with insert statement with en dash character in it, I got the error below.

 

mydb=> select name, setting from pg_settings where name like '%encoding%';

      name       | setting

-----------------+---------

client_encoding | UTF8

server_encoding | UTF8

 

mydb=> \i activity_type.lst

psql:activity_type.lst:379: ERROR:  invalid byte sequence for encoding "UTF8": 0x96

 

If I set client_encoding to WIN1252, the same file will be run ok and records inserted in, but afterwards the en dash character showed up as “û”, instead of the en dash character “–”

mydb=> show client_encoding;

client_encoding

-----------------

WIN1252

 

I created a database with WIN1252 encoding so both server and client encoding are WIN1252, loaded the same file in, en dash character still showed up as “û”(actually that is on a windows box, on a linux box the character didn’t show up at all), so the client setting still makes a difference, even though client_encoding showed the same value.

 

Is there any way I will be able to load the en dash character into the postgresql database as is? I had worked around it by editing the input file to replace en dash with a plain dash, but that’s quite some manual work each time when a new dump is generated from an oracle database with WE8MSWIN1252 characterset.

 

 

Thanks,

Patricia

 

Confidentiality Notice:: This email, including attachments, may include non-public, proprietary, confidential or legally privileged information. If you are not an intended recipient or an authorized agent of an intended recipient, you are hereby notified that any dissemination, distribution or copying of the information contained in or transmitted with this e-mail is unauthorized and strictly prohibited. If you have received this email in error, please notify the sender by replying to this message and permanently delete this e-mail, its attachments, and any copies of it immediately. You should not retain, copy or use this e-mail or any attachment for any purpose, nor disclose all or any part of the contents to any other person. Thank you.

Re: [GENERAL] loading file with en dash character into postgres 9.6.1 database

От
Tom Lane
Дата:
"Hu, Patricia" <Patricia.Hu@finra.org> writes:
> The server and client encoding are both set to UTF8, and according to this
http://www.fileformat.info/info/unicode/char/2013/index.htmen dash is a valid UTF8 character, but when running a script
withinsert statement with en dash character in it, I got the error below. 
> psql:activity_type.lst:379: ERROR:  invalid byte sequence for encoding "UTF8": 0x96

Well, that certainly isn't valid UTF8, so your script file isn't in UTF8.

> If I set client_encoding to WIN1252, the same file will be run ok

0x96 does seem to be an en-dash in WIN1252, so this is probably the
appropriate fix.  Testing here says that PG will correctly convert
0x96 in WIN1252 to an en-dash (U+2013) in UTF8, so I think you are
getting the right thing inserted.

> but afterwards the en dash character showed up as "û", instead of the en dash character "-"

This indicates that your terminal program does *not* think its encoding
is WIN1252.  Having loaded that script file, you need to revert
client_encoding to whatever your terminal program is using, or non-ASCII
characters are going to be displayed wrong.

A bit of poking around suggests that your terminal may be operating
with code page 437 or similar, as 0x96 is "û" in that encoding ---
according to Wikipedia, at least:
https://en.wikipedia.org/wiki/Code_page_437
I don't think Postgres supports that as a client_encoding setting,
so one way or another you're going to need to switch the terminal
program's character set setting.

            regards, tom lane


Re: [GENERAL] loading file with en dash character into postgres9.6.1 database

От
Albe Laurenz
Дата:
Tom Lane wrote:
> "Hu, Patricia" <Patricia.Hu@finra.org> writes:
>> The server and client encoding are both set to UTF8, and according to this
>> http://www.fileformat.info/info/unicode/char/2013/index.htm en dash is a valid UTF8
>> character, but when running a script with insert statement with en dash character in it, I
>> got the error below.
>> psql:activity_type.lst:379: ERROR:  invalid byte sequence for encoding "UTF8": 0x96
> 
> Well, that certainly isn't valid UTF8, so your script file isn't in UTF8.
> 
>> If I set client_encoding to WIN1252, the same file will be run ok
>> but afterwards the en dash character showed up as "û", instead of the en dash character
>> "-"
> 
> This indicates that your terminal program does *not* think its encoding
> is WIN1252.  Having loaded that script file, you need to revert
> client_encoding to whatever your terminal program is using, or non-ASCII
> characters are going to be displayed wrong.
> 
> A bit of poking around suggests that your terminal may be operating
> with code page 437 or similar, as 0x96 is "û" in that encoding ---
> according to Wikipedia, at least:
> https://en.wikipedia.org/wiki/Code_page_437
> I don't think Postgres supports that as a client_encoding setting,
> so one way or another you're going to need to switch the terminal
> program's character set setting.

Running "chcp 1252" in your Windows console before starting psql
should do the trick.

Yours,
Laurenz Albe

Re: [GENERAL] loading file with en dash character into postgres 9.6.1database

От
"Hu, Patricia"
Дата:
Thanks Laurenz, that nailed it. It was what Tom was saying, except I didn't figure out how. 

Thanks,
Patricia 


-----Original Message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at] 
Sent: Wednesday, July 12, 2017 5:31 AM
To: 'Tom Lane *EXTERN*'; Hu, Patricia
Cc: pgsql general (pgsql-general@postgresql.org)
Subject: [EXTERNAL] RE: [GENERAL] loading file with en dash character into postgres 9.6.1 database

Tom Lane wrote:
> "Hu, Patricia" <Patricia.Hu@finra.org> writes:
>> The server and client encoding are both set to UTF8, and according to 
>> this http://www.fileformat.info/info/unicode/char/2013/index.htm en 
>> dash is a valid UTF8 character, but when running a script with insert 
>> statement with en dash character in it, I got the error below.
>> psql:activity_type.lst:379: ERROR:  invalid byte sequence for 
>> encoding "UTF8": 0x96
> 
> Well, that certainly isn't valid UTF8, so your script file isn't in UTF8.
> 
>> If I set client_encoding to WIN1252, the same file will be run ok but 
>> afterwards the en dash character showed up as "û", instead of the en 
>> dash character "-"
> 
> This indicates that your terminal program does *not* think its 
> encoding is WIN1252.  Having loaded that script file, you need to 
> revert client_encoding to whatever your terminal program is using, or 
> non-ASCII characters are going to be displayed wrong.
> 
> A bit of poking around suggests that your terminal may be operating 
> with code page 437 or similar, as 0x96 is "û" in that encoding --- 
> according to Wikipedia, at least:
> https://en.wikipedia.org/wiki/Code_page_437
> I don't think Postgres supports that as a client_encoding setting, so 
> one way or another you're going to need to switch the terminal 
> program's character set setting.

Running "chcp 1252" in your Windows console before starting psql should do the trick.

Yours,
Laurenz Albe

Confidentiality Notice::  This email, including attachments, may include non-public, proprietary, confidential or
legallyprivileged information.  If you are not an intended recipient or an authorized agent of an intended recipient,
youare hereby notified that any dissemination, distribution or copying of the information contained in or transmitted
withthis e-mail is unauthorized and strictly prohibited.  If you have received this email in error, please notify the
senderby replying to this message and permanently delete this e-mail, its attachments, and any copies of it
immediately. You should not retain, copy or use this e-mail or any attachment for any purpose, nor disclose all or any
partof the contents to any other person. Thank you.