Обсуждение: could not determine encoding for locale "et_EE.UTF-8": codeset is "CPUTF-8" in pg_restore

Поиск
Список
Период
Сортировка

could not determine encoding for locale "et_EE.UTF-8": codeset is "CPUTF-8" in pg_restore

От
"Andrus"
Дата:
Hi!

Postgres 12 database dump is created in Debian 10 using pg_dump .

Trying to restore it in Windows 10  using

pg_restore --clean --create --if-exists --dbname=postgres --jobs=8 --no-password 
mydb.backup

produces strange message

pg_restore: WARNING:  could not determine encoding for locale "et_EE.UTF-8": 
codeset is "CPUTF-8"

How to fix this ?

Debian and Windows computer have same settings:

Latest Postgres 12  is used
OS and database locales are  Estonian
Database encoding is UTF-8

Andrus. 




Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/28/20 2:39 PM, Andrus wrote:
> Hi!
> 
> Postgres 12 database dump is created in Debian 10 using pg_dump .
> 
> Trying to restore it in Windows 10  using
> 
> pg_restore --clean --create --if-exists --dbname=postgres --jobs=8 
> --no-password mydb.backup
> 
> produces strange message
> 
> pg_restore: WARNING:  could not determine encoding for locale 
> "et_EE.UTF-8": codeset is "CPUTF-8"

In the Debian Postgres instance in psql what does \l show for the databases?

In the Windows 10 command prompt what does systeminfo show?


> 
> How to fix this ?
> 
> Debian and Windows computer have same settings:
> 
> Latest Postgres 12  is used
> OS and database locales are  Estonian
> Database encoding is UTF-8
> 
> Andrus.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Hi!

>In the Debian Postgres instance in psql what does \l show for the 
>databases?

#psql namm postgres
psql (12.2 (Debian 12.2-2.pgdg100+1))
Type "help" for help.

namm=# \l namm
                                  List of databases
Name |   Owner    | Encoding |   Collate   |    Ctype    |     Access 
privileges
------+------------+----------+-------------+-------------+---------------------------
namm | namm_owner | UTF8     | et_EE.UTF-8 | et_EE.UTF-8 | =Tc/namm_owner 
+
      |            |          |             |             | 
namm_owner=CTc/namm_owner
(1 row)

>In the Windows 10 command prompt what does systeminfo show?

Host Name:                 SERVER2
OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.18363 N/A Build 18363
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          Windows User
Registered Organization:
Product ID:                00330-70008-16217-AAOEM
Original Install Date:     05.09.2019, 9:16:41
System Boot Time:          28.03.2020, 11:05:23
System Manufacturer:       Gigabyte Technology Co., Ltd.
System Model:              Q270M-D3H
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 158 Stepping 9 
GenuineIntel ~3601 Mhz
BIOS Version:              American Megatrends Inc. F1, 09.01.2017
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskVolume3
System Locale:             et;Eesti
Input Locale:              et;Eesti
Time Zone:                 (UTC+02:00) Helsingi, Kiiev, Riia, Sofia, 
Tallinn, Vilnius
Total Physical Memory:     16 286 MB
Available Physical Memory: 12 032 MB
Virtual Memory: Max Size:  18 718 MB
Virtual Memory: Available: 14 867 MB
Virtual Memory: In Use:    3 851 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\SERVER2
Hotfix(s):                 18 Hotfix(s) Installed.
                           [01]: KB4534132
                           [02]: KB4497165
                           [03]: KB4498523
                           [04]: KB4503308
                           [05]: KB4515383
                           [06]: KB4515530
                           [07]: KB4516115
                           [08]: KB4517245
                           [09]: KB4520390
                           [10]: KB4521863
                           [11]: KB4524244
                           [12]: KB4524569
                           [13]: KB4528759
                           [14]: KB4532441
                           [15]: KB4537759
                           [16]: KB4538674
                           [17]: KB4541338
                           [18]: KB4551762
Network Card(s):           2 NIC(s) Installed.
                           [01]: TAP-Windows Adapter V9
                                 Connection Name: Ethernet 4
                                 Status:          Media disconnected
                           [02]: Intel(R) Ethernet Connection (2) I219-LM
                                 Connection Name: Ethernet 3
                                 DHCP Enabled:    Yes
                                 DHCP Server:     192.168.91.1
                                 IP address(es)
                                 [01]: 192.168.91.154
                                 [02]: fe80::94d:b1c:3945:bc8a
                                 [03]: 
2001:7d0:4c83:4c80:257f:b077:e1f7:21e1
                                 [04]: 2001:7d0:4c83:4c80:94d:b1c:3945:bc8a
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                           Virtualization Enabled In Firmware: Yes
                           Second Level Address Translation: Yes
                           Data Execution Prevention Available: Yes

Andrus.




Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/28/20 3:31 PM, Andrus wrote:
> Hi!
> 
>> In the Debian Postgres instance in psql what does \l show for the 
>> databases?
> 
> #psql namm postgres
> psql (12.2 (Debian 12.2-2.pgdg100+1))
> Type "help" for help.
> 
> namm=# \l namm
>                                   List of databases
> Name |   Owner    | Encoding |   Collate   |    Ctype    |     Access 
> privileges
> ------+------------+----------+-------------+-------------+--------------------------- 
> 
> namm | namm_owner | UTF8     | et_EE.UTF-8 | et_EE.UTF-8 | =Tc/namm_owner +
>       |            |          |             |             | 
> namm_owner=CTc/namm_owner
> (1 row)
> 
>> In the Windows 10 command prompt what does systeminfo show?
> 

> System Locale:             et;Eesti
> Input Locale:              et;Eesti

Hmm, I was expecting to see et_EE though I will admit to not truly 
understanding how Windows does locales.

I should have asked earlier, in the Postgres instance on Windows what 
does \l show for template0?

> 
> Andrus.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



"Andrus" <kobruleht2@hot.ee> writes:
> Postgres 12 database dump is created in Debian 10 using pg_dump .
> Trying to restore it in Windows 10  using
> pg_restore --clean --create --if-exists --dbname=postgres --jobs=8 --no-password
> mydb.backup
> produces strange message
> pg_restore: WARNING:  could not determine encoding for locale "et_EE.UTF-8":
> codeset is "CPUTF-8"

> How to fix this ?

Generally speaking, locale names from Unix systems won't work at all on
Windows.  You need to create the database manually with whatever seems
to be the closest Windows locale match, and then restore its contents
without using --create.

(I do agree that that message isn't the most helpful thing.  It looks
like chklocale.c is overoptimistically assuming that what it's handed
is valid, even if GetLocaleInfoEx says it isn't.)

            regards, tom lane



I wrote:
> (I do agree that that message isn't the most helpful thing.  It looks
> like chklocale.c is overoptimistically assuming that what it's handed
> is valid, even if GetLocaleInfoEx says it isn't.)

After further digging, that optimism isn't *completely* without
foundation; it seems to be based on the fact that we know setlocale()
thought the locale string was OK.  Which is interesting --- apparently
Microsoft is more willing to accept Unix-style locale names than
I thought.  But they couldn't be bothered to make GetLocaleInfoEx()
and setlocale() take the same set of strings ...

            regards, tom lane



Hi!

>> System Locale:             et;Eesti
>> Input Locale:              et;Eesti
>Hmm, I was expecting to see et_EE though I will admit to not truly 
>understanding how Windows does locales.
>I should have asked earlier, in the Postgres instance on Windows what does 
>\l show for template0?

"D:\Program Files\PostgreSQL\12\bin\psql"  postgres postgres

psql (12.2)
WARNING: Console code page (775) differs from Windows code page (1257)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

postgres=# \l template0
                                            List of databases
   Name    |  Owner   | Encoding |        Collate        |         Ctype 
|   Access privileges
-----------+----------+----------+-----------------------+-----------------------+-----------------------
template0 | postgres | UTF8     | Estonian_Estonia.1257 | 
Estonian_Estonia.1257 | =c/postgres          +
           |          |          |                       | 
| postgres=CTc/postgres
(1 row)

Andrus. 




Hi!

>Generally speaking, locale names from Unix systems won't work at all on
>Windows.  You need to create the database manually with whatever seems
>to be the closest Windows locale match, and then restore its contents
>without using --create.

This is unattended script running in every night from .bat file.

How to replace pg_restore --create  option with   psql and/or createdb calls
and specify proper locale for them ?

Currently everthing has "Estonian_Estonia.1257"  locale in windows.
Which locale name should be specified in Windows instead of this?
Or maybe creating new template with proper encoding or changing template0 
encoding helps?

Andrus. 




Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/29/20 12:56 AM, Andrus wrote:
> Hi!
> 
>>> System Locale:             et;Eesti
>>> Input Locale:              et;Eesti
>> Hmm, I was expecting to see et_EE though I will admit to not truly 
>> understanding how Windows does locales.
>> I should have asked earlier, in the Postgres instance on Windows what 
>> does \l show for template0?


I rarely use Windows anymore so take the below with that in mind.

> 
> "D:\Program Files\PostgreSQL\12\bin\psql"  postgres postgres
> 
> psql (12.2)
> WARNING: Console code page (775) differs from Windows code page (1257)
>          8-bit characters might not work correctly. See psql reference
>          page "Notes for Windows users" for details.

There seems to a difference of opinion of what Baltic Code Page to use:

https://en.wikipedia.org/wiki/Code_page_775

https://en.wikipedia.org/wiki/Windows-1257

The post below shows a users method of dealing with this for another CP:

https://www.postgresql.org/message-id/549275CC.4010607%40gmail.com




> Type "help" for help.
> 
> postgres=# \l template0
>                                             List of databases
>    Name    |  Owner   | Encoding |        Collate        |         Ctype 
> |   Access privileges
> -----------+----------+----------+-----------------------+-----------------------+----------------------- 
> 
> template0 | postgres | UTF8     | Estonian_Estonia.1257 | 
> Estonian_Estonia.1257 | =c/postgres          +
>            |          |          |                       | | 

I'm guessing it is picking up Estonian_Estonia.1257 from the system.

The Windows Postgres instance was installed from the EDB installer?

> postgres=CTc/postgres
> (1 row)
> 
> Andrus.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/29/20 1:02 AM, Andrus wrote:
> Hi!
> 
>> Generally speaking, locale names from Unix systems won't work at all on
>> Windows.  You need to create the database manually with whatever seems
>> to be the closest Windows locale match, and then restore its contents
>> without using --create.
> 
> This is unattended script running in every night from .bat file.

Per my previous post, you might try adding something like:

cmd.exe /c chcp 1257

to the top of the batch file. This idea came from here:


https://stackoverflow.com/questions/20794035/postgresql-warning-console-code-page-437-differs-from-windows-code-page-125

> 
> How to replace pg_restore --create  option with   psql and/or createdb 
> calls
> and specify proper locale for them ?
> 
> Currently everthing has "Estonian_Estonia.1257"  locale in windows.
> Which locale name should be specified in Windows instead of this?
> Or maybe creating new template with proper encoding or changing 
> template0 encoding helps?

If I am following the error correctly then the issue is that the 
Postgres console programs are using CP755 and that is not something for 
which there is an automatic conversion:

https://www.postgresql.org/docs/12/multibyte.html#id-1.6.10.5.7

There is a conversion for 1257 clients so having your console run as 
1257 should solve the problem. Someone with more current experience on 
Windows will need to comment on whether that is the viable or best solution.

> 
> Andrus.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Hi!

>There seems to a difference of opinion of what Baltic Code Page to use:
>https://en.wikipedia.org/wiki/Code_page_775
>https://en.wikipedia.org/wiki/Windows-1257
>The post below shows a users method of dealing with this for another CP:
>https://www.postgresql.org/message-id/549275CC.4010607%40gmail.com

Console code page 775 message appears since psql is console application 
running from command line.

It does not have any relation to pg_dump/pg_restore issue since console code 
page is not used in this case.

There is Estonian locale everywhere.
Maybe this warning is harmless since Linux code page is ignored and default 
collation is used.
All table definitions in restored database contain references to default 
collation:

CREATE TABLE firma1.acquirpo
(
    kassanr numeric(3,0) NOT NULL,
    policyid character(2) COLLATE pg_catalog."default" NOT NULL,
    trantype character(6) COLLATE pg_catalog."default",
    tacdefault character(10) COLLATE pg_catalog."default",
    tacdenial character(10) COLLATE pg_catalog."default",
    taconline character(10) COLLATE pg_catalog."default",
    floorlimit numeric(12,0),
    randselthr numeric(12,0),
    minrandper numeric(2,0),
    maxrandper numeric(2,0),
    CONSTRAINT acquirpo_pkey PRIMARY KEY (kassanr, policyid)
)

TABLESPACE pg_default;

Same warning appears two times. This command execute by pg_restore probably 
causes this (harmless?)  warning:

CREATE DATABASE mydb
    WITH
    OWNER = mydb_owner
    ENCODING = 'UTF8'
    LC_COLLATE = 'et_EE.UTF-8'
    LC_CTYPE = 'et_EE.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;

If new database is created manually in windows cluster the following command 
is generated:


CREATE DATABASE mydbmanually
    WITH
    OWNER = mydbmanually_owner
    ENCODING = 'UTF8'
    LC_COLLATE = 'Estonian_Estonia.1257'
    LC_CTYPE = 'Estonian_Estonia.1257'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;

Andrus.




Hi!

>Per my previous post, you might try adding something like:
>cmd.exe /c chcp 1257
>to the top of the batch file. This idea came from here:

>https://stackoverflow.com/questions/20794035/postgresql-warning-console-code-page-437-differs-from-windows-code-page-125
>If I am following the error correctly then the issue is that the Postgres 
>console programs are using CP755 and that is not something for which there 
>is an automatic conversion:
>https://www.postgresql.org/docs/12/multibyte.html#id-1.6.10.5.7
>There is a conversion for 1257 clients so having your console run as 1257 
>should solve the problem. Someone with more current experience on Windows 
>will need to comment on whether that is the viable or best solution.

Both servers have UTF-8 encoding.

Non-unicode code page 755 referes only to command line applications like 
psql.

Postgres service, pg_dump and pg_restore do not use console codepages any 
way, they operate using only UTF-8 character set since both databases are in 
UTF-8

I think console code page warning message is not related to this issue.

Andrus.




Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/29/20 11:12 AM, Andrus wrote:
> Hi!
> 
>> There seems to a difference of opinion of what Baltic Code Page to use:
>> https://en.wikipedia.org/wiki/Code_page_775
>> https://en.wikipedia.org/wiki/Windows-1257
>> The post below shows a users method of dealing with this for another CP:
>> https://www.postgresql.org/message-id/549275CC.4010607%40gmail.com
> 
> Console code page 775 message appears since psql is console application 
> running from command line.
> 
> It does not have any relation to pg_dump/pg_restore issue since console 
> code page is not used in this case.
> 
> There is Estonian locale everywhere.
> Maybe this warning is harmless since Linux code page is ignored and 
> default collation is used.
> All table definitions in restored database contain references to default 
> collation:
> 
> CREATE TABLE firma1.acquirpo
> (
>     kassanr numeric(3,0) NOT NULL,
>     policyid character(2) COLLATE pg_catalog."default" NOT NULL,
>     trantype character(6) COLLATE pg_catalog."default",
>     tacdefault character(10) COLLATE pg_catalog."default",
>     tacdenial character(10) COLLATE pg_catalog."default",
>     taconline character(10) COLLATE pg_catalog."default",
>     floorlimit numeric(12,0),
>     randselthr numeric(12,0),
>     minrandper numeric(2,0),
>     maxrandper numeric(2,0),
>     CONSTRAINT acquirpo_pkey PRIMARY KEY (kassanr, policyid)
> )
> 
> TABLESPACE pg_default;
> 
> Same warning appears two times. This command execute by pg_restore 
> probably causes this (harmless?)  warning:

What warning?

I cranked up a Windows 7 instance and tried to migrate a Postgres 11 
database from Ubuntu and it failed on the CREATE DATABASE step because 
of this line in the dump file:

CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

When I manually changed it in the plain text version of the dump file to:

CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
LC_COLLATE = 'English_United States.1252' LC_CTYPE = 'English_United 
States.1252';

borrowing from Tom's suggestion here:

https://www.postgresql.org/message-id/22672.1585442578%40sss.pgh.pa.us

it worked.

So basically what you see below.

If you followed Tom's suggestion fully you could restore into a manually 
created database on the Windows side by dropping the --create and 
pointing -d at the previously created database.

> 
> CREATE DATABASE mydb
>     WITH
>     OWNER = mydb_owner
>     ENCODING = 'UTF8'
>     LC_COLLATE = 'et_EE.UTF-8'
>     LC_CTYPE = 'et_EE.UTF-8'
>     TABLESPACE = pg_default
>     CONNECTION LIMIT = -1;
> 
> If new database is created manually in windows cluster the following 
> command is generated:
> 
> 
> CREATE DATABASE mydbmanually
>     WITH
>     OWNER = mydbmanually_owner
>     ENCODING = 'UTF8'
>     LC_COLLATE = 'Estonian_Estonia.1257'
>     LC_CTYPE = 'Estonian_Estonia.1257'
>     TABLESPACE = pg_default
>     CONNECTION LIMIT = -1;
> 
> Andrus.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Hi!

>> Same warning appears two times. This command execute by pg_restore 
>> probably causes this (harmless?)  warning:
>What warning?

pg_restore: WARNING:  could not determine encoding for locale "et_EE.UTF-8":
codeset is "CPUTF-8"

>I cranked up a Windows 7 instance and tried to migrate a Postgres 11 
>database from Ubuntu and it failed on the CREATE DATABASE step because of 
>this line in the dump file:
>CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
>LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

I ran this statemnt it in Windows 10 with Postgres 12 successfully. Result 
was:

WARNING:  could not determine encoding for locale "en_US.UTF-8": codeset is 
"CPUTF-8"
WARNING:  could not determine encoding for locale "en_US.UTF-8": codeset is 
"CPUTF-8"
CREATE DATABASE

Query returned successfully in 1 secs 75 msec.

redmine database was created. I dont understand why it failed in your test.

>When I manually changed it in the plain text version of the dump file to:
>CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
>LC_COLLATE = 'English_United States.1252' LC_CTYPE = 'English_United 
>States.1252';

I verifed that data was restored using pg_restore without manually changing 
anything.

Andrus.




Re: could not determine encoding for locale "et_EE.UTF-8": codeset is"CPUTF-8" in pg_restore

От
Adrian Klaver
Дата:
On 3/29/20 2:47 PM, Andrus wrote:
> Hi!
> 
>>> Same warning appears two times. This command execute by pg_restore 
>>> probably causes this (harmless?)  warning:
>> What warning?
> 
> pg_restore: WARNING:  could not determine encoding for locale 
> "et_EE.UTF-8":
> codeset is "CPUTF-8"
> 
>> I cranked up a Windows 7 instance and tried to migrate a Postgres 11 
>> database from Ubuntu and it failed on the CREATE DATABASE step because 
>> of this line in the dump file:
>> CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
>> LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
> 
> I ran this statemnt it in Windows 10 with Postgres 12 successfully. 
> Result was:
> 
> WARNING:  could not determine encoding for locale "en_US.UTF-8": codeset 
> is "CPUTF-8"
> WARNING:  could not determine encoding for locale "en_US.UTF-8": codeset 
> is "CPUTF-8"
> CREATE DATABASE
> 
> Query returned successfully in 1 secs 75 msec.
> 
> redmine database was created. I dont understand why it failed in your test.

Not sure but:

1) I was on Windows 7

2) Using Postgres 11

3) My Windows skills have atrophied, especially with the Windows command 
line.

> 
>> When I manually changed it in the plain text version of the dump file to:
>> CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8' 
>> LC_COLLATE = 'English_United States.1252' LC_CTYPE = 'English_United 
>> States.1252';
> 
> I verifed that data was restored using pg_restore without manually 
> changing anything.

So was this the same for the database you originally posted about, it 
actually restored it just threw warnings?

If so I misunderstood the situation and thought the database was not 
loading.

> 
> Andrus.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Hi!

>Not sure but:
>1) I was on Windows 7
>2) Using Postgres 11
>3) My Windows skills have atrophied, especially with the Windows command 
>line.
>So was this the same for the database you originally posted about, it 
>actually restored it just threw warnings?

Looks like it restored.  I havent checked restored data.

>If so I misunderstood the situation and thought the database was not 
>loading.

I tried

CREATE DATABASE redmine
    WITH
    ENCODING = 'UTF8'
    LC_COLLATE = 'foo'
    LC_CTYPE = 'bar' template template0

in Linux and in Windows using Postgres 12.2
In Linux it throws error

ERROR:  invalid locale name: "foo"

In Windows it creates database and throws warning only.
Without template template0 clause it throws error in Windows also.

In Linux

CREATE DATABASE redmine WITH TEMPLATE = template0 ENCODING = 'UTF8'
LC_COLLATE = 'English_United States.1252' LC_CTYPE = 'English_United
States.1252';


also throws error

ERROR:  invalid locale name: "English_United States.1252"

So it looks like pg_dump/pg_restore with --create works only from Linux to 
Windows and does not work from Windows to Linux.

I expect that it should work from Windows to Linux also.

Andrus.