Thread: Chinese initdb on Windows
On Windows, if you have OS locale set to "Chinese (Simplified, PRC)", initdb fails:

X:\>C:\pgsql-install\bin\initdb.exe -D data2
The files belonging to this database system will be owned by user "Heikki".
This user must also own the server process.

The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
Encoding GBK is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.

The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 can be used with any locale on Windows. How about doing that automatically in initdb? Now that we have the smarts in psql to detect current encoding from the environment and set client_encoding accordingly, it Just Works. Attached is a patch for that.

Once you get past that, however, there's another issue:

> ...
> creating directory data2 ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in data2/base/1 ... ok
> initializing pg_authid ... FATAL: database locale is incompatible with operating system
> DETAIL: The database was initialized with LC_COLLATE "Chinese (Simplified)_Peoples Republic of China.936", which is not recognized by setlocale().
> HINT: Recreate the database with another locale or install the missing locale.
> child process exited with exit code 1

The problem is probably the apostrophe in the locale name, although it seems to be missing from the above error message. setlocale() has a known problem with locale names that have dots in the country name, and looks like it has similar issues with apostrophes.

Fortunately, there are aliases for those problematic locales on Windows, that don't have dots or apostrophes in the names.
We did some testing in EnterpriseDB of various locales on various versions of Windows, and came up with the following mappings:

"*_Hong Kong S.A.R.*" -> "*_HKG.*"
"*_U.A.E.*" -> "*_ARE.*"
"*_People's Republic of China.*" -> "*_China.*"
"China_Macau S.A.R..950" -> "ZHM"

The first three mappings map the full country name to an abbreviation that is also accepted by Windows' setlocale(). See http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE is not on that list, but seems to work.

Macau is trickier. ZHM is not an abbreviation of the country, but of the whole locale, so we can't replace just the country part. So this will not work for "Finnish_Macau S.A.R..950", like the other mappings do. Nevertheless, it works for the common case.

Any objections to the 2nd attached patch, which adds the mapping of those locale names on Windows?

I'm thinking it's not too late to do this in 9.1.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
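
The mapping rules in the message above can be illustrated with a short lookup. This is only a minimal Python sketch, not the attached patch (which is not reproduced in this thread). It assumes a Windows locale name has the form Language_Country.Codepage, that the language part contains no underscore, and that the country part runs up to the last dot. The function name map_windows_locale and both table names are hypothetical.

```python
# Country-name aliases listed in the message above. Only the country part
# of the locale name is replaced for these.
COUNTRY_ALIASES = {
    "Hong Kong S.A.R.": "HKG",
    "U.A.E.": "ARE",
    "People's Republic of China": "China",
}

# ZHM stands for the whole locale, so it can only replace this exact name.
WHOLE_LOCALE_ALIASES = {
    "China_Macau S.A.R..950": "ZHM",
}


def map_windows_locale(name: str) -> str:
    """Return an alias without dots or apostrophes in the country part,
    or the name unchanged if no mapping applies (hypothetical helper)."""
    if name in WHOLE_LOCALE_ALIASES:
        return WHOLE_LOCALE_ALIASES[name]

    # Assumed layout: Language_Country.Codepage
    language, underscore, rest = name.partition("_")
    if not underscore:
        return name

    # The codepage follows the last dot; the country may itself contain dots.
    country, dot, codepage = rest.rpartition(".")
    if not dot:
        return name

    alias = COUNTRY_ALIASES.get(country)
    if alias is None:
        return name
    return f"{language}_{alias}.{codepage}"
```

With these tables, "Chinese (Simplified)_People's Republic of China.936" becomes "Chinese (Simplified)_China.936", while "Finnish_Macau S.A.R..950" is returned unchanged, which matches the limitation described for Macau.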
On Mon, Mar 21, 2011 at 7:29 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On windows, if you have OS locale set to "Chinese (Simplified, PRC)", initdb
> fails:
>
> X:\>C:\pgsql-install\bin\initdb.exe -D data2
> The files belonging to this database system will be owned by user "Heikki".
> This user must also own the server process.
>
> The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
> initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
> Encoding GBK is not allowed as a server-side encoding.
> Rerun initdb with a different locale selection.
>
> The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 can be
> used with any locale on Windows. How about doing that automatically in
> initdb? Now that we have the smarts in psql to detect current encoding from
> the environment and set client_encoding accordingly, it Just Works. Attached
> is a patch for that.
>
> Once you get past that, however, there's another issue:
>
>> ...
>> creating directory data2 ... ok
>> creating subdirectories ... ok
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 32MB
>> creating configuration files ... ok
>> creating template1 database in data2/base/1 ... ok
>> initializing pg_authid ... FATAL: database locale is incompatible with operating system
>> DETAIL: The database was initialized with LC_COLLATE "Chinese (Simplified)_Peoples Republic of China.936", which is not recognized by setlocale().
>> HINT: Recreate the database with another locale or install the missing locale.
>> child process exited with exit code 1
>
> The problem is probably the apostrophe in the locale name, although it seems
> to be missing from the above error message. setlocale() has a known problem
> with locale names that have dots in the country name, and looks like it has
> similar issues with apostrophes.
>
> Fortunately, there are aliases for those problematic locales on Windows,
> that don't have dots or apostrophes in the names. We did some testing in
> EnterpriseDB of various locales on various versions of Windows, and came up
> with the following mappings:
>
> "*_Hong Kong S.A.R.*" -> "*_HKG.*"
> "*_U.A.E.*" -> "*_ARE.*"
> "*_People's Republic of China.*" -> "*_China.*"
> "China_Macau S.A.R..950" -> "ZHM"
>
> The first three mappings map the full country name to an abbreviation that
> is also accepted by Windows' setlocale(). See
> http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE is
> not on that list, but seems to work.
>
> Macau is trickier. ZHM is not an abbreviation of the country, but of the
> whole locale, so we can't replace just the country part. So this will not
> work for "Finnish_Macau S.A.R..950", like the other mappings do.
> Nevertheless, it works for the common case.
>
> Any objections to the 2nd attached patch, which adds the mapping of those
> locale names on Windows?
>
> I'm thinking it's not too late to do this in 9.1.

I've heard complaints a number of times from Chinese users who I believe this would help.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Any objections to the 2nd attached patch, which adds the mapping of
> those locale names on Windows?

I think the added initdb message isn't following our style guidelines --- it certainly doesn't match the adjacent existing message. Other than that quibble, ok here.

regards, tom lane
On 22.03.2011 01:06, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Any objections to the 2nd attached patch, which adds the mapping of
>> those locale names on Windows?
>
> I think the added initdb message isn't following our style guidelines
> --- it certainly doesn't match the adjacent existing message. Other
> than that quibble, ok here.

What you usually get is something like this:

> ...
> The database cluster will be initialized with locale Lithuanian_Lithuania.1257.
> The default database encoding has accordingly been set to WIN1257.
> initdb: could not find suitable text search configuration for locale Lithuanian_Lithuania.1257
> The default text search configuration will be set to "simple".
>
> creating directory data2 ... ok
> creating subdirectories ... ok
> ...

And when initdb falls back to UTF-8 with the patch you get:

> The database cluster will be initialized with locale Chinese (Simplified)_China.936.
> Encoding GBK implied by locale is not allowed as a server-side encoding.
> The default database encoding has been set to UTF8 instead.
> initdb: could not find suitable text search configuration for locale Chinese (Simplified)_China.936
> The default text search configuration will be set to "simple".
>
> creating directory data2 ... ok
> creating subdirectories ... ok
> ...

The new message fits in nicely with the surrounding messages IMHO. Or are you thinking that it should be more warning-like, similar to the message about missing text search configuration? Something like:

> The database cluster will be initialized with locale Chinese (Simplified)_China.936.
> initdb: encoding GBK implied by locale is not allowed as a server-side encoding.
> The default database encoding has been set to UTF8 instead.
> initdb: could not find suitable text search configuration for locale Chinese (Simplified)_China.936
> The default text search configuration will be set to "simple".
>
> creating directory data2 ... ok
> creating subdirectories ... ok
> ...

That's fine with me as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
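
The fallback behaviour discussed in this message can be summarised in a few lines. The following is a rough Python sketch under stated assumptions, not initdb's actual code: the function name choose_default_encoding and the set CLIENT_ONLY_ENCODINGS are hypothetical, only GBK is listed because it is the only rejected encoding named in the thread, and the fallback is assumed to apply on Windows only, where UTF-8 is said to work with any locale. The message strings are copied from the first variant of the output quoted above.

```python
# Encodings rejected as server-side encodings. Only GBK is confirmed by the
# thread; a real list would be longer (assumption for this sketch).
CLIENT_ONLY_ENCODINGS = {"GBK"}


def choose_default_encoding(locale_name: str, locale_encoding: str):
    """Pick the default database encoding and the messages to print.

    Hypothetical helper: falls back to UTF8 when the encoding implied by
    the locale cannot be used on the server side.
    """
    messages = [
        f"The database cluster will be initialized with locale {locale_name}."
    ]
    if locale_encoding in CLIENT_ONLY_ENCODINGS:
        # Explain the fallback instead of failing and asking for a rerun.
        messages.append(
            f"Encoding {locale_encoding} implied by locale is not allowed "
            "as a server-side encoding."
        )
        messages.append(
            "The default database encoding has been set to UTF8 instead."
        )
        return "UTF8", messages

    # Normal case: the encoding implied by the locale is usable as is.
    messages.append(
        "The default database encoding has accordingly been set to "
        f"{locale_encoding}."
    )
    return locale_encoding, messages
```

Calling it with ("Lithuanian_Lithuania.1257", "WIN1257") keeps WIN1257, while ("Chinese (Simplified)_China.936", "GBK") yields UTF8 plus the two explanatory lines. The "initdb:"-prefixed variant would change only the wording of the first fallback message.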