Обсуждение: [COMMITTERS] pgsql: Add function to import operating system collations
Add function to import operating system collations Move this logic out of initdb into a user-callable function. This simplifies the code and makes it possible to update the standard collations later on if additional operating system collations appear. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Euler Taveira <euler@timbira.com.br> Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/aa17c06fb58533d09c79c68a4d34a6f56687ee38 Modified Files -------------- doc/src/sgml/charset.sgml | 2 +- doc/src/sgml/func.sgml | 40 ++++++++ src/backend/catalog/pg_collation.c | 31 ++++++- src/backend/commands/collationcmds.c | 154 ++++++++++++++++++++++++++++++- src/bin/initdb/initdb.c | 166 +--------------------------------- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_collation_fn.h | 3 +- src/include/catalog/pg_proc.h | 3 + 8 files changed, 229 insertions(+), 172 deletions(-)
Peter Eisentraut <peter_e@gmx.net> writes: > Add function to import operating system collations The buildfarm's not happy with this, and neither am I: ... creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL: collation "aa_ER@saaho" for encoding"UTF8" already exists 2017-01-18 09:49:45.019 EST [25919] STATEMENT: SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog'); child process exited with exit code 1 initdb: removing data directory "/home/postgres/testversion/data" For reference, I get this on my RHEL6 installation: $ locale -a | grep ^aa_ER aa_ER aa_ER.utf8 aa_ER.utf8@saaho aa_ER@saaho $ Please fix ASAP, or revert so other people can get work done. regards, tom lane
I wrote: > running bootstrap script ... ok > performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL: collation "aa_ER@saaho" for encoding"UTF8" already exists > 2017-01-18 09:49:45.019 EST [25919] STATEMENT: SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog'); As a stopgap so I could get some work done, I did - PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n"); + PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => true, schema => 'pg_catalog');\n\n"); and what I now see in pg_collation is regression=# select * from pg_collation where collname like 'aa_ER%'; collname | collnamespace | collowner | collencoding | collcollate | collctype ------------------+---------------+-----------+--------------+------------------+------------------ aa_ER | 11 | 10 | 6 | aa_ER | aa_ER aa_ER.utf8 | 11 | 10 | 6 | aa_ER.utf8 | aa_ER.utf8 aa_ER.utf8@saaho | 11 | 10 | 6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho aa_ER@saaho | 11 | 10 | 6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho (4 rows) Maybe an appropriate fix would be to ignore collations whose names aren't equal to what we get for collcollate/collctype. Presumably the latter are getting canonicalized somehow. regards, tom lane
On 18-01-2017 12:43, Tom Lane wrote: > I wrote: >> running bootstrap script ... ok >> performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL: collation "aa_ER@saaho" for encoding"UTF8" already exists >> 2017-01-18 09:49:45.019 EST [25919] STATEMENT: SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog'); > > As a stopgap so I could get some work done, I did > > - PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n"); > + PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => true, schema => 'pg_catalog');\n\n"); > > and what I now see in pg_collation is > > regression=# select * from pg_collation where collname like 'aa_ER%'; > collname | collnamespace | collowner | collencoding | collcollate | collctype > ------------------+---------------+-----------+--------------+------------------+------------------ > aa_ER | 11 | 10 | 6 | aa_ER | aa_ER > aa_ER.utf8 | 11 | 10 | 6 | aa_ER.utf8 | aa_ER.utf8 > aa_ER.utf8@saaho | 11 | 10 | 6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho > aa_ER@saaho | 11 | 10 | 6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho > (4 rows) > > Maybe an appropriate fix would be to ignore collations whose names aren't > equal to what we get for collcollate/collctype. Presumably the latter > are getting canonicalized somehow. > collname 'en_US' seems to be more popular than 'en_US.utf8' (it is shorter). We can't ignore locales without .utf8 or @something because it would break queries with COLLATE clause. Ignore collations at initdb time seems to fix the error. However, do collations remain the same as 9.6? -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 1/18/17 10:43 AM, Tom Lane wrote: > Maybe an appropriate fix would be to ignore collations whose names aren't > equal to what we get for collcollate/collctype. Presumably the latter > are getting canonicalized somehow. Yeah, it's supposed to do that. Note that the second call to CollationCreate() in pg_import_system_collations() has the "if not exists" argument true regardless. So the error appears to come from the first one. What's fishy about this is that I can't reproduce it locally in a variety of VMs, and the buildfarm is not unanimous either. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/18/17 12:23 PM, Peter Eisentraut wrote: > Yeah, it's supposed to do that. Note that the second call to > CollationCreate() in pg_import_system_collations() has the "if not > exists" argument true regardless. So the error appears to come from the > first one. > > What's fishy about this is that I can't reproduce it locally in a > variety of VMs, and the buildfarm is not unanimous either. OK, the problem has to do with the order of the locale -a output, which is in turn dependent on the current locale. I'll think of something. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > What's fishy about this is that I can't reproduce it locally in a > variety of VMs, and the buildfarm is not unanimous either. Well, as I said, I get $ locale -a | grep ^aa_ER aa_ER aa_ER.utf8 aa_ER.utf8@saaho aa_ER@saaho What it looks like to me is that we see "aa_ER.utf8@saaho", enter that, strip it to "aa_ER@saaho" and enter that (if_not_exists, which it doesn't), and then see "aa_ER@saaho" which we try to enter and fail. IOW, the behavior is dependent on the order in which "locale -a" returns the names, which I already mentioned I do not think we should trust to be consistent. The previous coding applied a sort so as not to depend on what order "locale -a" had returned things in, and I think we need to retain that. At the very least, all the normalized names need to be saved up and entered in a second pass. regards, tom lane
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > OK, the problem has to do with the order of the locale -a output, which > is in turn dependent on the current locale. Ah, right, it succeeds unpatched if I run initdb in en_US rather than C locale. regards, tom lane
I wrote: > The previous coding applied a sort so as not to depend on what > order "locale -a" had returned things in, and I think we need > to retain that. At the very least, all the normalized names > need to be saved up and entered in a second pass. Actually, it seems like doing precisely that should be enough to fix it. The original names shouldn't have any dups, and if we generate dup names by stripping, those will be for different encodings so it's OK. I've pushed a fix based on that. regards, tom lane
On 1/18/17 1:46 PM, Tom Lane wrote: > I wrote: >> The previous coding applied a sort so as not to depend on what >> order "locale -a" had returned things in, and I think we need >> to retain that. At the very least, all the normalized names >> need to be saved up and entered in a second pass. > > Actually, it seems like doing precisely that should be enough to fix > it. The original names shouldn't have any dups, and if we generate > dup names by stripping, those will be for different encodings so it's > OK. I've pushed a fix based on that. Yeah, I was just about to write the exact same code. Looks good, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 18, 2017 at 8:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > Add function to import operating system collations > After this commit, initdb is failing with below error on one of my VM m/c (Linux amitkapila-centos-vm 2.6.32-358.11.1.el6.x86_64): ./initdb -D ../../data The files belonging to this database system will be owned by user "Amitkapila". This user must also own the server process. The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. fixing permissions on existing directory ../../data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... posix creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... 2017-01-19 15:19:14.409 IST [3611] FATAL: role "amitkapila" does not exist at character 150 2017-01-19 15:19:14.409 IST [3611] STATEMENT: INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, 'Amitkapila'::regrole, 6, 'C', 'C'); I have tried on latest commit, it still fails, however trying with a commit (193a7d791ebe2adf32b36d5538e4602a90c3e0fb), everything works fine. I have noticed that if I use initdb command as below, then it passes ./initdb -D ../../data -U amitkapila echo $USER prints Amitkapila -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Wed, Jan 18, 2017 at 8:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> Add function to import operating system collations > After this commit, initdb is failing with below error on one of my VM > m/c (Linux amitkapila-centos-vm 2.6.32-358.11.1.el6.x86_64): > performing post-bootstrap initialization ... 2017-01-19 15:19:14.409 > IST [3611] FATAL: role "amitkapila" does not exist at character 150 > 2017-01-19 15:19:14.409 IST [3611] STATEMENT: INSERT INTO > pg_collation (collname, collnamespace, collowner, collencoding, > collcollate, collctype) VALUES ('ucs_basic', > 'pg_catalog'::regnamespace, 'Amitkapila'::regrole, 6, 'C', 'C'); Hm. I see that the patch randomly changed the way that the collation owner is generated ... looks like it no longer works for mixed-case usernames. Perhaps follow this model instead: if (superuser_password) PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n", username, escape_quotes(superuser_password)); although TBH that doesn't look too darn safe either. I wonder how initdb has gotten along so far with no quote_ident() function. regards, tom lane
On 1/19/17 7:53 AM, Tom Lane wrote: > Hm. I see that the patch randomly changed the way that the collation > owner is generated ... looks like it no longer works for mixed-case > usernames. Perhaps follow this model instead: > > if (superuser_password) > PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n", > username, escape_quotes(superuser_password)); > > although TBH that doesn't look too darn safe either. I wonder how > initdb has gotten along so far with no quote_ident() function. We could just use the numeric value, like in the attached patch. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 1/19/17 7:53 AM, Tom Lane wrote: >> Hm. I see that the patch randomly changed the way that the collation >> owner is generated ... looks like it no longer works for mixed-case >> usernames. Perhaps follow this model instead: > We could just use the numeric value, like in the attached patch. WFM. Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10" in this bit in setup_privileges(): " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl " " UNION SELECT unnest(pg_catalog.acldefault(" " CASE WHEN relkind = 'S' THEN 's' ELSE 'r' END::\"char\",10::oid))" " ) as a) " Is there a reasonable way to fix that? Maybe do another replace_token call for it? regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > On 1/19/17 7:53 AM, Tom Lane wrote: > >> Hm. I see that the patch randomly changed the way that the collation > >> owner is generated ... looks like it no longer works for mixed-case > >> usernames. Perhaps follow this model instead: > > > We could just use the numeric value, like in the attached patch. > > WFM. Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10" > in this bit in setup_privileges(): > > " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl " > " UNION SELECT unnest(pg_catalog.acldefault(" > " CASE WHEN relkind = 'S' THEN 's' ELSE 'r' END::\"char\",10::oid))" > " ) as a) " > > Is there a reasonable way to fix that? Maybe do another replace_token > call for it? Hm. I seem to recall trying to avoid having the hard-coded value there but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c could include it from, do we? It's only in catalog/pg_authid.h. We could re-define it in initdb.c, of course, and perhaps that'd be better than having it hard-coded. I'm not sure that we really want to expose BOOTSTRAP_SUPERUSERID to regular client code, or create some additional set of headers which are just for initdb and the backend.. Of course, I might be missing something here, but I'm pretty sure that was my thinking when I wrote that code. Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> WFM. Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10" >> in this bit in setup_privileges(): > Hm. I seem to recall trying to avoid having the hard-coded value there > but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c > could include it from, do we? It's only in catalog/pg_authid.h. Looks to me like including catalog/pg_authid.h in initdb would work fine. pg_upgrade does it. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> WFM. Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10" > >> in this bit in setup_privileges(): > > > Hm. I seem to recall trying to avoid having the hard-coded value there > > but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c > > could include it from, do we? It's only in catalog/pg_authid.h. > > Looks to me like including catalog/pg_authid.h in initdb would work fine. > pg_upgrade does it. Ah, alright then. I'm happy to let whomever make that change, or, if no one else wants to, I'll do it since I'm the author of those lines. Thanks! Stephen
Вложения
On 1/19/17 11:58 AM, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> WFM. Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10" >>> in this bit in setup_privileges(): > >> Hm. I seem to recall trying to avoid having the hard-coded value there >> but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c >> could include it from, do we? It's only in catalog/pg_authid.h. > > Looks to me like including catalog/pg_authid.h in initdb would work fine. > pg_upgrade does it. I have fixed that together with Amit's issue. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services