Обсуждение: [COMMITTERS] pgsql: Add function to import operating system collations

Поиск
Список
Период
Сортировка

[COMMITTERS] pgsql: Add function to import operating system collations

От
Peter Eisentraut
Дата:
Add function to import operating system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Euler Taveira <euler@timbira.com.br>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/aa17c06fb58533d09c79c68a4d34a6f56687ee38

Modified Files
--------------
doc/src/sgml/charset.sgml             |   2 +-
doc/src/sgml/func.sgml                |  40 ++++++++
src/backend/catalog/pg_collation.c    |  31 ++++++-
src/backend/commands/collationcmds.c  | 154 ++++++++++++++++++++++++++++++-
src/bin/initdb/initdb.c               | 166 +---------------------------------
src/include/catalog/catversion.h      |   2 +-
src/include/catalog/pg_collation_fn.h |   3 +-
src/include/catalog/pg_proc.h         |   3 +
8 files changed, 229 insertions(+), 172 deletions(-)


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> Add function to import operating system collations

The buildfarm's not happy with this, and neither am I:

...
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL:  collation "aa_ER@saaho" for
encoding"UTF8" already exists 
2017-01-18 09:49:45.019 EST [25919] STATEMENT:  SELECT pg_import_system_collations(if_not_exists => false, schema =>
'pg_catalog');

child process exited with exit code 1
initdb: removing data directory "/home/postgres/testversion/data"

For reference, I get this on my RHEL6 installation:

$ locale -a | grep ^aa_ER
aa_ER
aa_ER.utf8
aa_ER.utf8@saaho
aa_ER@saaho
$

Please fix ASAP, or revert so other people can get work done.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
I wrote:
> running bootstrap script ... ok
> performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL:  collation "aa_ER@saaho" for
encoding"UTF8" already exists 
> 2017-01-18 09:49:45.019 EST [25919] STATEMENT:  SELECT pg_import_system_collations(if_not_exists => false, schema =>
'pg_catalog');

As a stopgap so I could get some work done, I did

-   PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
+   PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => true, schema => 'pg_catalog');\n\n");

and what I now see in pg_collation is

regression=# select * from pg_collation where collname like 'aa_ER%';
     collname     | collnamespace | collowner | collencoding |   collcollate    |    collctype
------------------+---------------+-----------+--------------+------------------+------------------
 aa_ER            |            11 |        10 |            6 | aa_ER            | aa_ER
 aa_ER.utf8       |            11 |        10 |            6 | aa_ER.utf8       | aa_ER.utf8
 aa_ER.utf8@saaho |            11 |        10 |            6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho
 aa_ER@saaho      |            11 |        10 |            6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho
(4 rows)

Maybe an appropriate fix would be to ignore collations whose names aren't
equal to what we get for collcollate/collctype.  Presumably the latter
are getting canonicalized somehow.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Euler Taveira
Дата:
On 18-01-2017 12:43, Tom Lane wrote:
> I wrote:
>> running bootstrap script ... ok
>> performing post-bootstrap initialization ... 2017-01-18 09:49:45.019 EST [25919] FATAL:  collation "aa_ER@saaho" for
encoding"UTF8" already exists 
>> 2017-01-18 09:49:45.019 EST [25919] STATEMENT:  SELECT pg_import_system_collations(if_not_exists => false, schema =>
'pg_catalog');
>
> As a stopgap so I could get some work done, I did
>
> -   PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
> +   PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => true, schema => 'pg_catalog');\n\n");
>
> and what I now see in pg_collation is
>
> regression=# select * from pg_collation where collname like 'aa_ER%';
>      collname     | collnamespace | collowner | collencoding |   collcollate    |    collctype
> ------------------+---------------+-----------+--------------+------------------+------------------
>  aa_ER            |            11 |        10 |            6 | aa_ER            | aa_ER
>  aa_ER.utf8       |            11 |        10 |            6 | aa_ER.utf8       | aa_ER.utf8
>  aa_ER.utf8@saaho |            11 |        10 |            6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho
>  aa_ER@saaho      |            11 |        10 |            6 | aa_ER.utf8@saaho | aa_ER.utf8@saaho
> (4 rows)
>
> Maybe an appropriate fix would be to ignore collations whose names aren't
> equal to what we get for collcollate/collctype.  Presumably the latter
> are getting canonicalized somehow.
>
collname 'en_US' seems to be more popular than 'en_US.utf8' (it is
shorter). We can't ignore locales without .utf8 or @something because it
would break queries with COLLATE clause.

Ignore collations at initdb time seems to fix the error. However, do
collations remain the same as 9.6?


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Peter Eisentraut
Дата:
On 1/18/17 10:43 AM, Tom Lane wrote:
> Maybe an appropriate fix would be to ignore collations whose names aren't
> equal to what we get for collcollate/collctype.  Presumably the latter
> are getting canonicalized somehow.

Yeah, it's supposed to do that.  Note that the second call to
CollationCreate() in pg_import_system_collations() has the "if not
exists" argument true regardless.  So the error appears to come from the
first one.

What's fishy about this is that I can't reproduce it locally in a
variety of VMs, and the buildfarm is not unanimous either.


--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Peter Eisentraut
Дата:
On 1/18/17 12:23 PM, Peter Eisentraut wrote:
> Yeah, it's supposed to do that.  Note that the second call to
> CollationCreate() in pg_import_system_collations() has the "if not
> exists" argument true regardless.  So the error appears to come from the
> first one.
>
> What's fishy about this is that I can't reproduce it locally in a
> variety of VMs, and the buildfarm is not unanimous either.

OK, the problem has to do with the order of the locale -a output, which
is in turn dependent on the current locale.

I'll think of something.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> What's fishy about this is that I can't reproduce it locally in a
> variety of VMs, and the buildfarm is not unanimous either.

Well, as I said, I get

$ locale -a | grep ^aa_ER
aa_ER
aa_ER.utf8
aa_ER.utf8@saaho
aa_ER@saaho

What it looks like to me is that we see "aa_ER.utf8@saaho", enter
that, strip it to "aa_ER@saaho" and enter that (if_not_exists,
which it doesn't), and then see "aa_ER@saaho" which we try to
enter and fail.  IOW, the behavior is dependent on the order in
which "locale -a" returns the names, which I already mentioned
I do not think we should trust to be consistent.

The previous coding applied a sort so as not to depend on what
order "locale -a" had returned things in, and I think we need
to retain that.  At the very least, all the normalized names
need to be saved up and entered in a second pass.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> OK, the problem has to do with the order of the locale -a output, which
> is in turn dependent on the current locale.

Ah, right, it succeeds unpatched if I run initdb in en_US rather than
C locale.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
I wrote:
> The previous coding applied a sort so as not to depend on what
> order "locale -a" had returned things in, and I think we need
> to retain that.  At the very least, all the normalized names
> need to be saved up and entered in a second pass.

Actually, it seems like doing precisely that should be enough to fix
it.  The original names shouldn't have any dups, and if we generate
dup names by stripping, those will be for different encodings so it's
OK.  I've pushed a fix based on that.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Peter Eisentraut
Дата:
On 1/18/17 1:46 PM, Tom Lane wrote:
> I wrote:
>> The previous coding applied a sort so as not to depend on what
>> order "locale -a" had returned things in, and I think we need
>> to retain that.  At the very least, all the normalized names
>> need to be saved up and entered in a second pass.
>
> Actually, it seems like doing precisely that should be enough to fix
> it.  The original names shouldn't have any dups, and if we generate
> dup names by stripping, those will be for different encodings so it's
> OK.  I've pushed a fix based on that.

Yeah, I was just about to write the exact same code.  Looks good, thanks.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Amit Kapila
Дата:
On Wed, Jan 18, 2017 at 8:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Add function to import operating system collations
>

After this commit, initdb is failing with below error on one of my VM
m/c (Linux amitkapila-centos-vm 2.6.32-358.11.1.el6.x86_64):

./initdb -D ../../data

The files belonging to this database system will be owned by user "Amitkapila".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory ../../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2017-01-19 15:19:14.409
IST [3611] FATAL:  role "amitkapila" does not exist at character 150
2017-01-19 15:19:14.409 IST [3611] STATEMENT:  INSERT INTO
pg_collation (collname, collnamespace, collowner, collencoding,
collcollate, collctype) VALUES ('ucs_basic',
'pg_catalog'::regnamespace, 'Amitkapila'::regrole, 6, 'C', 'C');

I have tried on latest commit, it still fails, however trying with a
commit (193a7d791ebe2adf32b36d5538e4602a90c3e0fb), everything works
fine.  I have noticed that if I use initdb command as below, then it
passes

./initdb -D ../../data -U amitkapila

echo $USER prints
Amitkapila

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, Jan 18, 2017 at 8:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Add function to import operating system collations

> After this commit, initdb is failing with below error on one of my VM
> m/c (Linux amitkapila-centos-vm 2.6.32-358.11.1.el6.x86_64):

> performing post-bootstrap initialization ... 2017-01-19 15:19:14.409
> IST [3611] FATAL:  role "amitkapila" does not exist at character 150
> 2017-01-19 15:19:14.409 IST [3611] STATEMENT:  INSERT INTO
> pg_collation (collname, collnamespace, collowner, collencoding,
> collcollate, collctype) VALUES ('ucs_basic',
> 'pg_catalog'::regnamespace, 'Amitkapila'::regrole, 6, 'C', 'C');

Hm.  I see that the patch randomly changed the way that the collation
owner is generated ... looks like it no longer works for mixed-case
usernames.  Perhaps follow this model instead:

    if (superuser_password)
        PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
                       username, escape_quotes(superuser_password));

although TBH that doesn't look too darn safe either.  I wonder how
initdb has gotten along so far with no quote_ident() function.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Peter Eisentraut
Дата:
On 1/19/17 7:53 AM, Tom Lane wrote:
> Hm.  I see that the patch randomly changed the way that the collation
> owner is generated ... looks like it no longer works for mixed-case
> usernames.  Perhaps follow this model instead:
>
>     if (superuser_password)
>         PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
>                        username, escape_quotes(superuser_password));
>
> although TBH that doesn't look too darn safe either.  I wonder how
> initdb has gotten along so far with no quote_ident() function.

We could just use the numeric value, like in the attached patch.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 1/19/17 7:53 AM, Tom Lane wrote:
>> Hm.  I see that the patch randomly changed the way that the collation
>> owner is generated ... looks like it no longer works for mixed-case
>> usernames.  Perhaps follow this model instead:

> We could just use the numeric value, like in the attached patch.

WFM.  Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10"
in this bit in setup_privileges():

        " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl "
        "  UNION SELECT unnest(pg_catalog.acldefault("
        "    CASE WHEN relkind = 'S' THEN 's' ELSE 'r' END::\"char\",10::oid))"
        " ) as a) "

Is there a reasonable way to fix that?  Maybe do another replace_token
call for it?

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 1/19/17 7:53 AM, Tom Lane wrote:
> >> Hm.  I see that the patch randomly changed the way that the collation
> >> owner is generated ... looks like it no longer works for mixed-case
> >> usernames.  Perhaps follow this model instead:
>
> > We could just use the numeric value, like in the attached patch.
>
> WFM.  Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10"
> in this bit in setup_privileges():
>
>         " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl "
>         "  UNION SELECT unnest(pg_catalog.acldefault("
>         "    CASE WHEN relkind = 'S' THEN 's' ELSE 'r' END::\"char\",10::oid))"
>         " ) as a) "
>
> Is there a reasonable way to fix that?  Maybe do another replace_token
> call for it?

Hm.  I seem to recall trying to avoid having the hard-coded value there
but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c
could include it from, do we?  It's only in catalog/pg_authid.h.

We could re-define it in initdb.c, of course, and perhaps that'd be
better than having it hard-coded.  I'm not sure that we really want to
expose BOOTSTRAP_SUPERUSERID to regular client code, or create some
additional set of headers which are just for initdb and the backend..

Of course, I might be missing something here, but I'm pretty sure that
was my thinking when I wrote that code.

Thanks!

Stephen

Вложения

Re: [COMMITTERS] pgsql: Add function to import operating system collations

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> WFM.  Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10"
>> in this bit in setup_privileges():

> Hm.  I seem to recall trying to avoid having the hard-coded value there
> but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c
> could include it from, do we?  It's only in catalog/pg_authid.h.

Looks to me like including catalog/pg_authid.h in initdb would work fine.
pg_upgrade does it.

            regards, tom lane


Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> WFM.  Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10"
> >> in this bit in setup_privileges():
>
> > Hm.  I seem to recall trying to avoid having the hard-coded value there
> > but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c
> > could include it from, do we?  It's only in catalog/pg_authid.h.
>
> Looks to me like including catalog/pg_authid.h in initdb would work fine.
> pg_upgrade does it.

Ah, alright then.  I'm happy to let whomever make that change, or, if no
one else wants to, I'll do it since I'm the author of those lines.

Thanks!

Stephen

Вложения

Re: [COMMITTERS] pgsql: Add function to import operating systemcollations

От
Peter Eisentraut
Дата:
On 1/19/17 11:58 AM, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> WFM.  Btw, I noticed that BOOTSTRAP_SUPERUSERID is hard-coded as "10"
>>> in this bit in setup_privileges():
>
>> Hm.  I seem to recall trying to avoid having the hard-coded value there
>> but we don't have BOOTSTRAP_SUPERUSERID defined somewhere that initdb.c
>> could include it from, do we?  It's only in catalog/pg_authid.h.
>
> Looks to me like including catalog/pg_authid.h in initdb would work fine.
> pg_upgrade does it.

I have fixed that together with Amit's issue.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services