Обсуждение: Have pg_basebackup write "dbname" in "primary_conninfo"?

Поиск
Список
Период
Сортировка

Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Ian Lawrence Barwick
Дата:
Hi

Hi


With the addition of "pg_sync_replication_slots()", there is now a use-case for
including "dbname" in "primary_conninfo" and the docs have changed from
stating [1]:

  Do not specify a database name in the primary_conninfo string.

to [2]:

  For replication slot synchronization (see Section 48.2.3), it is also
  necessary to specify a valid dbname in the primary_conninfo string. This will
  only be used for slot synchronization. It is ignored for streaming.

[1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY
[2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY

However, when setting up a standby (with the intent of creating a logical
standby) with pg_basebackup, providing the -R/-write-recovery-conf option
results in a "primary_conninfo" string being generated without a "dbname"
parameter, which requires a manual change should one intend to use
"pg_sync_replication_slots()".

I can't see any reason for continuing to omit "dbname", so suggest it should
only continue to be omitted for 16 and earlier; see attached patch.

Note that this does mean that if the conninfo string provided to pg_basebackup
does not contain "dbname", the generated "primary_conninfo" GUC will default to
"dbname=replication". It would be easy enough to suppress this, but AFAICS
there's no way to tell if it was explicitly supplied by the user, in which case
it would be surprising if it were omitted from the final "primary_conninfo"
string.

The only other place where GenerateRecoveryConfig() is called is pg_rewind;
I can't see any adverse affects from the proposed change. But it's perfectly
possible there's something blindingly obvious I'm overlooking.



Regards

Ian Barwick

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Jelte Fennema-Nio
Дата:
On Tue, 20 Feb 2024 at 00:34, Ian Lawrence Barwick <barwick@gmail.com> wrote:
> With the addition of "pg_sync_replication_slots()", there is now a use-case for
> including "dbname" in "primary_conninfo" and the docs have changed from
> stating [1]:
>
>   Do not specify a database name in the primary_conninfo string.
>
> to [2]:
>
>   For replication slot synchronization (see Section 48.2.3), it is also
>   necessary to specify a valid dbname in the primary_conninfo string. This will
>   only be used for slot synchronization. It is ignored for streaming.
>
> [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY
> [2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY

Sounds like that documentation should be updated in the same way as
was done for pg_basebackup/pg_receivewal in commit cca97ce6a665. When
considering middleware/proxies having dbname in there can be useful
even for older PG versions.

> I can't see any reason for continuing to omit "dbname", so suggest it should
> only continue to be omitted for 16 and earlier; see attached patch.

Yeah, that seems like a good change. Though, I'm wondering if the
version check is actually necessary.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Tue, Feb 20, 2024 at 5:04 AM Ian Lawrence Barwick <barwick@gmail.com> wrote:
>
>
> With the addition of "pg_sync_replication_slots()", there is now a use-case for
> including "dbname" in "primary_conninfo" and the docs have changed from
> stating [1]:
>
>   Do not specify a database name in the primary_conninfo string.
>
> to [2]:
>
>   For replication slot synchronization (see Section 48.2.3), it is also
>   necessary to specify a valid dbname in the primary_conninfo string. This will
>   only be used for slot synchronization. It is ignored for streaming.
>
> [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY
> [2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY
>
> However, when setting up a standby (with the intent of creating a logical
> standby) with pg_basebackup, providing the -R/-write-recovery-conf option
> results in a "primary_conninfo" string being generated without a "dbname"
> parameter, which requires a manual change should one intend to use
> "pg_sync_replication_slots()".
>
> I can't see any reason for continuing to omit "dbname", so suggest it should
> only continue to be omitted for 16 and earlier; see attached patch.
>
> Note that this does mean that if the conninfo string provided to pg_basebackup
> does not contain "dbname", the generated "primary_conninfo" GUC will default to
> "dbname=replication".
>

When I tried, it defaulted to user name:
Default case connection string:
primary_conninfo = 'user=KapilaAm
passfile=''C:\\\\Users\\\\kapilaam\\\\AppData\\\\Roaming/postgresql/pgpass.conf''
channel_binding=disable dbname=KapilaAm port=5432 sslmode=disable
sslcompression=0 sslcertmode=disable sslsni=1
ssl_min_protocol_version=TLSv1.2 gssencmode=disable
krbsrvname=postgres gssdelegation=0 target_session_attrs=any
load_balance_hosts=disable'

When I specified -d "dbname = postgres" during backup:
primary_conninfo = 'user=KapilaAm
passfile=''C:\\\\Users\\\\kapilaam\\\\AppData\\\\Roaming/postgresql/pgpass.conf''
channel_binding=disable dbname=KapilaAm port=5432 sslmode=disable
sslcompression=0 sslcertmode=disable sslsni=1
ssl_min_protocol_version=TLSv1.2 gssencmode=disable
krbsrvname=postgres gssdelegation=0 target_session_attrs=any
load_balance_hosts=disable'

I think it makes sense to include dbname in the connection string
programmatically (with some option) for slot sync functionality but it
is not clear to me if there is any impact of the same when the standby
is not set up to sync up logical slots. I haven't checked but if there
is a way to distinguish the case where we write dbname only when
specified by the user then that would be great but this is worth
considering even without that.

--
With Regards,
Amit Kapila.



RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Ian,

Thanks for making the patch.

> With the addition of "pg_sync_replication_slots()", there is now a use-case for
> including "dbname" in "primary_conninfo" and the docs have changed from
> stating [1]:
> 
>   Do not specify a database name in the primary_conninfo string.
> 
> to [2]:
> 
>   For replication slot synchronization (see Section 48.2.3), it is also
>   necessary to specify a valid dbname in the primary_conninfo string. This will
>   only be used for slot synchronization. It is ignored for streaming.
> 
> [1]
> https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME
> -CONFIG-REPLICATION-STANDBY
> [2]
> https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTI
> ME-CONFIG-REPLICATION-STANDBY
> 
> However, when setting up a standby (with the intent of creating a logical
> standby) with pg_basebackup, providing the -R/-write-recovery-conf option
> results in a "primary_conninfo" string being generated without a "dbname"
> parameter, which requires a manual change should one intend to use
> "pg_sync_replication_slots()".
> 
> I can't see any reason for continuing to omit "dbname", so suggest it should
> only continue to be omitted for 16 and earlier; see attached patch.

Hmm, I also cannot find a reason, but we can document the change.

> Note that this does mean that if the conninfo string provided to pg_basebackup
> does not contain "dbname", the generated "primary_conninfo" GUC will default to
> "dbname=replication". It would be easy enough to suppress this, but AFAICS
> there's no way to tell if it was explicitly supplied by the user, in which case
> it would be surprising if it were omitted from the final "primary_conninfo"
> string.

I found an inconsistency. When I ran ` pg_basebackup -D data_N2 -U postgres -R`,
dbname would be set as username.

```
primary_conninfo = 'user=postgres ... dbname=postgres
```

However, when I ran `pg_basebackup -D data_N2 -d "user=postgres" -R`,
dbname would be set as "replication". Is it an intentional item?

```
primary_conninfo = 'user=postgres ... dbname=replication...
```

> The only other place where GenerateRecoveryConfig() is called is pg_rewind;
> I can't see any adverse affects from the proposed change. But it's perfectly
> possible there's something blindingly obvious I'm overlooking.

On-going feature pg_createsubscriber[1] also uses GenerateRecoveryConfig(), but
I can't see any bad effects. The output is being used to make consistent standby
from the primary. Even if dbname is set in the primary_conninfo, it would be ignored.

[1]: https://commitfest.postgresql.org/47/4637/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Robert Haas
Дата:
On Tue, Feb 20, 2024 at 4:18 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> I found an inconsistency. When I ran ` pg_basebackup -D data_N2 -U postgres -R`,
> dbname would be set as username.
>
> ```
> primary_conninfo = 'user=postgres ... dbname=postgres
> ```
>
> However, when I ran `pg_basebackup -D data_N2 -d "user=postgres" -R`,
> dbname would be set as "replication". Is it an intentional item?
>
> ```
> primary_conninfo = 'user=postgres ... dbname=replication...
> ```

Seems weird to me. You don't use dbname=replication to ask for a
replication connection, so why would we ever end up with that
anywhere? And especially in only one of two such closely related
cases?

--
Robert Haas
EDB: http://www.enterprisedb.com



RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Robert,

> Seems weird to me. You don't use dbname=replication to ask for a
> replication connection, so why would we ever end up with that
> anywhere? And especially in only one of two such closely related
> cases?

Just FYI - here is an extreme case. And note that I have applied proposed patch.

When `pg_basebackup -D data_N2 -R` is used:
```
primary_conninfo = 'user=hayato ... dbname=hayato ...
```

But when `pg_basebackup -d "" -D data_N2 -R` is used:
```
primary_conninfo = 'user=hayato ... dbname=replication
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Robert Haas
Дата:
On Tue, Feb 20, 2024 at 5:58 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> > Seems weird to me. You don't use dbname=replication to ask for a
> > replication connection, so why would we ever end up with that
> > anywhere? And especially in only one of two such closely related
> > cases?
>
> Just FYI - here is an extreme case. And note that I have applied proposed patch.
>
> When `pg_basebackup -D data_N2 -R` is used:
> ```
> primary_conninfo = 'user=hayato ... dbname=hayato ...
> ```
>
> But when `pg_basebackup -d "" -D data_N2 -R` is used:
> ```
> primary_conninfo = 'user=hayato ... dbname=replication
> ```

It seems like maybe somebody should look into why this is happening,
and perhaps fix it.

--
Robert Haas
EDB: http://www.enterprisedb.com



RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Robert,

> > Just FYI - here is an extreme case. And note that I have applied proposed patch.
> >
> > When `pg_basebackup -D data_N2 -R` is used:
> > ```
> > primary_conninfo = 'user=hayato ... dbname=hayato ...
> > ```
> >
> > But when `pg_basebackup -d "" -D data_N2 -R` is used:
> > ```
> > primary_conninfo = 'user=hayato ... dbname=replication
> > ```
> 
> It seems like maybe somebody should look into why this is happening,
> and perhaps fix it.

I think this caused from below part [1] in GetConnection().

If both dbname and connection_string are the NULL, we will enter the else part
and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated
only here.

Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(),
the strange part would be found and replaced to the username [2].

I think if both the connection string and the dbname are NULL, it should be
considered as the physical replication connection. here is a patch to fix it.
After the application, below two examples can output "dbname=replication".
You can also confirm.

```
pg_basebackup -D data_N2 -U postgres
pg_basebackup -D data_N2 -R -v

-> primary_conninfo = 'user=postgres ... dbname=replication ...
```

[1]
```
    else
    {
        keywords = pg_malloc0((argcount + 1) * sizeof(*keywords));
        values = pg_malloc0((argcount + 1) * sizeof(*values));
        keywords[i] = "dbname";
        values[i] = dbname;
        i++;
    }
```

[2]
```
    /*
     * If database name was not given, default it to equal user name
     */
    if (conn->dbName == NULL || conn->dbName[0] == '\0')
    {
        free(conn->dbName);
        conn->dbName = strdup(conn->pguser);
        if (!conn->dbName)
            goto oom_error;
    }
```


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/


Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Kyotaro Horiguchi
Дата:
At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas <robertmhaas@gmail.com> wrote in 
> It seems like maybe somebody should look into why this is happening,
> and perhaps fix it.

GetConnection()@streamutil.c wants to ensure conninfo has a fallback
database name ("replication"). However, the function seems to be
ignoring the case where neither dbname nor connection string is given,
which is the first case Kuroda-san raised. The second case is the
intended behavior of the function.

>    /* pg_recvlogical uses dbname only; others use connection_string only. */
>    Assert(dbname == NULL || connection_string == NULL);

And the function incorrectly assumes that the connection string
requires "dbname=replication".

>     * Merge the connection info inputs given in form of connection string,
>     * options and default values (dbname=replication, replication=true, etc.)

But the name is a pseudo database name only used by pg_hba.conf
(maybe) , which cannot be used as an actual database name (without
quotation marks, or unless it is actually created). The function
should not add the fallback database name because the connection
string for physical replication connection doesn't require the dbname
parameter. (attached patch)

About the proposed patch, pg_basebackup cannot verify the validity of
the dbname. It could be problematic.

Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Ajin Cherian
Дата:


On Wed, Feb 21, 2024 at 2:04 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

About the proposed patch, pg_basebackup cannot verify the validity of
the dbname. It could be problematic.

Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

I agree. If the intention is to meet the new requirement of the sync-slot patch which requires a dbname in the primary_conninfo, then pseudo dbnames will not work, whether it be the username or just "replication". I feel if the user does not specify dbname explicitly in pg_basebackup it should be left blank in the generated primary_conninfo string as well.

regards,
Ajin Cherian
Fujitsu Australia

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Ajin Cherian
Дата:


On Wed, Feb 21, 2024 at 2:04 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?



In the original thread, the intention is to not just provide this functionality  using the function pg_sync_replication_slots(), but provide a GUC option on standbys to sync logical replication slots periodically even without calling that function. This requires connecting to a database.

regards,
Ajin Cherian
Fujitsu Australia

RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Horiguchi-san,

> GetConnection()@streamutil.c wants to ensure conninfo has a fallback
> database name ("replication"). However, the function seems to be
> ignoring the case where neither dbname nor connection string is given,
> which is the first case Kuroda-san raised. The second case is the
> intended behavior of the function.
>
> >    /* pg_recvlogical uses dbname only; others use connection_string only.
> */
> >    Assert(dbname == NULL || connection_string == NULL);
>
> And the function incorrectly assumes that the connection string
> requires "dbname=replication".
>
> >     * Merge the connection info inputs given in form of connection string,
> >     * options and default values (dbname=replication, replication=true,
> etc.)
>
> But the name is a pseudo database name only used by pg_hba.conf
> (maybe) , which cannot be used as an actual database name (without
> quotation marks, or unless it is actually created). The function
> should not add the fallback database name because the connection
> string for physical replication connection doesn't require the dbname
> parameter. (attached patch)

I was also missing, but the requirement was that the dbname should be included
only when the dbname option was explicitly specified [1]. Even mine and yours
cannot handle like that. Libpq function PQconnectdbParams()->pqConnectOptions2()
fills all the parameter to PGconn, at that time the information whether it is
intentionally specified or not is discarded. Then, GenerateRecoveryConfig() would
just write down all the connection parameters from PGconn, they cannot recognize.

One approach is that based on Horiguchi-san's one and initial patch, we can
avoid writing when the dbname is same as the username.

[1]: https://www.postgresql.org/message-id/CAA4eK1KH1d1J5giPMZVOtMe0iqncf1CpNwkBKoYAmXdC-kEGZg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/




Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Ajin Cherian
Дата:


On Wed, Feb 21, 2024 at 4:09 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Horiguchi-san,

> GetConnection()@streamutil.c wants to ensure conninfo has a fallback
> database name ("replication"). However, the function seems to be
> ignoring the case where neither dbname nor connection string is given,
> which is the first case Kuroda-san raised. The second case is the
> intended behavior of the function.
>
> >     /* pg_recvlogical uses dbname only; others use connection_string only.
> */
> >     Assert(dbname == NULL || connection_string == NULL);
>
> And the function incorrectly assumes that the connection string
> requires "dbname=replication".
>
> >      * Merge the connection info inputs given in form of connection string,
> >      * options and default values (dbname=replication, replication=true,
> etc.)
>
> But the name is a pseudo database name only used by pg_hba.conf
> (maybe) , which cannot be used as an actual database name (without
> quotation marks, or unless it is actually created). The function
> should not add the fallback database name because the connection
> string for physical replication connection doesn't require the dbname
> parameter. (attached patch)

I was also missing, but the requirement was that the dbname should be included
only when the dbname option was explicitly specified [1]. Even mine and yours
cannot handle like that. Libpq function PQconnectdbParams()->pqConnectOptions2()
fills all the parameter to PGconn, at that time the information whether it is
intentionally specified or not is discarded. Then, GenerateRecoveryConfig() would
just write down all the connection parameters from PGconn, they cannot recognize.

Well, one option is that if a default dbname should be written to the configuration file, then "postgres' is a better option than "replication" or "username" as the default option, at least a db of that name exists.

regards,
Ajin Cherian
Fujitsu Australia

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Wed, Feb 21, 2024 at 7:46 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > > Just FYI - here is an extreme case. And note that I have applied proposed patch.
> > >
> > > When `pg_basebackup -D data_N2 -R` is used:
> > > ```
> > > primary_conninfo = 'user=hayato ... dbname=hayato ...
> > > ```
> > >
> > > But when `pg_basebackup -d "" -D data_N2 -R` is used:
> > > ```
> > > primary_conninfo = 'user=hayato ... dbname=replication
> > > ```
> >
> > It seems like maybe somebody should look into why this is happening,
> > and perhaps fix it.
>
> I think this caused from below part [1] in GetConnection().
>
> If both dbname and connection_string are the NULL, we will enter the else part
> and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated
> only here.
>
> Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(),
> the strange part would be found and replaced to the username [2].
>
> I think if both the connection string and the dbname are NULL, it should be
> considered as the physical replication connection. here is a patch to fix it.
>

When dbname is NULL or not given, it defaults to username. This
follows the specs of the connection string. See (dbname #
The database name. Defaults to be the same as the user name...) [1].
Your patch breaks that specs, so I don't think it is correct.

[1] - https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-CONNSTRING

--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Wed, Feb 21, 2024 at 8:34 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas <robertmhaas@gmail.com> wrote in
> > It seems like maybe somebody should look into why this is happening,
> > and perhaps fix it.
>
> GetConnection()@streamutil.c wants to ensure conninfo has a fallback
> database name ("replication"). However, the function seems to be
> ignoring the case where neither dbname nor connection string is given,
> which is the first case Kuroda-san raised. The second case is the
> intended behavior of the function.
>
> >       /* pg_recvlogical uses dbname only; others use connection_string only. */
> >       Assert(dbname == NULL || connection_string == NULL);
>
> And the function incorrectly assumes that the connection string
> requires "dbname=replication".
>
> >        * Merge the connection info inputs given in form of connection string,
> >        * options and default values (dbname=replication, replication=true, etc.)
>
> But the name is a pseudo database name only used by pg_hba.conf
> (maybe) , which cannot be used as an actual database name (without
> quotation marks, or unless it is actually created). The function
> should not add the fallback database name because the connection
> string for physical replication connection doesn't require the dbname
> parameter. (attached patch)
>

We do append dbname=replication even in libpqrcv_connect for .pgpass
lookup as mentioned in comments. See below:
libpqrcv_connect()
{
....
else
{
/*
* The database name is ignored by the server in replication mode,
* but specify "replication" for .pgpass lookup.
*/
keys[++i] = "dbname";
vals[i] = "replication";
}
...
}

I think as part of this effort we should just add dbname to
primary_conninfo written in postgresql.auto.conf file. As above, the
question is valid whether we should do it just for 17 or for prior
versions. Let's discuss more on that. I am not sure of the use case
for versions before 17 but commit cca97ce6a665 mentioned that some
middleware or proxies might however need to know the dbname to make
the correct routing decision for the connection. Does that apply here
as well? If so, we can do it and update the docs, otherwise, let's do
it just for 17.

--
With Regards,
Amit Kapila.



RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit,

> When dbname is NULL or not given, it defaults to username. This
> follows the specs of the connection string. See (dbname #
> The database name. Defaults to be the same as the user name...) [1].
> Your patch breaks that specs, so I don't think it is correct.

I have proposed the point because I thought pg_basebackup basically wanted to do
a physical replication. But if the general libpq rule is stronger than it, we
should not apply my add_NULL_check.txt. Let's forget it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit,

> We do append dbname=replication even in libpqrcv_connect for .pgpass
> lookup as mentioned in comments. See below:
> libpqrcv_connect()
> {
> ....
> else
> {
> /*
> * The database name is ignored by the server in replication mode,
> * but specify "replication" for .pgpass lookup.
> */
> keys[++i] = "dbname";
> vals[i] = "replication";
> }
> ...
> }

OK. So we must add the value for the authorization, right?
I think it should be described even in GetConnection().

> I think as part of this effort we should just add dbname to
> primary_conninfo written in postgresql.auto.conf file. As above, the
> question is valid whether we should do it just for 17 or for prior
> versions. Let's discuss more on that. I am not sure of the use case
> for versions before 17 but commit cca97ce6a665 mentioned that some
> middleware or proxies might however need to know the dbname to make
> the correct routing decision for the connection. Does that apply here
> as well? If so, we can do it and update the docs, otherwise, let's do
> it just for 17.

Hmm, I might lose your requirements again. So, we must satisfy all of below
ones, right?
1) add {"dbname", "replication"} key-value pair to look up .pgpass file correctly.
2) If the -R is given, output dbname=xxx value to be used by slotsync worker.
3) If the dbname is not given in the connection string, the same string as
   username must be used to keep the libpq connection rule.
4) No need to add dbname=replication pair 

PSA the patch for implementing it. It is basically same as Ian's one.
However, this patch still cannot satisfy the condition 3).

`pg_basebackup -D data_N2 -d "user=postgres" -R`
-> dbname would not be appeared in primary_conninfo.

This is because `if (connection_string)` case in GetConnection() explicy override
a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
before the overriding, but it is no-op. Because The replacement of the dbname in
pqConnectOptions2() would be done only for the valid (=lastly specified)
connection options.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

Вложения

RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
> PSA the patch for implementing it. It is basically same as Ian's one.
> However, this patch still cannot satisfy the condition 3).
> 
> `pg_basebackup -D data_N2 -d "user=postgres" -R`
> -> dbname would not be appeared in primary_conninfo.
> 
> This is because `if (connection_string)` case in GetConnection() explicy override
> a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
> before the overriding, but it is no-op. Because The replacement of the dbname in
> pqConnectOptions2() would be done only for the valid (=lastly specified)
> connection options.

Oh, this patch missed the straightforward case:

pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R
-> dbname would not be appeared in primary_conninfo.

So I think it cannot be applied as-is. Sorry for sharing the bad item.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Tue, Feb 27, 2024 at 2:00 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > We do append dbname=replication even in libpqrcv_connect for .pgpass
> > lookup as mentioned in comments. See below:
> > libpqrcv_connect()
> > {
> > ....
> > else
> > {
> > /*
> > * The database name is ignored by the server in replication mode,
> > * but specify "replication" for .pgpass lookup.
> > */
> > keys[++i] = "dbname";
> > vals[i] = "replication";
> > }
> > ...
> > }
>
> OK. So we must add the value for the authorization, right?
> I think it should be described even in GetConnection().
>
> > I think as part of this effort we should just add dbname to
> > primary_conninfo written in postgresql.auto.conf file. As above, the
> > question is valid whether we should do it just for 17 or for prior
> > versions. Let's discuss more on that. I am not sure of the use case
> > for versions before 17 but commit cca97ce6a665 mentioned that some
> > middleware or proxies might however need to know the dbname to make
> > the correct routing decision for the connection. Does that apply here
> > as well? If so, we can do it and update the docs, otherwise, let's do
> > it just for 17.
>
> Hmm, I might lose your requirements again. So, we must satisfy all of below
> ones, right?
> 1) add {"dbname", "replication"} key-value pair to look up .pgpass file correctly.
> 2) If the -R is given, output dbname=xxx value to be used by slotsync worker.
> 3) If the dbname is not given in the connection string, the same string as
>    username must be used to keep the libpq connection rule.
> 4) No need to add dbname=replication pair
>

Point 1) and 4) seems contradictory or maybe I am missing something.

--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > PSA the patch for implementing it. It is basically same as Ian's one.
> > However, this patch still cannot satisfy the condition 3).
> >
> > `pg_basebackup -D data_N2 -d "user=postgres" -R`
> > -> dbname would not be appeared in primary_conninfo.
> >
> > This is because `if (connection_string)` case in GetConnection() explicy override
> > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
> > before the overriding, but it is no-op. Because The replacement of the dbname in
> > pqConnectOptions2() would be done only for the valid (=lastly specified)
> > connection options.
>
> Oh, this patch missed the straightforward case:
>
> pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R
> -> dbname would not be appeared in primary_conninfo.
>
> So I think it cannot be applied as-is. Sorry for sharing the bad item.
>

Can you please share the patch that can be considered for review?

--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
vignesh C
Дата:
On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > PSA the patch for implementing it. It is basically same as Ian's one.
> > > However, this patch still cannot satisfy the condition 3).
> > >
> > > `pg_basebackup -D data_N2 -d "user=postgres" -R`
> > > -> dbname would not be appeared in primary_conninfo.
> > >
> > > This is because `if (connection_string)` case in GetConnection() explicy override
> > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
> > > before the overriding, but it is no-op. Because The replacement of the dbname in
> > > pqConnectOptions2() would be done only for the valid (=lastly specified)
> > > connection options.
> >
> > Oh, this patch missed the straightforward case:
> >
> > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R
> > -> dbname would not be appeared in primary_conninfo.
> >
> > So I think it cannot be applied as-is. Sorry for sharing the bad item.
> >
>
> Can you please share the patch that can be considered for review?

Here is a patch with few changes: a) Removed the version check based
on discussion from [1] b) updated the documentation.
I have tested various scenarios with the attached patch, the dbname
that is written in postgresql.auto.conf for each of the cases with
pg_basebackup is given below:
1) ./pg_basebackup -U vignesh -R
-> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
will have dbname as username specified)
2) ./pg_basebackup -D data -R
-> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
will have the dbname as username (which is the same as the OS user
while setting defaults))
3) ./pg_basebackup -d "user=vignesh"  -D data -R
-> primary_conninfo = "dbname=replication"  (In this case
primary_conninfo will have dbname as replication which is the default
value from GetConnection as connection string is specified)
4) ./pg_basebackup -d "user=vignesh dbname=postgres"  -D data -R
-> primary_conninfo = "dbname=postgres" (In this case primary_conninfo
will have the dbname as the dbname specified)
5) ./pg_basebackup -d ""  -D data -R
-> primary_conninfo = "dbname=replication" (In this case
primary_conninfo will have dbname as replication which is the default
value from GetConnection as connection string is specified)

I have mentioned the reasons as to why dbname is being set to
replication or username in each of the cases. How about replacing
these values in postgresql.auto.conf manually.

[1] - https://www.postgresql.org/message-id/CAGECzQTh9oB3nu98DsHMpRaVaqXPDRgTDEikY82OAKYF0%3DhVMA%40mail.gmail.com

Regards,
Vignesh

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Tue, Mar 12, 2024 at 5:13 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > > PSA the patch for implementing it. It is basically same as Ian's one.
> > > > However, this patch still cannot satisfy the condition 3).
> > > >
> > > > `pg_basebackup -D data_N2 -d "user=postgres" -R`
> > > > -> dbname would not be appeared in primary_conninfo.
> > > >
> > > > This is because `if (connection_string)` case in GetConnection() explicy override
> > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
> > > > before the overriding, but it is no-op. Because The replacement of the dbname in
> > > > pqConnectOptions2() would be done only for the valid (=lastly specified)
> > > > connection options.
> > >
> > > Oh, this patch missed the straightforward case:
> > >
> > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R
> > > -> dbname would not be appeared in primary_conninfo.
> > >
> > > So I think it cannot be applied as-is. Sorry for sharing the bad item.
> > >
> >
> > Can you please share the patch that can be considered for review?
>
> Here is a patch with few changes: a) Removed the version check based
> on discussion from [1] b) updated the documentation.
> I have tested various scenarios with the attached patch, the dbname
> that is written in postgresql.auto.conf for each of the cases with
> pg_basebackup is given below:
> 1) ./pg_basebackup -U vignesh -R
> -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
> will have dbname as username specified)
> 2) ./pg_basebackup -D data -R
> -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
> will have the dbname as username (which is the same as the OS user
> while setting defaults))
> 3) ./pg_basebackup -d "user=vignesh"  -D data -R
> -> primary_conninfo = "dbname=replication"  (In this case
> primary_conninfo will have dbname as replication which is the default
> value from GetConnection as connection string is specified)
> 4) ./pg_basebackup -d "user=vignesh dbname=postgres"  -D data -R
> -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo
> will have the dbname as the dbname specified)
> 5) ./pg_basebackup -d ""  -D data -R
> -> primary_conninfo = "dbname=replication" (In this case
> primary_conninfo will have dbname as replication which is the default
> value from GetConnection as connection string is specified)
>
> I have mentioned the reasons as to why dbname is being set to
> replication or username in each of the cases.
>

IIUC, the dbname is already allowed in connstring for pg_basebackup by
commit cca97ce6a6 and with this patch we are just writing it in
postgresql.auto.conf if -R option is used to write recovery info. This
will help slot syncworker to automatically connect with database
without user manually specifying the dbname and replication
connection, the same will still be ignored. I don't see any problem
with this. Does anyone else see any problem?

The <filename>postgresql.auto.conf</filename> file will record the connection
         settings and, if specified, the replication slot
         that <application>pg_basebackup</application> is using, so that
-        streaming replication will use the same settings later on.
+        a) synchronization of logical replication slots on the primary to the
+        hot_standby from <link linkend="pg-sync-replication-slots">
+        <function>pg_sync_replication_slots</function></link> or slot
sync worker
+        and b) streaming replication will use the same settings later on.

We can simply modify the last line as: ".. streaming replication or
logical replication slots synchronization will use the same settings
later on." Additionally, you can give a link reference to [1].

[1] -
https://www.postgresql.org/docs/devel/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS-SYNCHRONIZATION
--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
vignesh C
Дата:
On Wed, 13 Mar 2024 at 16:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 12, 2024 at 5:13 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu)
> > > <kuroda.hayato@fujitsu.com> wrote:
> > > >
> > > > > PSA the patch for implementing it. It is basically same as Ian's one.
> > > > > However, this patch still cannot satisfy the condition 3).
> > > > >
> > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R`
> > > > > -> dbname would not be appeared in primary_conninfo.
> > > > >
> > > > > This is because `if (connection_string)` case in GetConnection() explicy override
> > > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair
> > > > > before the overriding, but it is no-op. Because The replacement of the dbname in
> > > > > pqConnectOptions2() would be done only for the valid (=lastly specified)
> > > > > connection options.
> > > >
> > > > Oh, this patch missed the straightforward case:
> > > >
> > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R
> > > > -> dbname would not be appeared in primary_conninfo.
> > > >
> > > > So I think it cannot be applied as-is. Sorry for sharing the bad item.
> > > >
> > >
> > > Can you please share the patch that can be considered for review?
> >
> > Here is a patch with few changes: a) Removed the version check based
> > on discussion from [1] b) updated the documentation.
> > I have tested various scenarios with the attached patch, the dbname
> > that is written in postgresql.auto.conf for each of the cases with
> > pg_basebackup is given below:
> > 1) ./pg_basebackup -U vignesh -R
> > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
> > will have dbname as username specified)
> > 2) ./pg_basebackup -D data -R
> > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo
> > will have the dbname as username (which is the same as the OS user
> > while setting defaults))
> > 3) ./pg_basebackup -d "user=vignesh"  -D data -R
> > -> primary_conninfo = "dbname=replication"  (In this case
> > primary_conninfo will have dbname as replication which is the default
> > value from GetConnection as connection string is specified)
> > 4) ./pg_basebackup -d "user=vignesh dbname=postgres"  -D data -R
> > -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo
> > will have the dbname as the dbname specified)
> > 5) ./pg_basebackup -d ""  -D data -R
> > -> primary_conninfo = "dbname=replication" (In this case
> > primary_conninfo will have dbname as replication which is the default
> > value from GetConnection as connection string is specified)
> >
> > I have mentioned the reasons as to why dbname is being set to
> > replication or username in each of the cases.
> >
>
> IIUC, the dbname is already allowed in connstring for pg_basebackup by
> commit cca97ce6a6 and with this patch we are just writing it in
> postgresql.auto.conf if -R option is used to write recovery info. This
> will help slot syncworker to automatically connect with database
> without user manually specifying the dbname and replication
> connection, the same will still be ignored. I don't see any problem
> with this. Does anyone else see any problem?
>
> The <filename>postgresql.auto.conf</filename> file will record the connection
>          settings and, if specified, the replication slot
>          that <application>pg_basebackup</application> is using, so that
> -        streaming replication will use the same settings later on.
> +        a) synchronization of logical replication slots on the primary to the
> +        hot_standby from <link linkend="pg-sync-replication-slots">
> +        <function>pg_sync_replication_slots</function></link> or slot
> sync worker
> +        and b) streaming replication will use the same settings later on.
>
> We can simply modify the last line as: ".. streaming replication or
> logical replication slots synchronization will use the same settings
> later on." Additionally, you can give a link reference to [1].

Thanks for the comments, the attached v4 version patch has the changes
for the same.

Regards,
Vignesh

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Masahiko Sawada
Дата:
On Fri, Feb 23, 2024 at 3:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Feb 21, 2024 at 7:46 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > > Just FYI - here is an extreme case. And note that I have applied proposed patch.
> > > >
> > > > When `pg_basebackup -D data_N2 -R` is used:
> > > > ```
> > > > primary_conninfo = 'user=hayato ... dbname=hayato ...
> > > > ```
> > > >
> > > > But when `pg_basebackup -d "" -D data_N2 -R` is used:
> > > > ```
> > > > primary_conninfo = 'user=hayato ... dbname=replication
> > > > ```
> > >
> > > It seems like maybe somebody should look into why this is happening,
> > > and perhaps fix it.
> >
> > I think this caused from below part [1] in GetConnection().
> >
> > If both dbname and connection_string are the NULL, we will enter the else part
> > and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated
> > only here.
> >
> > Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(),
> > the strange part would be found and replaced to the username [2].
> >
> > I think if both the connection string and the dbname are NULL, it should be
> > considered as the physical replication connection. here is a patch to fix it.
> >
>
> When dbname is NULL or not given, it defaults to username. This
> follows the specs of the connection string.

This fact makes me think that the slotsync worker might be able to
accept the primary_conninfo value even if there is no dbname in the
value. That is, if there is no dbname in the primary_conninfo, it uses
the username in accordance with the specs of the connection string.
Currently, the slotsync worker connects to the local database first
and then establishes the connection to the primary server. But if we
can reverse the two steps, it can get the dbname that has actually
been used to establish the remote connection and use it for the local
connection too. That way, the primary_conninfo generated by
pg_basebackup could work even without the patch. For example, if the
OS user executing pg_basebackup is 'postgres', the slotsync worker
would connect to the postgres database. Given the 'postgres' database
is created by default and 'postgres' OS user is used in common, I
guess it could cover many cases in practice actually.

Having said that, even with (or without) the above change, we might
want to change the pg_basebackup so that it writes the dbname to the
primary_conninfo if -R option is specified. Since the database where
the slotsync worker connects cannot be dropped while the slotsync
worker is running, the user might want to change the database to
connect, and it would be useful if they can do that using
pg_basebackup instead of modifying the configuration file manually.

While the current approach makes sense to me, I'm a bit concerned that
we might end up having the pg_basebackup search the actual database
name (e.g. 'dbname=template1') from the .pgpass file instead of
'dbname=replication'. As far as I tested on my environment, suppose
that I execute:

pg_basebackup -D tmp -d "dbname=testdb" -R

The pg_basebackup established a replication connection but looked for
the password of the 'testdb' database. This could be another
inconvenience for the existing users who want to use the slot
synchronization.

A random idea I came up with is, we add a new option to the
pg_basebackup to overwrite the full or some portion of the connection
string that is eventually written in the primary_conninfo in
postgresql.auto.conf. For example, the command:

pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R
--primary-coninfo-ext "host=2.2.2.2 dbname=postgres"

will produce the connection string that is based on -d option value
but is overwritten by --primary-conninfo-ext option value, which will
be like:

host=2.2.2.2 dbname=postgres port=5555

This option might help not only for users who want to use the slotsync
worker but also for users who want to take a basebackup from a standby
but have the new standby connect to the primary.

But it's still just an idea and I might be missing something. And
given we're getting closer to the feature freeze, it would be a PG18
item.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> This fact makes me think that the slotsync worker might be able to
> accept the primary_conninfo value even if there is no dbname in the
> value. That is, if there is no dbname in the primary_conninfo, it uses
> the username in accordance with the specs of the connection string.
> Currently, the slotsync worker connects to the local database first
> and then establishes the connection to the primary server. But if we
> can reverse the two steps, it can get the dbname that has actually
> been used to establish the remote connection and use it for the local
> connection too. That way, the primary_conninfo generated by
> pg_basebackup could work even without the patch. For example, if the
> OS user executing pg_basebackup is 'postgres', the slotsync worker
> would connect to the postgres database. Given the 'postgres' database
> is created by default and 'postgres' OS user is used in common, I
> guess it could cover many cases in practice actually.
>

I think this is worth investigating but I suspect that in most cases
users will end up using a replication connection without specifying
the user name and we may not be able to give a meaningful error
message when slotsync worker won't be able to connect. The same will
be true even when the dbname same as the username would be used.

> Having said that, even with (or without) the above change, we might
> want to change the pg_basebackup so that it writes the dbname to the
> primary_conninfo if -R option is specified. Since the database where
> the slotsync worker connects cannot be dropped while the slotsync
> worker is running, the user might want to change the database to
> connect, and it would be useful if they can do that using
> pg_basebackup instead of modifying the configuration file manually.
>
> While the current approach makes sense to me, I'm a bit concerned that
> we might end up having the pg_basebackup search the actual database
> name (e.g. 'dbname=template1') from the .pgpass file instead of
> 'dbname=replication'. As far as I tested on my environment, suppose
> that I execute:
>
> pg_basebackup -D tmp -d "dbname=testdb" -R
>
> The pg_basebackup established a replication connection but looked for
> the password of the 'testdb' database. This could be another
> inconvenience for the existing users who want to use the slot
> synchronization.
>

This is true because it is internally using logical replication
connection (as we will set set replication=database). I feel the
mentioned behavior is an expected one with or without slotsync worker
usage. Anyway, this is an optional feature, so users can still choose
to ignore specifying dbname in the connection string. The point is
that commit cca97ce6a6 allowed using dbname in the connection string
in pg_basebackup and we are just extending to write that dbname along
with other things in connection_info.

> A random idea I came up with is, we add a new option to the
> pg_basebackup to overwrite the full or some portion of the connection
> string that is eventually written in the primary_conninfo in
> postgresql.auto.conf. For example, the command:
>
> pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R
> --primary-coninfo-ext "host=2.2.2.2 dbname=postgres"
>
> will produce the connection string that is based on -d option value
> but is overwritten by --primary-conninfo-ext option value, which will
> be like:
>
> host=2.2.2.2 dbname=postgres port=5555
>
> This option might help not only for users who want to use the slotsync
> worker but also for users who want to take a basebackup from a standby
> but have the new standby connect to the primary.
>

Agreed, this could be another way though it would be good to get some
inputs from users or otherwise about the preferred way to specify
dbname. One can also imagine using the Alter System for this purpose.

> But it's still just an idea and I might be missing something. And
> given we're getting closer to the feature freeze, it would be a PG18
> item.
>

+1. At this stage, it is important to discuss whether we should allow
pg_baseback to write dbname (either a specified one or a default one)
along with other parameters in primary_conninfo?

--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
shveta malik
Дата:
On Thu, Mar 14, 2024 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > This fact makes me think that the slotsync worker might be able to
> > accept the primary_conninfo value even if there is no dbname in the
> > value. That is, if there is no dbname in the primary_conninfo, it uses
> > the username in accordance with the specs of the connection string.
> > Currently, the slotsync worker connects to the local database first
> > and then establishes the connection to the primary server. But if we
> > can reverse the two steps, it can get the dbname that has actually
> > been used to establish the remote connection and use it for the local
> > connection too. That way, the primary_conninfo generated by
> > pg_basebackup could work even without the patch. For example, if the
> > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > would connect to the postgres database. Given the 'postgres' database
> > is created by default and 'postgres' OS user is used in common, I
> > guess it could cover many cases in practice actually.
> >
>
> I think this is worth investigating but I suspect that in most cases
> users will end up using a replication connection without specifying
> the user name and we may not be able to give a meaningful error
> message when slotsync worker won't be able to connect. The same will
> be true even when the dbname same as the username would be used.
>

I attempted the change as suggested by Swada-San. Attached the PoC
patch .For it to work, I have to expose a new get api in libpq-fe
which gets dbname from stream-connection. Please have a look.

Without this PoC patch, the errors in slot-sync worker:

-----------------
a) If dbname is missing:
[1230932] LOG:  slot sync worker started
[1230932] ERROR:  slot synchronization requires dbname to be specified
in primary_conninfo

b) If specified db does not exist
[1230913] LOG:  slot sync worker started
[1230913] FATAL:  database "postgres1" does not exist
-----------------

Now with this patch:
-----------------
a) If the dbname same as user does not exist:
[1232473] LOG:  slot sync worker started
[1232473] ERROR:  could not connect to the primary server: connection
to server at "127.0.0.1", port 5433 failed: FATAL: database
"bckp_user" does not exist

b) If user itself is removed from primary_conninfo, libpq takes user
who has authenticated the system by default and gives error if db of
same name does not exist
ERROR:  could not connect to the primary server: connection to server
at "127.0.0.1", port 5433 failed: FATAL:  database "shveta" does not
exist
-----------------

The errors in second case look slightly confusing to me.

thanks
Shveta

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Masahiko Sawada
Дата:
On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > This fact makes me think that the slotsync worker might be able to
> > accept the primary_conninfo value even if there is no dbname in the
> > value. That is, if there is no dbname in the primary_conninfo, it uses
> > the username in accordance with the specs of the connection string.
> > Currently, the slotsync worker connects to the local database first
> > and then establishes the connection to the primary server. But if we
> > can reverse the two steps, it can get the dbname that has actually
> > been used to establish the remote connection and use it for the local
> > connection too. That way, the primary_conninfo generated by
> > pg_basebackup could work even without the patch. For example, if the
> > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > would connect to the postgres database. Given the 'postgres' database
> > is created by default and 'postgres' OS user is used in common, I
> > guess it could cover many cases in practice actually.
> >
>
> I think this is worth investigating but I suspect that in most cases
> users will end up using a replication connection without specifying
> the user name and we may not be able to give a meaningful error
> message when slotsync worker won't be able to connect. The same will
> be true even when the dbname same as the username would be used.

What do you mean by not being able to give a meaningful error message?

If the slotsync worker uses the user name as the dbname, and such a
database doesn't exist, the error message the user will get is
"database "test_user" does not exist". ISTM the same is true when the
user specifies the wrong database in the primary_conninfo.

>
> > Having said that, even with (or without) the above change, we might
> > want to change the pg_basebackup so that it writes the dbname to the
> > primary_conninfo if -R option is specified. Since the database where
> > the slotsync worker connects cannot be dropped while the slotsync
> > worker is running, the user might want to change the database to
> > connect, and it would be useful if they can do that using
> > pg_basebackup instead of modifying the configuration file manually.
> >
> > While the current approach makes sense to me, I'm a bit concerned that
> > we might end up having the pg_basebackup search the actual database
> > name (e.g. 'dbname=template1') from the .pgpass file instead of
> > 'dbname=replication'. As far as I tested on my environment, suppose
> > that I execute:
> >
> > pg_basebackup -D tmp -d "dbname=testdb" -R
> >
> > The pg_basebackup established a replication connection but looked for
> > the password of the 'testdb' database. This could be another
> > inconvenience for the existing users who want to use the slot
> > synchronization.
> >
>
> This is true because it is internally using logical replication
> connection (as we will set set replication=database).

Did you mean the pg_basebackup is using a logical replication
connection in this case? As far as I tested, even if we specify dbname
to the -d option of pg_basebackup, it uses a physical replication
connection. For example, it can take a backup even if I specify a
non-existing database name.

> > A random idea I came up with is, we add a new option to the
> > pg_basebackup to overwrite the full or some portion of the connection
> > string that is eventually written in the primary_conninfo in
> > postgresql.auto.conf. For example, the command:
> >
> > pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R
> > --primary-coninfo-ext "host=2.2.2.2 dbname=postgres"
> >
> > will produce the connection string that is based on -d option value
> > but is overwritten by --primary-conninfo-ext option value, which will
> > be like:
> >
> > host=2.2.2.2 dbname=postgres port=5555
> >
> > This option might help not only for users who want to use the slotsync
> > worker but also for users who want to take a basebackup from a standby
> > but have the new standby connect to the primary.
> >
>
> Agreed, this could be another way though it would be good to get some
> inputs from users or otherwise about the preferred way to specify
> dbname. One can also imagine using the Alter System for this purpose.

Agreed.

>
> > But it's still just an idea and I might be missing something. And
> > given we're getting closer to the feature freeze, it would be a PG18
> > item.
> >
>
> +1. At this stage, it is important to discuss whether we should allow
> pg_baseback to write dbname (either a specified one or a default one)
> along with other parameters in primary_conninfo?
>

True. While I basically agree that pg_basebackup writes dbname in
primary_conninfo, I'm concerned that writing "dbname=replication"
could be problematic. Quoting the case 3) Vignesh summarized before:

3) ./pg_basebackup -d "user=vignesh"  -D data -R
-> primary_conninfo = "dbname=replication"  (In this case
primary_conninfo will have dbname as replication which is the default
value from GetConnection as connection string is specified)

The primary_conninfo generated by pg_basebackup -R is now used by
either a walreceiver (for physical replication connection) or a
slotsync worker (for normal connection). The "dbname=replication" is
okay  for walreceiver. On the other hand, as for the slotsync worker,
it can pass the CheckAndGetDbnameFromConninfo() check but it's very
likely that it cannot connect to the primary since most users won't
create a database with "replication" name. The user will end up
getting an error message like 'database "replication" does not exist'
but I'm not sure it would be informative for users. Rather, the error
message "slot synchronization requires dbname to be specified in
primary_conninfo" might be more informative for users. So I personally
like to omit the dbname if "dbname=replication", at this point.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > This fact makes me think that the slotsync worker might be able to
> > > accept the primary_conninfo value even if there is no dbname in the
> > > value. That is, if there is no dbname in the primary_conninfo, it uses
> > > the username in accordance with the specs of the connection string.
> > > Currently, the slotsync worker connects to the local database first
> > > and then establishes the connection to the primary server. But if we
> > > can reverse the two steps, it can get the dbname that has actually
> > > been used to establish the remote connection and use it for the local
> > > connection too. That way, the primary_conninfo generated by
> > > pg_basebackup could work even without the patch. For example, if the
> > > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > > would connect to the postgres database. Given the 'postgres' database
> > > is created by default and 'postgres' OS user is used in common, I
> > > guess it could cover many cases in practice actually.
> > >
> >
> > I think this is worth investigating but I suspect that in most cases
> > users will end up using a replication connection without specifying
> > the user name and we may not be able to give a meaningful error
> > message when slotsync worker won't be able to connect. The same will
> > be true even when the dbname same as the username would be used.
>
> What do you mean by not being able to give a meaningful error message?
>
> If the slotsync worker uses the user name as the dbname, and such a
> database doesn't exist, the error message the user will get is
> "database "test_user" does not exist". ISTM the same is true when the
> user specifies the wrong database in the primary_conninfo.
>

Right, the exact error message as mentioned by Shveta will be:
ERROR:  could not connect to the primary server: connection to server
at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not
exist

Now, without this idea, the ERROR message will be:
 ERROR:  slot synchronization requires dbname to be specified in
primary_conninfo

I am not sure how much this matters but the second message sounds more useful.

> >
> > > Having said that, even with (or without) the above change, we might
> > > want to change the pg_basebackup so that it writes the dbname to the
> > > primary_conninfo if -R option is specified. Since the database where
> > > the slotsync worker connects cannot be dropped while the slotsync
> > > worker is running, the user might want to change the database to
> > > connect, and it would be useful if they can do that using
> > > pg_basebackup instead of modifying the configuration file manually.
> > >
> > > While the current approach makes sense to me, I'm a bit concerned that
> > > we might end up having the pg_basebackup search the actual database
> > > name (e.g. 'dbname=template1') from the .pgpass file instead of
> > > 'dbname=replication'. As far as I tested on my environment, suppose
> > > that I execute:
> > >
> > > pg_basebackup -D tmp -d "dbname=testdb" -R
> > >
> > > The pg_basebackup established a replication connection but looked for
> > > the password of the 'testdb' database. This could be another
> > > inconvenience for the existing users who want to use the slot
> > > synchronization.
> > >
> >
> > This is true because it is internally using logical replication
> > connection (as we will set set replication=database).
>
> Did you mean the pg_basebackup is using a logical replication
> connection in this case? As far as I tested, even if we specify dbname
> to the -d option of pg_basebackup, it uses a physical replication
> connection. For example, it can take a backup even if I specify a
> non-existing database name.
>

You are right. I misunderstood some part of the code in GetConnection.
However, I think my point is still valid that if the user has provided
dbname in the connection string it means that she wants that database
entry to be looked upon not "replication" entry.

>
> >
> > > But it's still just an idea and I might be missing something. And
> > > given we're getting closer to the feature freeze, it would be a PG18
> > > item.
> > >
> >
> > +1. At this stage, it is important to discuss whether we should allow
> > pg_baseback to write dbname (either a specified one or a default one)
> > along with other parameters in primary_conninfo?
> >
>
> True. While I basically agree that pg_basebackup writes dbname in
> primary_conninfo, I'm concerned that writing "dbname=replication"
> could be problematic. Quoting the case 3) Vignesh summarized before:
>
> 3) ./pg_basebackup -d "user=vignesh"  -D data -R
> -> primary_conninfo = "dbname=replication"  (In this case
> primary_conninfo will have dbname as replication which is the default
> value from GetConnection as connection string is specified)
>
> The primary_conninfo generated by pg_basebackup -R is now used by
> either a walreceiver (for physical replication connection) or a
> slotsync worker (for normal connection). The "dbname=replication" is
> okay  for walreceiver. On the other hand, as for the slotsync worker,
> it can pass the CheckAndGetDbnameFromConninfo() check but it's very
> likely that it cannot connect to the primary since most users won't
> create a database with "replication" name. The user will end up
> getting an error message like 'database "replication" does not exist'
> but I'm not sure it would be informative for users. Rather, the error
> message "slot synchronization requires dbname to be specified in
> primary_conninfo" might be more informative for users. So I personally
> like to omit the dbname if "dbname=replication", at this point.
>

How about if we write dbname in primary_conninfo to
postgresql.auto.conf file only when the user has explicitly specified
dbname in the connection string? To achieve this we need to somehow
pass this information via PGconn (say by having a new bool variable
dbname_specified) from GetConnection() or something like that?

--
With Regards,
Amit Kapila.



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
vignesh C
Дата:
On Thu, 14 Mar 2024 at 15:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > This fact makes me think that the slotsync worker might be able to
> > > > accept the primary_conninfo value even if there is no dbname in the
> > > > value. That is, if there is no dbname in the primary_conninfo, it uses
> > > > the username in accordance with the specs of the connection string.
> > > > Currently, the slotsync worker connects to the local database first
> > > > and then establishes the connection to the primary server. But if we
> > > > can reverse the two steps, it can get the dbname that has actually
> > > > been used to establish the remote connection and use it for the local
> > > > connection too. That way, the primary_conninfo generated by
> > > > pg_basebackup could work even without the patch. For example, if the
> > > > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > > > would connect to the postgres database. Given the 'postgres' database
> > > > is created by default and 'postgres' OS user is used in common, I
> > > > guess it could cover many cases in practice actually.
> > > >
> > >
> > > I think this is worth investigating but I suspect that in most cases
> > > users will end up using a replication connection without specifying
> > > the user name and we may not be able to give a meaningful error
> > > message when slotsync worker won't be able to connect. The same will
> > > be true even when the dbname same as the username would be used.
> >
> > What do you mean by not being able to give a meaningful error message?
> >
> > If the slotsync worker uses the user name as the dbname, and such a
> > database doesn't exist, the error message the user will get is
> > "database "test_user" does not exist". ISTM the same is true when the
> > user specifies the wrong database in the primary_conninfo.
> >
>
> Right, the exact error message as mentioned by Shveta will be:
> ERROR:  could not connect to the primary server: connection to server
> at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not
> exist
>
> Now, without this idea, the ERROR message will be:
>  ERROR:  slot synchronization requires dbname to be specified in
> primary_conninfo
>
> I am not sure how much this matters but the second message sounds more useful.
>
> > >
> > > > Having said that, even with (or without) the above change, we might
> > > > want to change the pg_basebackup so that it writes the dbname to the
> > > > primary_conninfo if -R option is specified. Since the database where
> > > > the slotsync worker connects cannot be dropped while the slotsync
> > > > worker is running, the user might want to change the database to
> > > > connect, and it would be useful if they can do that using
> > > > pg_basebackup instead of modifying the configuration file manually.
> > > >
> > > > While the current approach makes sense to me, I'm a bit concerned that
> > > > we might end up having the pg_basebackup search the actual database
> > > > name (e.g. 'dbname=template1') from the .pgpass file instead of
> > > > 'dbname=replication'. As far as I tested on my environment, suppose
> > > > that I execute:
> > > >
> > > > pg_basebackup -D tmp -d "dbname=testdb" -R
> > > >
> > > > The pg_basebackup established a replication connection but looked for
> > > > the password of the 'testdb' database. This could be another
> > > > inconvenience for the existing users who want to use the slot
> > > > synchronization.
> > > >
> > >
> > > This is true because it is internally using logical replication
> > > connection (as we will set set replication=database).
> >
> > Did you mean the pg_basebackup is using a logical replication
> > connection in this case? As far as I tested, even if we specify dbname
> > to the -d option of pg_basebackup, it uses a physical replication
> > connection. For example, it can take a backup even if I specify a
> > non-existing database name.
> >
>
> You are right. I misunderstood some part of the code in GetConnection.
> However, I think my point is still valid that if the user has provided
> dbname in the connection string it means that she wants that database
> entry to be looked upon not "replication" entry.
>
> >
> > >
> > > > But it's still just an idea and I might be missing something. And
> > > > given we're getting closer to the feature freeze, it would be a PG18
> > > > item.
> > > >
> > >
> > > +1. At this stage, it is important to discuss whether we should allow
> > > pg_baseback to write dbname (either a specified one or a default one)
> > > along with other parameters in primary_conninfo?
> > >
> >
> > True. While I basically agree that pg_basebackup writes dbname in
> > primary_conninfo, I'm concerned that writing "dbname=replication"
> > could be problematic. Quoting the case 3) Vignesh summarized before:
> >
> > 3) ./pg_basebackup -d "user=vignesh"  -D data -R
> > -> primary_conninfo = "dbname=replication"  (In this case
> > primary_conninfo will have dbname as replication which is the default
> > value from GetConnection as connection string is specified)
> >
> > The primary_conninfo generated by pg_basebackup -R is now used by
> > either a walreceiver (for physical replication connection) or a
> > slotsync worker (for normal connection). The "dbname=replication" is
> > okay  for walreceiver. On the other hand, as for the slotsync worker,
> > it can pass the CheckAndGetDbnameFromConninfo() check but it's very
> > likely that it cannot connect to the primary since most users won't
> > create a database with "replication" name. The user will end up
> > getting an error message like 'database "replication" does not exist'
> > but I'm not sure it would be informative for users. Rather, the error
> > message "slot synchronization requires dbname to be specified in
> > primary_conninfo" might be more informative for users. So I personally
> > like to omit the dbname if "dbname=replication", at this point.
> >
>
> How about if we write dbname in primary_conninfo to
> postgresql.auto.conf file only when the user has explicitly specified
> dbname in the connection string? To achieve this we need to somehow
> pass this information via PGconn (say by having a new bool variable
> dbname_specified) from GetConnection() or something like that?

Here is a patch which will write dbname in the primary_conninfo only
if the database name is specified explicitly. I have added a new
function GetDbnameFromConnectionString which will return the dbname
specified in the connection and GenerateRecoveryConfig will append
this database name.
Here are the test results with the patch:
case 1:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=postgres"
primary_conninfo will have dbname=postgres

case 2:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=replication"
primary_conninfo will have dbname=replication

case 3:
pg_basebackup -D test10 -p 5431 -X s -P  -R  -U vignesh
primary_conninfo will not have dbname

case 4:
pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will not have dbname

case 5:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh"
primary_conninfo will not have dbname

case 6:
pg_basebackup -D test10 -p 5431 -X s -P  -R  -d ""
primary_conninfo will not have dbname

Thoughts?

Regards,
Vignesh

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Masahiko Sawada
Дата:
On Thu, Mar 14, 2024 at 11:46 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 14 Mar 2024 at 15:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > This fact makes me think that the slotsync worker might be able to
> > > > > accept the primary_conninfo value even if there is no dbname in the
> > > > > value. That is, if there is no dbname in the primary_conninfo, it uses
> > > > > the username in accordance with the specs of the connection string.
> > > > > Currently, the slotsync worker connects to the local database first
> > > > > and then establishes the connection to the primary server. But if we
> > > > > can reverse the two steps, it can get the dbname that has actually
> > > > > been used to establish the remote connection and use it for the local
> > > > > connection too. That way, the primary_conninfo generated by
> > > > > pg_basebackup could work even without the patch. For example, if the
> > > > > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > > > > would connect to the postgres database. Given the 'postgres' database
> > > > > is created by default and 'postgres' OS user is used in common, I
> > > > > guess it could cover many cases in practice actually.
> > > > >
> > > >
> > > > I think this is worth investigating but I suspect that in most cases
> > > > users will end up using a replication connection without specifying
> > > > the user name and we may not be able to give a meaningful error
> > > > message when slotsync worker won't be able to connect. The same will
> > > > be true even when the dbname same as the username would be used.
> > >
> > > What do you mean by not being able to give a meaningful error message?
> > >
> > > If the slotsync worker uses the user name as the dbname, and such a
> > > database doesn't exist, the error message the user will get is
> > > "database "test_user" does not exist". ISTM the same is true when the
> > > user specifies the wrong database in the primary_conninfo.
> > >
> >
> > Right, the exact error message as mentioned by Shveta will be:
> > ERROR:  could not connect to the primary server: connection to server
> > at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not
> > exist
> >
> > Now, without this idea, the ERROR message will be:
> >  ERROR:  slot synchronization requires dbname to be specified in
> > primary_conninfo
> >
> > I am not sure how much this matters but the second message sounds more useful.
> >
> > > >
> > > > > Having said that, even with (or without) the above change, we might
> > > > > want to change the pg_basebackup so that it writes the dbname to the
> > > > > primary_conninfo if -R option is specified. Since the database where
> > > > > the slotsync worker connects cannot be dropped while the slotsync
> > > > > worker is running, the user might want to change the database to
> > > > > connect, and it would be useful if they can do that using
> > > > > pg_basebackup instead of modifying the configuration file manually.
> > > > >
> > > > > While the current approach makes sense to me, I'm a bit concerned that
> > > > > we might end up having the pg_basebackup search the actual database
> > > > > name (e.g. 'dbname=template1') from the .pgpass file instead of
> > > > > 'dbname=replication'. As far as I tested on my environment, suppose
> > > > > that I execute:
> > > > >
> > > > > pg_basebackup -D tmp -d "dbname=testdb" -R
> > > > >
> > > > > The pg_basebackup established a replication connection but looked for
> > > > > the password of the 'testdb' database. This could be another
> > > > > inconvenience for the existing users who want to use the slot
> > > > > synchronization.
> > > > >
> > > >
> > > > This is true because it is internally using logical replication
> > > > connection (as we will set set replication=database).
> > >
> > > Did you mean the pg_basebackup is using a logical replication
> > > connection in this case? As far as I tested, even if we specify dbname
> > > to the -d option of pg_basebackup, it uses a physical replication
> > > connection. For example, it can take a backup even if I specify a
> > > non-existing database name.
> > >
> >
> > You are right. I misunderstood some part of the code in GetConnection.
> > However, I think my point is still valid that if the user has provided
> > dbname in the connection string it means that she wants that database
> > entry to be looked upon not "replication" entry.
> >
> > >
> > > >
> > > > > But it's still just an idea and I might be missing something. And
> > > > > given we're getting closer to the feature freeze, it would be a PG18
> > > > > item.
> > > > >
> > > >
> > > > +1. At this stage, it is important to discuss whether we should allow
> > > > pg_baseback to write dbname (either a specified one or a default one)
> > > > along with other parameters in primary_conninfo?
> > > >
> > >
> > > True. While I basically agree that pg_basebackup writes dbname in
> > > primary_conninfo, I'm concerned that writing "dbname=replication"
> > > could be problematic. Quoting the case 3) Vignesh summarized before:
> > >
> > > 3) ./pg_basebackup -d "user=vignesh"  -D data -R
> > > -> primary_conninfo = "dbname=replication"  (In this case
> > > primary_conninfo will have dbname as replication which is the default
> > > value from GetConnection as connection string is specified)
> > >
> > > The primary_conninfo generated by pg_basebackup -R is now used by
> > > either a walreceiver (for physical replication connection) or a
> > > slotsync worker (for normal connection). The "dbname=replication" is
> > > okay  for walreceiver. On the other hand, as for the slotsync worker,
> > > it can pass the CheckAndGetDbnameFromConninfo() check but it's very
> > > likely that it cannot connect to the primary since most users won't
> > > create a database with "replication" name. The user will end up
> > > getting an error message like 'database "replication" does not exist'
> > > but I'm not sure it would be informative for users. Rather, the error
> > > message "slot synchronization requires dbname to be specified in
> > > primary_conninfo" might be more informative for users. So I personally
> > > like to omit the dbname if "dbname=replication", at this point.
> > >
> >
> > How about if we write dbname in primary_conninfo to
> > postgresql.auto.conf file only when the user has explicitly specified
> > dbname in the connection string? To achieve this we need to somehow
> > pass this information via PGconn (say by having a new bool variable
> > dbname_specified) from GetConnection() or something like that?
>
> Here is a patch which will write dbname in the primary_conninfo only
> if the database name is specified explicitly. I have added a new
> function GetDbnameFromConnectionString which will return the dbname
> specified in the connection and GenerateRecoveryConfig will append
> this database name.
> Here are the test results with the patch:
> case 1:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=postgres"
> primary_conninfo will have dbname=postgres
>
> case 2:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=replication"
> primary_conninfo will have dbname=replication
>
> case 3:
> pg_basebackup -D test10 -p 5431 -X s -P  -R  -U vignesh
> primary_conninfo will not have dbname
>
> case 4:
> pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will not have dbname
>
> case 5:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh"
> primary_conninfo will not have dbname
>
> case 6:
> pg_basebackup -D test10 -p 5431 -X s -P  -R  -d ""
> primary_conninfo will not have dbname
>
> Thoughts?

Thank you for updating the patch!

This behavior makes sense to me. But do we want to handle the case of
using environment variables too? IIUC,

pg_basebackup -D tmp -d "user=masahiko dbname=test_db"

is equivalent to:

PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



RE: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Sawada-san,

Thanks for giving comments!

> This behavior makes sense to me. But do we want to handle the case of
> using environment variables too? 

Yeah, v5 does not consider which libpq parameters are specified by environment
variables. Such a variable should be used when the dbname is not expressly written
in the connection string.
Such a path was added in the v6 patch. If the dbname is not determined after
parsing the connection string, we call PQconndefaults() to get settings from
environment variables and service files [1], then start to search dbname again.
Below shows an example.

```
PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v
->
primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... '
```

> IIUC,
>
> pg_basebackup -D tmp -d "user=masahiko dbname=test_db"
>
> is equivalent to:
>
> PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp

The case won't work. I think You assumed that expanded_dbname like
PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct
- it won't parse as connection string again.

In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse().
When expand_dbname is specified, the entry "dbname" is firstly checked and
parsed its value. They are done at fe-connect.c:5846.

The environment variables are checked and parsed in conninfo_add_defaults(), which
is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the
expand_dbname has already been done at that time. This means there is no chance
that PGDATABASE is parsed as an expanded style.

For example, if the pg_basebackup runs like below:

PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v

The primary_conninfo written in the file will be:

primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres'''

[1]: https://www.postgresql.org/docs/devel/libpq-pgservice.html
[2]: https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
vignesh C
Дата:
On Tue, 19 Mar 2024 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Thanks for giving comments!
>
> > This behavior makes sense to me. But do we want to handle the case of
> > using environment variables too?
>
> Yeah, v5 does not consider which libpq parameters are specified by environment
> variables. Such a variable should be used when the dbname is not expressly written
> in the connection string.
> Such a path was added in the v6 patch. If the dbname is not determined after
> parsing the connection string, we call PQconndefaults() to get settings from
> environment variables and service files [1], then start to search dbname again.
> Below shows an example.
>
> ```
> PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v
> ->
> primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... '
> ```
>
> > IIUC,
> >
> > pg_basebackup -D tmp -d "user=masahiko dbname=test_db"
> >
> > is equivalent to:
> >
> > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp
>
> The case won't work. I think You assumed that expanded_dbname like
> PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct
> - it won't parse as connection string again.
>
> In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse().
> When expand_dbname is specified, the entry "dbname" is firstly checked and
> parsed its value. They are done at fe-connect.c:5846.
>
> The environment variables are checked and parsed in conninfo_add_defaults(), which
> is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the
> expand_dbname has already been done at that time. This means there is no chance
> that PGDATABASE is parsed as an expanded style.
>
> For example, if the pg_basebackup runs like below:
>
> PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v
>
> The primary_conninfo written in the file will be:
>
> primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres'''

Thanks for the patch.
Here are the test results for various tests by specifying connection
string, environment variable, service file, and connection URIs with
the patch:
case 1:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=db1"
primary_conninfo will have dbname=db1

case 2:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=replication"
primary_conninfo will have dbname=replication

case 3:
pg_basebackup -D test10 -p 5431 -X s -P  -R  -U vignesh
primary_conninfo will not have dbname

case 4:
pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will not have dbname

case 5:
pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh"
primary_conninfo will not have dbname

case 6:
pg_basebackup -D test10 -p 5431 -X s -P  -R  -d ""
primary_conninfo will not have dbname

--- Testing through PGDATABASE environment variable
case 7:
export PGDATABASE="user=postgres dbname=test"
./pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will have dbname=''user=postgres dbname=test'' like below:
primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
channel_binding=prefer port=5431 sslmode=prefer sslcompression=0
sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2
gssencmode=disable krbsrvname=postgres gssdelegation=0
target_session_attrs=any load_balance_hosts=disable
dbname=''user=postgres dbname=test'''

case 8:
export PGDATABASE=db1
./pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will have dbname=db1

--- Testing through pg_service
case 9:
Create .pg_service.conf with the following info:
[conn1]
dbname=db2

export PGSERVICE=conn1

./pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will have dbname=db2

case 10:
Create .pg_service.conf with the following info, i.e. there is no
database specified:
[conn1]

./pg_basebackup -D test10 -p 5431 -X s -P  -R
primary_conninfo will not have dbname

--- Testing through Connection URIs
case 11:
./pg_basebackup -D test10  -X s -P  -R -d "postgresql://localhost:5431"
primary_conninfo will not have dbname

case 12:
./pg_basebackup -D test10  -p 5431 -X s -P  -R -d
"postgresql://localhost/db3:5431"
primary_conninfo will have dbname=''db3:5431'' like below:
primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
channel_binding=prefer host=localhost port=5431 sslmode=prefer
sslcompression=0 sslcertmode=allow sslsni=1
ssl_min_protocol_version=TLSv1.2 gssencmode=disable
krbsrvname=postgres gssdelegation=0 target_session_attrs=any
load_balance_hosts=disable dbname=''db3:5431'''

case 13:
./pg_basebackup -D test10  -p 5431 -X s -P  -R -d "postgresql://localhost/db3"
primary_conninfo will have dbname=db3

case 14:
./pg_basebackup -D test10   -X s -P  -R -d "postgresql://localhost:5431/db3"
primary_conninfo will have dbname=db3

case 15:
./pg_basebackup -D test10   -X s -P  -R -d
"postgresql://localhost:5431/db4,127.0.0.1:5431/db5"
primary_conninfo will have dbname=''db4,127.0.0.1:5431/db5'' like below:
primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
channel_binding=prefer host=localhost port=5431 sslmode=prefer
sslcompression=0 sslcertmode=allow sslsni=1
ssl_min_protocol_version=TLSv1.2 gssencmode=disable
krbsrvname=postgres gssdelegation=0 target_session_attrs=any
load_balance_hosts=disable dbname=''db4,127.0.0.1:5431/db5'''

case 16:
./pg_basebackup -D test10   -X s -P  -R -d
"postgresql://localhost:5431,127.0.0.1:5431/db5"
primary_conninfo will have dbname=db5

case 17:
./pg_basebackup -D test10   -X s -P  -R -d
"postgresql:///db6?host=localhost&port=5431"
primary_conninfo will have dbname=db6

case 18:
 ./pg_basebackup -D test10 -p 5431  -X s -P  -R -d
"postgresql:///db7?host=/home/vignesh/postgres/inst/bin"
 primary_conninfo will have dbname=db7

case 19:
./pg_basebackup -D test10 -p 5431  -X s -P  -R -d
"postgresql:///db8?host=%2Fhome%2Fvignesh%2Fpostgres%2Finst%2Fbin"
 primary_conninfo will have dbname=db8

In these cases, the database name specified will be written to the
conf file. The test results look good to me.

Regards,
Vignesh



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Amit Kapila
Дата:
On Tue, Mar 19, 2024 at 5:18 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thanks for giving comments!
>
> > This behavior makes sense to me. But do we want to handle the case of
> > using environment variables too?
>
> Yeah, v5 does not consider which libpq parameters are specified by environment
> variables. Such a variable should be used when the dbname is not expressly written
> in the connection string.
> Such a path was added in the v6 patch. If the dbname is not determined after
> parsing the connection string, we call PQconndefaults() to get settings from
> environment variables and service files [1], then start to search dbname again.
>

The functionality implemented by the patch looks good to me. I have
made minor modifications in the function names, error handling,
comments, and doc updates in the attached patch. Let me know what you
think of the attached.

--
With Regards,
Amit Kapila.

Вложения

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Masahiko Sawada
Дата:
On Tue, Mar 19, 2024 at 8:48 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Thanks for giving comments!
>
> > This behavior makes sense to me. But do we want to handle the case of
> > using environment variables too?
>
> Yeah, v5 does not consider which libpq parameters are specified by environment
> variables. Such a variable should be used when the dbname is not expressly written
> in the connection string.
> Such a path was added in the v6 patch. If the dbname is not determined after
> parsing the connection string, we call PQconndefaults() to get settings from
> environment variables and service files [1], then start to search dbname again.
> Below shows an example.

Thank you for updating the patch!

>
> ```
> PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v
> ->
> primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... '
> ```
>
> > IIUC,
> >
> > pg_basebackup -D tmp -d "user=masahiko dbname=test_db"
> >
> > is equivalent to:
> >
> > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp
>
> The case won't work. I think You assumed that expanded_dbname like
> PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct
> - it won't parse as connection string again.
>
> In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse().
> When expand_dbname is specified, the entry "dbname" is firstly checked and
> parsed its value. They are done at fe-connect.c:5846.
>
> The environment variables are checked and parsed in conninfo_add_defaults(), which
> is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the
> expand_dbname has already been done at that time. This means there is no chance
> that PGDATABASE is parsed as an expanded style.
>

Thank you for pointing it out. I tested the use of PGDATABASE with
pg_basebackup and somehow missed the fact you explained.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
Masahiko Sawada
Дата:
On Wed, Mar 20, 2024 at 2:24 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 19 Mar 2024 at 17:18, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Sawada-san,
> >
> > Thanks for giving comments!
> >
> > > This behavior makes sense to me. But do we want to handle the case of
> > > using environment variables too?
> >
> > Yeah, v5 does not consider which libpq parameters are specified by environment
> > variables. Such a variable should be used when the dbname is not expressly written
> > in the connection string.
> > Such a path was added in the v6 patch. If the dbname is not determined after
> > parsing the connection string, we call PQconndefaults() to get settings from
> > environment variables and service files [1], then start to search dbname again.
> > Below shows an example.
> >
> > ```
> > PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v
> > ->
> > primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... '
> > ```
> >
> > > IIUC,
> > >
> > > pg_basebackup -D tmp -d "user=masahiko dbname=test_db"
> > >
> > > is equivalent to:
> > >
> > > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp
> >
> > The case won't work. I think You assumed that expanded_dbname like
> > PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct
> > - it won't parse as connection string again.
> >
> > In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse().
> > When expand_dbname is specified, the entry "dbname" is firstly checked and
> > parsed its value. They are done at fe-connect.c:5846.
> >
> > The environment variables are checked and parsed in conninfo_add_defaults(), which
> > is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the
> > expand_dbname has already been done at that time. This means there is no chance
> > that PGDATABASE is parsed as an expanded style.
> >
> > For example, if the pg_basebackup runs like below:
> >
> > PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v
> >
> > The primary_conninfo written in the file will be:
> >
> > primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres'''
>
> Thanks for the patch.
> Here are the test results for various tests by specifying connection
> string, environment variable, service file, and connection URIs with
> the patch:
> case 1:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=db1"
> primary_conninfo will have dbname=db1
>
> case 2:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh dbname=replication"
> primary_conninfo will have dbname=replication
>
> case 3:
> pg_basebackup -D test10 -p 5431 -X s -P  -R  -U vignesh
> primary_conninfo will not have dbname
>
> case 4:
> pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will not have dbname
>
> case 5:
> pg_basebackup -D test10 -p 5431 -X s -P  -R -d "user=vignesh"
> primary_conninfo will not have dbname
>
> case 6:
> pg_basebackup -D test10 -p 5431 -X s -P  -R  -d ""
> primary_conninfo will not have dbname
>
> --- Testing through PGDATABASE environment variable
> case 7:
> export PGDATABASE="user=postgres dbname=test"
> ./pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will have dbname=''user=postgres dbname=test'' like below:
> primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
> channel_binding=prefer port=5431 sslmode=prefer sslcompression=0
> sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2
> gssencmode=disable krbsrvname=postgres gssdelegation=0
> target_session_attrs=any load_balance_hosts=disable
> dbname=''user=postgres dbname=test'''
>
> case 8:
> export PGDATABASE=db1
> ./pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will have dbname=db1
>
> --- Testing through pg_service
> case 9:
> Create .pg_service.conf with the following info:
> [conn1]
> dbname=db2
>
> export PGSERVICE=conn1
>
> ./pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will have dbname=db2
>
> case 10:
> Create .pg_service.conf with the following info, i.e. there is no
> database specified:
> [conn1]
>
> ./pg_basebackup -D test10 -p 5431 -X s -P  -R
> primary_conninfo will not have dbname
>
> --- Testing through Connection URIs
> case 11:
> ./pg_basebackup -D test10  -X s -P  -R -d "postgresql://localhost:5431"
> primary_conninfo will not have dbname
>
> case 12:
> ./pg_basebackup -D test10  -p 5431 -X s -P  -R -d
> "postgresql://localhost/db3:5431"
> primary_conninfo will have dbname=''db3:5431'' like below:
> primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
> channel_binding=prefer host=localhost port=5431 sslmode=prefer
> sslcompression=0 sslcertmode=allow sslsni=1
> ssl_min_protocol_version=TLSv1.2 gssencmode=disable
> krbsrvname=postgres gssdelegation=0 target_session_attrs=any
> load_balance_hosts=disable dbname=''db3:5431'''
>
> case 13:
> ./pg_basebackup -D test10  -p 5431 -X s -P  -R -d "postgresql://localhost/db3"
> primary_conninfo will have dbname=db3
>
> case 14:
> ./pg_basebackup -D test10   -X s -P  -R -d "postgresql://localhost:5431/db3"
> primary_conninfo will have dbname=db3
>
> case 15:
> ./pg_basebackup -D test10   -X s -P  -R -d
> "postgresql://localhost:5431/db4,127.0.0.1:5431/db5"
> primary_conninfo will have dbname=''db4,127.0.0.1:5431/db5'' like below:
> primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass''
> channel_binding=prefer host=localhost port=5431 sslmode=prefer
> sslcompression=0 sslcertmode=allow sslsni=1
> ssl_min_protocol_version=TLSv1.2 gssencmode=disable
> krbsrvname=postgres gssdelegation=0 target_session_attrs=any
> load_balance_hosts=disable dbname=''db4,127.0.0.1:5431/db5'''
>
> case 16:
> ./pg_basebackup -D test10   -X s -P  -R -d
> "postgresql://localhost:5431,127.0.0.1:5431/db5"
> primary_conninfo will have dbname=db5
>
> case 17:
> ./pg_basebackup -D test10   -X s -P  -R -d
> "postgresql:///db6?host=localhost&port=5431"
> primary_conninfo will have dbname=db6
>
> case 18:
>  ./pg_basebackup -D test10 -p 5431  -X s -P  -R -d
> "postgresql:///db7?host=/home/vignesh/postgres/inst/bin"
>  primary_conninfo will have dbname=db7
>
> case 19:
> ./pg_basebackup -D test10 -p 5431  -X s -P  -R -d
> "postgresql:///db8?host=%2Fhome%2Fvignesh%2Fpostgres%2Finst%2Fbin"
>  primary_conninfo will have dbname=db8
>
> In these cases, the database name specified will be written to the
> conf file. The test results look good to me.

Thank you for the tests! These results look good to me too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

От
vignesh C
Дата:
On Wed, 20 Mar 2024 at 17:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 19, 2024 at 5:18 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Thanks for giving comments!
> >
> > > This behavior makes sense to me. But do we want to handle the case of
> > > using environment variables too?
> >
> > Yeah, v5 does not consider which libpq parameters are specified by environment
> > variables. Such a variable should be used when the dbname is not expressly written
> > in the connection string.
> > Such a path was added in the v6 patch. If the dbname is not determined after
> > parsing the connection string, we call PQconndefaults() to get settings from
> > environment variables and service files [1], then start to search dbname again.
> >
>
> The functionality implemented by the patch looks good to me. I have
> made minor modifications in the function names, error handling,
> comments, and doc updates in the attached patch. Let me know what you
> think of the attached.

While reviewing, I found the following changes could be done:
a) we can add one test in 010_pg_basebackup.pl to verify the change
b) Here two different styles of linking is used in the document, we
can try to keep it same:
+        streaming replication and <link
linkend="logicaldecoding-replication-slots-synchronization">
+        logical replication slot synchronization</link> will use the same
+        settings later on. The dbname will be recorded only if the dbname was
+        specified explicitly in the connection string or environment variable
+        (see <xref linkend="libpq-envars"/>).

The updated patch has the changes for the same.

Regards,
Vignesh

Вложения