Обсуждение: Have pg_basebackup write "dbname" in "primary_conninfo"?
Hi Hi With the addition of "pg_sync_replication_slots()", there is now a use-case for including "dbname" in "primary_conninfo" and the docs have changed from stating [1]: Do not specify a database name in the primary_conninfo string. to [2]: For replication slot synchronization (see Section 48.2.3), it is also necessary to specify a valid dbname in the primary_conninfo string. This will only be used for slot synchronization. It is ignored for streaming. [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY [2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY However, when setting up a standby (with the intent of creating a logical standby) with pg_basebackup, providing the -R/-write-recovery-conf option results in a "primary_conninfo" string being generated without a "dbname" parameter, which requires a manual change should one intend to use "pg_sync_replication_slots()". I can't see any reason for continuing to omit "dbname", so suggest it should only continue to be omitted for 16 and earlier; see attached patch. Note that this does mean that if the conninfo string provided to pg_basebackup does not contain "dbname", the generated "primary_conninfo" GUC will default to "dbname=replication". It would be easy enough to suppress this, but AFAICS there's no way to tell if it was explicitly supplied by the user, in which case it would be surprising if it were omitted from the final "primary_conninfo" string. The only other place where GenerateRecoveryConfig() is called is pg_rewind; I can't see any adverse affects from the proposed change. But it's perfectly possible there's something blindingly obvious I'm overlooking. Regards Ian Barwick
Вложения
On Tue, 20 Feb 2024 at 00:34, Ian Lawrence Barwick <barwick@gmail.com> wrote: > With the addition of "pg_sync_replication_slots()", there is now a use-case for > including "dbname" in "primary_conninfo" and the docs have changed from > stating [1]: > > Do not specify a database name in the primary_conninfo string. > > to [2]: > > For replication slot synchronization (see Section 48.2.3), it is also > necessary to specify a valid dbname in the primary_conninfo string. This will > only be used for slot synchronization. It is ignored for streaming. > > [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY > [2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY Sounds like that documentation should be updated in the same way as was done for pg_basebackup/pg_receivewal in commit cca97ce6a665. When considering middleware/proxies having dbname in there can be useful even for older PG versions. > I can't see any reason for continuing to omit "dbname", so suggest it should > only continue to be omitted for 16 and earlier; see attached patch. Yeah, that seems like a good change. Though, I'm wondering if the version check is actually necessary.
On Tue, Feb 20, 2024 at 5:04 AM Ian Lawrence Barwick <barwick@gmail.com> wrote: > > > With the addition of "pg_sync_replication_slots()", there is now a use-case for > including "dbname" in "primary_conninfo" and the docs have changed from > stating [1]: > > Do not specify a database name in the primary_conninfo string. > > to [2]: > > For replication slot synchronization (see Section 48.2.3), it is also > necessary to specify a valid dbname in the primary_conninfo string. This will > only be used for slot synchronization. It is ignored for streaming. > > [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY > [2] https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY > > However, when setting up a standby (with the intent of creating a logical > standby) with pg_basebackup, providing the -R/-write-recovery-conf option > results in a "primary_conninfo" string being generated without a "dbname" > parameter, which requires a manual change should one intend to use > "pg_sync_replication_slots()". > > I can't see any reason for continuing to omit "dbname", so suggest it should > only continue to be omitted for 16 and earlier; see attached patch. > > Note that this does mean that if the conninfo string provided to pg_basebackup > does not contain "dbname", the generated "primary_conninfo" GUC will default to > "dbname=replication". > When I tried, it defaulted to user name: Default case connection string: primary_conninfo = 'user=KapilaAm passfile=''C:\\\\Users\\\\kapilaam\\\\AppData\\\\Roaming/postgresql/pgpass.conf'' channel_binding=disable dbname=KapilaAm port=5432 sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable' When I specified -d "dbname = postgres" during backup: primary_conninfo = 'user=KapilaAm passfile=''C:\\\\Users\\\\kapilaam\\\\AppData\\\\Roaming/postgresql/pgpass.conf'' channel_binding=disable dbname=KapilaAm port=5432 sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable' I think it makes sense to include dbname in the connection string programmatically (with some option) for slot sync functionality but it is not clear to me if there is any impact of the same when the standby is not set up to sync up logical slots. I haven't checked but if there is a way to distinguish the case where we write dbname only when specified by the user then that would be great but this is worth considering even without that. -- With Regards, Amit Kapila.
Dear Ian, Thanks for making the patch. > With the addition of "pg_sync_replication_slots()", there is now a use-case for > including "dbname" in "primary_conninfo" and the docs have changed from > stating [1]: > > Do not specify a database name in the primary_conninfo string. > > to [2]: > > For replication slot synchronization (see Section 48.2.3), it is also > necessary to specify a valid dbname in the primary_conninfo string. This will > only be used for slot synchronization. It is ignored for streaming. > > [1] > https://www.postgresql.org/docs/16/runtime-config-replication.html#RUNTIME > -CONFIG-REPLICATION-STANDBY > [2] > https://www.postgresql.org/docs/devel/runtime-config-replication.html#RUNTI > ME-CONFIG-REPLICATION-STANDBY > > However, when setting up a standby (with the intent of creating a logical > standby) with pg_basebackup, providing the -R/-write-recovery-conf option > results in a "primary_conninfo" string being generated without a "dbname" > parameter, which requires a manual change should one intend to use > "pg_sync_replication_slots()". > > I can't see any reason for continuing to omit "dbname", so suggest it should > only continue to be omitted for 16 and earlier; see attached patch. Hmm, I also cannot find a reason, but we can document the change. > Note that this does mean that if the conninfo string provided to pg_basebackup > does not contain "dbname", the generated "primary_conninfo" GUC will default to > "dbname=replication". It would be easy enough to suppress this, but AFAICS > there's no way to tell if it was explicitly supplied by the user, in which case > it would be surprising if it were omitted from the final "primary_conninfo" > string. I found an inconsistency. When I ran ` pg_basebackup -D data_N2 -U postgres -R`, dbname would be set as username. ``` primary_conninfo = 'user=postgres ... dbname=postgres ``` However, when I ran `pg_basebackup -D data_N2 -d "user=postgres" -R`, dbname would be set as "replication". Is it an intentional item? ``` primary_conninfo = 'user=postgres ... dbname=replication... ``` > The only other place where GenerateRecoveryConfig() is called is pg_rewind; > I can't see any adverse affects from the proposed change. But it's perfectly > possible there's something blindingly obvious I'm overlooking. On-going feature pg_createsubscriber[1] also uses GenerateRecoveryConfig(), but I can't see any bad effects. The output is being used to make consistent standby from the primary. Even if dbname is set in the primary_conninfo, it would be ignored. [1]: https://commitfest.postgresql.org/47/4637/ Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
On Tue, Feb 20, 2024 at 4:18 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > I found an inconsistency. When I ran ` pg_basebackup -D data_N2 -U postgres -R`, > dbname would be set as username. > > ``` > primary_conninfo = 'user=postgres ... dbname=postgres > ``` > > However, when I ran `pg_basebackup -D data_N2 -d "user=postgres" -R`, > dbname would be set as "replication". Is it an intentional item? > > ``` > primary_conninfo = 'user=postgres ... dbname=replication... > ``` Seems weird to me. You don't use dbname=replication to ask for a replication connection, so why would we ever end up with that anywhere? And especially in only one of two such closely related cases? -- Robert Haas EDB: http://www.enterprisedb.com
Dear Robert, > Seems weird to me. You don't use dbname=replication to ask for a > replication connection, so why would we ever end up with that > anywhere? And especially in only one of two such closely related > cases? Just FYI - here is an extreme case. And note that I have applied proposed patch. When `pg_basebackup -D data_N2 -R` is used: ``` primary_conninfo = 'user=hayato ... dbname=hayato ... ``` But when `pg_basebackup -d "" -D data_N2 -R` is used: ``` primary_conninfo = 'user=hayato ... dbname=replication ``` Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
On Tue, Feb 20, 2024 at 5:58 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Seems weird to me. You don't use dbname=replication to ask for a > > replication connection, so why would we ever end up with that > > anywhere? And especially in only one of two such closely related > > cases? > > Just FYI - here is an extreme case. And note that I have applied proposed patch. > > When `pg_basebackup -D data_N2 -R` is used: > ``` > primary_conninfo = 'user=hayato ... dbname=hayato ... > ``` > > But when `pg_basebackup -d "" -D data_N2 -R` is used: > ``` > primary_conninfo = 'user=hayato ... dbname=replication > ``` It seems like maybe somebody should look into why this is happening, and perhaps fix it. -- Robert Haas EDB: http://www.enterprisedb.com
Dear Robert, > > Just FYI - here is an extreme case. And note that I have applied proposed patch. > > > > When `pg_basebackup -D data_N2 -R` is used: > > ``` > > primary_conninfo = 'user=hayato ... dbname=hayato ... > > ``` > > > > But when `pg_basebackup -d "" -D data_N2 -R` is used: > > ``` > > primary_conninfo = 'user=hayato ... dbname=replication > > ``` > > It seems like maybe somebody should look into why this is happening, > and perhaps fix it. I think this caused from below part [1] in GetConnection(). If both dbname and connection_string are the NULL, we will enter the else part and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated only here. Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(), the strange part would be found and replaced to the username [2]. I think if both the connection string and the dbname are NULL, it should be considered as the physical replication connection. here is a patch to fix it. After the application, below two examples can output "dbname=replication". You can also confirm. ``` pg_basebackup -D data_N2 -U postgres pg_basebackup -D data_N2 -R -v -> primary_conninfo = 'user=postgres ... dbname=replication ... ``` [1] ``` else { keywords = pg_malloc0((argcount + 1) * sizeof(*keywords)); values = pg_malloc0((argcount + 1) * sizeof(*values)); keywords[i] = "dbname"; values[i] = dbname; i++; } ``` [2] ``` /* * If database name was not given, default it to equal user name */ if (conn->dbName == NULL || conn->dbName[0] == '\0') { free(conn->dbName); conn->dbName = strdup(conn->pguser); if (!conn->dbName) goto oom_error; } ``` Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Вложения
At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas <robertmhaas@gmail.com> wrote in > It seems like maybe somebody should look into why this is happening, > and perhaps fix it. GetConnection()@streamutil.c wants to ensure conninfo has a fallback database name ("replication"). However, the function seems to be ignoring the case where neither dbname nor connection string is given, which is the first case Kuroda-san raised. The second case is the intended behavior of the function. > /* pg_recvlogical uses dbname only; others use connection_string only. */ > Assert(dbname == NULL || connection_string == NULL); And the function incorrectly assumes that the connection string requires "dbname=replication". > * Merge the connection info inputs given in form of connection string, > * options and default values (dbname=replication, replication=true, etc.) But the name is a pseudo database name only used by pg_hba.conf (maybe) , which cannot be used as an actual database name (without quotation marks, or unless it is actually created). The function should not add the fallback database name because the connection string for physical replication connection doesn't require the dbname parameter. (attached patch) About the proposed patch, pg_basebackup cannot verify the validity of the dbname. It could be problematic. Although I haven't looked the original thread, it seems that the dbname is used only by pg_sync_replication_slots(). If it is true, couldn't we make the SQL function require a database name to make a connection, instead of requiring it in physical-replication conninfo? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Вложения
On Wed, Feb 21, 2024 at 2:04 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
About the proposed patch, pg_basebackup cannot verify the validity of
the dbname. It could be problematic.
Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
I agree. If the intention is to meet the new requirement of the sync-slot patch which requires a dbname in the primary_conninfo, then pseudo dbnames will not work, whether it be the username or just "replication". I feel if the user does not specify dbname explicitly in pg_basebackup it should be left blank in the generated primary_conninfo string as well.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 21, 2024 at 2:04 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?
In the original thread, the intention is to not just provide this functionality using the function pg_sync_replication_slots(), but provide a GUC option on standbys to sync logical replication slots periodically even without calling that function. This requires connecting to a database.
regards,
Ajin Cherian
Fujitsu Australia
Dear Horiguchi-san, > GetConnection()@streamutil.c wants to ensure conninfo has a fallback > database name ("replication"). However, the function seems to be > ignoring the case where neither dbname nor connection string is given, > which is the first case Kuroda-san raised. The second case is the > intended behavior of the function. > > > /* pg_recvlogical uses dbname only; others use connection_string only. > */ > > Assert(dbname == NULL || connection_string == NULL); > > And the function incorrectly assumes that the connection string > requires "dbname=replication". > > > * Merge the connection info inputs given in form of connection string, > > * options and default values (dbname=replication, replication=true, > etc.) > > But the name is a pseudo database name only used by pg_hba.conf > (maybe) , which cannot be used as an actual database name (without > quotation marks, or unless it is actually created). The function > should not add the fallback database name because the connection > string for physical replication connection doesn't require the dbname > parameter. (attached patch) I was also missing, but the requirement was that the dbname should be included only when the dbname option was explicitly specified [1]. Even mine and yours cannot handle like that. Libpq function PQconnectdbParams()->pqConnectOptions2() fills all the parameter to PGconn, at that time the information whether it is intentionally specified or not is discarded. Then, GenerateRecoveryConfig() would just write down all the connection parameters from PGconn, they cannot recognize. One approach is that based on Horiguchi-san's one and initial patch, we can avoid writing when the dbname is same as the username. [1]: https://www.postgresql.org/message-id/CAA4eK1KH1d1J5giPMZVOtMe0iqncf1CpNwkBKoYAmXdC-kEGZg%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
On Wed, Feb 21, 2024 at 4:09 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Horiguchi-san,
> GetConnection()@streamutil.c wants to ensure conninfo has a fallback
> database name ("replication"). However, the function seems to be
> ignoring the case where neither dbname nor connection string is given,
> which is the first case Kuroda-san raised. The second case is the
> intended behavior of the function.
>
> > /* pg_recvlogical uses dbname only; others use connection_string only.
> */
> > Assert(dbname == NULL || connection_string == NULL);
>
> And the function incorrectly assumes that the connection string
> requires "dbname=replication".
>
> > * Merge the connection info inputs given in form of connection string,
> > * options and default values (dbname=replication, replication=true,
> etc.)
>
> But the name is a pseudo database name only used by pg_hba.conf
> (maybe) , which cannot be used as an actual database name (without
> quotation marks, or unless it is actually created). The function
> should not add the fallback database name because the connection
> string for physical replication connection doesn't require the dbname
> parameter. (attached patch)
I was also missing, but the requirement was that the dbname should be included
only when the dbname option was explicitly specified [1]. Even mine and yours
cannot handle like that. Libpq function PQconnectdbParams()->pqConnectOptions2()
fills all the parameter to PGconn, at that time the information whether it is
intentionally specified or not is discarded. Then, GenerateRecoveryConfig() would
just write down all the connection parameters from PGconn, they cannot recognize.
Well, one option is that if a default dbname should be written to the configuration file, then "postgres' is a better option than "replication" or "username" as the default option, at least a db of that name exists.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 21, 2024 at 7:46 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > > Just FYI - here is an extreme case. And note that I have applied proposed patch. > > > > > > When `pg_basebackup -D data_N2 -R` is used: > > > ``` > > > primary_conninfo = 'user=hayato ... dbname=hayato ... > > > ``` > > > > > > But when `pg_basebackup -d "" -D data_N2 -R` is used: > > > ``` > > > primary_conninfo = 'user=hayato ... dbname=replication > > > ``` > > > > It seems like maybe somebody should look into why this is happening, > > and perhaps fix it. > > I think this caused from below part [1] in GetConnection(). > > If both dbname and connection_string are the NULL, we will enter the else part > and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated > only here. > > Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(), > the strange part would be found and replaced to the username [2]. > > I think if both the connection string and the dbname are NULL, it should be > considered as the physical replication connection. here is a patch to fix it. > When dbname is NULL or not given, it defaults to username. This follows the specs of the connection string. See (dbname # The database name. Defaults to be the same as the user name...) [1]. Your patch breaks that specs, so I don't think it is correct. [1] - https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-CONNSTRING -- With Regards, Amit Kapila.
On Wed, Feb 21, 2024 at 8:34 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas <robertmhaas@gmail.com> wrote in > > It seems like maybe somebody should look into why this is happening, > > and perhaps fix it. > > GetConnection()@streamutil.c wants to ensure conninfo has a fallback > database name ("replication"). However, the function seems to be > ignoring the case where neither dbname nor connection string is given, > which is the first case Kuroda-san raised. The second case is the > intended behavior of the function. > > > /* pg_recvlogical uses dbname only; others use connection_string only. */ > > Assert(dbname == NULL || connection_string == NULL); > > And the function incorrectly assumes that the connection string > requires "dbname=replication". > > > * Merge the connection info inputs given in form of connection string, > > * options and default values (dbname=replication, replication=true, etc.) > > But the name is a pseudo database name only used by pg_hba.conf > (maybe) , which cannot be used as an actual database name (without > quotation marks, or unless it is actually created). The function > should not add the fallback database name because the connection > string for physical replication connection doesn't require the dbname > parameter. (attached patch) > We do append dbname=replication even in libpqrcv_connect for .pgpass lookup as mentioned in comments. See below: libpqrcv_connect() { .... else { /* * The database name is ignored by the server in replication mode, * but specify "replication" for .pgpass lookup. */ keys[++i] = "dbname"; vals[i] = "replication"; } ... } I think as part of this effort we should just add dbname to primary_conninfo written in postgresql.auto.conf file. As above, the question is valid whether we should do it just for 17 or for prior versions. Let's discuss more on that. I am not sure of the use case for versions before 17 but commit cca97ce6a665 mentioned that some middleware or proxies might however need to know the dbname to make the correct routing decision for the connection. Does that apply here as well? If so, we can do it and update the docs, otherwise, let's do it just for 17. -- With Regards, Amit Kapila.
Dear Amit, > When dbname is NULL or not given, it defaults to username. This > follows the specs of the connection string. See (dbname # > The database name. Defaults to be the same as the user name...) [1]. > Your patch breaks that specs, so I don't think it is correct. I have proposed the point because I thought pg_basebackup basically wanted to do a physical replication. But if the general libpq rule is stronger than it, we should not apply my add_NULL_check.txt. Let's forget it. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Dear Amit, > We do append dbname=replication even in libpqrcv_connect for .pgpass > lookup as mentioned in comments. See below: > libpqrcv_connect() > { > .... > else > { > /* > * The database name is ignored by the server in replication mode, > * but specify "replication" for .pgpass lookup. > */ > keys[++i] = "dbname"; > vals[i] = "replication"; > } > ... > } OK. So we must add the value for the authorization, right? I think it should be described even in GetConnection(). > I think as part of this effort we should just add dbname to > primary_conninfo written in postgresql.auto.conf file. As above, the > question is valid whether we should do it just for 17 or for prior > versions. Let's discuss more on that. I am not sure of the use case > for versions before 17 but commit cca97ce6a665 mentioned that some > middleware or proxies might however need to know the dbname to make > the correct routing decision for the connection. Does that apply here > as well? If so, we can do it and update the docs, otherwise, let's do > it just for 17. Hmm, I might lose your requirements again. So, we must satisfy all of below ones, right? 1) add {"dbname", "replication"} key-value pair to look up .pgpass file correctly. 2) If the -R is given, output dbname=xxx value to be used by slotsync worker. 3) If the dbname is not given in the connection string, the same string as username must be used to keep the libpq connection rule. 4) No need to add dbname=replication pair PSA the patch for implementing it. It is basically same as Ian's one. However, this patch still cannot satisfy the condition 3). `pg_basebackup -D data_N2 -d "user=postgres" -R` -> dbname would not be appeared in primary_conninfo. This is because `if (connection_string)` case in GetConnection() explicy override a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair before the overriding, but it is no-op. Because The replacement of the dbname in pqConnectOptions2() would be done only for the valid (=lastly specified) connection options. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Вложения
> PSA the patch for implementing it. It is basically same as Ian's one. > However, this patch still cannot satisfy the condition 3). > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > -> dbname would not be appeared in primary_conninfo. > > This is because `if (connection_string)` case in GetConnection() explicy override > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair > before the overriding, but it is no-op. Because The replacement of the dbname in > pqConnectOptions2() would be done only for the valid (=lastly specified) > connection options. Oh, this patch missed the straightforward case: pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R -> dbname would not be appeared in primary_conninfo. So I think it cannot be applied as-is. Sorry for sharing the bad item. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
On Tue, Feb 27, 2024 at 2:00 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > We do append dbname=replication even in libpqrcv_connect for .pgpass > > lookup as mentioned in comments. See below: > > libpqrcv_connect() > > { > > .... > > else > > { > > /* > > * The database name is ignored by the server in replication mode, > > * but specify "replication" for .pgpass lookup. > > */ > > keys[++i] = "dbname"; > > vals[i] = "replication"; > > } > > ... > > } > > OK. So we must add the value for the authorization, right? > I think it should be described even in GetConnection(). > > > I think as part of this effort we should just add dbname to > > primary_conninfo written in postgresql.auto.conf file. As above, the > > question is valid whether we should do it just for 17 or for prior > > versions. Let's discuss more on that. I am not sure of the use case > > for versions before 17 but commit cca97ce6a665 mentioned that some > > middleware or proxies might however need to know the dbname to make > > the correct routing decision for the connection. Does that apply here > > as well? If so, we can do it and update the docs, otherwise, let's do > > it just for 17. > > Hmm, I might lose your requirements again. So, we must satisfy all of below > ones, right? > 1) add {"dbname", "replication"} key-value pair to look up .pgpass file correctly. > 2) If the -R is given, output dbname=xxx value to be used by slotsync worker. > 3) If the dbname is not given in the connection string, the same string as > username must be used to keep the libpq connection rule. > 4) No need to add dbname=replication pair > Point 1) and 4) seems contradictory or maybe I am missing something. -- With Regards, Amit Kapila.
On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > PSA the patch for implementing it. It is basically same as Ian's one. > > However, this patch still cannot satisfy the condition 3). > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > -> dbname would not be appeared in primary_conninfo. > > > > This is because `if (connection_string)` case in GetConnection() explicy override > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair > > before the overriding, but it is no-op. Because The replacement of the dbname in > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > connection options. > > Oh, this patch missed the straightforward case: > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > -> dbname would not be appeared in primary_conninfo. > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > Can you please share the patch that can be considered for review? -- With Regards, Amit Kapila.
On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > > PSA the patch for implementing it. It is basically same as Ian's one. > > > However, this patch still cannot satisfy the condition 3). > > > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > > -> dbname would not be appeared in primary_conninfo. > > > > > > This is because `if (connection_string)` case in GetConnection() explicy override > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair > > > before the overriding, but it is no-op. Because The replacement of the dbname in > > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > > connection options. > > > > Oh, this patch missed the straightforward case: > > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > > -> dbname would not be appeared in primary_conninfo. > > > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > > > > Can you please share the patch that can be considered for review? Here is a patch with few changes: a) Removed the version check based on discussion from [1] b) updated the documentation. I have tested various scenarios with the attached patch, the dbname that is written in postgresql.auto.conf for each of the cases with pg_basebackup is given below: 1) ./pg_basebackup -U vignesh -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have dbname as username specified) 2) ./pg_basebackup -D data -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have the dbname as username (which is the same as the OS user while setting defaults)) 3) ./pg_basebackup -d "user=vignesh" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) 4) ./pg_basebackup -d "user=vignesh dbname=postgres" -D data -R -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo will have the dbname as the dbname specified) 5) ./pg_basebackup -d "" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) I have mentioned the reasons as to why dbname is being set to replication or username in each of the cases. How about replacing these values in postgresql.auto.conf manually. [1] - https://www.postgresql.org/message-id/CAGECzQTh9oB3nu98DsHMpRaVaqXPDRgTDEikY82OAKYF0%3DhVMA%40mail.gmail.com Regards, Vignesh
Вложения
On Tue, Mar 12, 2024 at 5:13 PM vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > > PSA the patch for implementing it. It is basically same as Ian's one. > > > > However, this patch still cannot satisfy the condition 3). > > > > > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > > > -> dbname would not be appeared in primary_conninfo. > > > > > > > > This is because `if (connection_string)` case in GetConnection() explicy override > > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair > > > > before the overriding, but it is no-op. Because The replacement of the dbname in > > > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > > > connection options. > > > > > > Oh, this patch missed the straightforward case: > > > > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > > > -> dbname would not be appeared in primary_conninfo. > > > > > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > > > > > > > Can you please share the patch that can be considered for review? > > Here is a patch with few changes: a) Removed the version check based > on discussion from [1] b) updated the documentation. > I have tested various scenarios with the attached patch, the dbname > that is written in postgresql.auto.conf for each of the cases with > pg_basebackup is given below: > 1) ./pg_basebackup -U vignesh -R > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo > will have dbname as username specified) > 2) ./pg_basebackup -D data -R > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo > will have the dbname as username (which is the same as the OS user > while setting defaults)) > 3) ./pg_basebackup -d "user=vignesh" -D data -R > -> primary_conninfo = "dbname=replication" (In this case > primary_conninfo will have dbname as replication which is the default > value from GetConnection as connection string is specified) > 4) ./pg_basebackup -d "user=vignesh dbname=postgres" -D data -R > -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo > will have the dbname as the dbname specified) > 5) ./pg_basebackup -d "" -D data -R > -> primary_conninfo = "dbname=replication" (In this case > primary_conninfo will have dbname as replication which is the default > value from GetConnection as connection string is specified) > > I have mentioned the reasons as to why dbname is being set to > replication or username in each of the cases. > IIUC, the dbname is already allowed in connstring for pg_basebackup by commit cca97ce6a6 and with this patch we are just writing it in postgresql.auto.conf if -R option is used to write recovery info. This will help slot syncworker to automatically connect with database without user manually specifying the dbname and replication connection, the same will still be ignored. I don't see any problem with this. Does anyone else see any problem? The <filename>postgresql.auto.conf</filename> file will record the connection settings and, if specified, the replication slot that <application>pg_basebackup</application> is using, so that - streaming replication will use the same settings later on. + a) synchronization of logical replication slots on the primary to the + hot_standby from <link linkend="pg-sync-replication-slots"> + <function>pg_sync_replication_slots</function></link> or slot sync worker + and b) streaming replication will use the same settings later on. We can simply modify the last line as: ".. streaming replication or logical replication slots synchronization will use the same settings later on." Additionally, you can give a link reference to [1]. [1] - https://www.postgresql.org/docs/devel/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS-SYNCHRONIZATION -- With Regards, Amit Kapila.
On Wed, 13 Mar 2024 at 16:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 12, 2024 at 5:13 PM vignesh C <vignesh21@gmail.com> wrote: > > > > On Mon, 11 Mar 2024 at 17:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) > > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > > > > PSA the patch for implementing it. It is basically same as Ian's one. > > > > > However, this patch still cannot satisfy the condition 3). > > > > > > > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > > > > -> dbname would not be appeared in primary_conninfo. > > > > > > > > > > This is because `if (connection_string)` case in GetConnection() explicy override > > > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", NULL} pair > > > > > before the overriding, but it is no-op. Because The replacement of the dbname in > > > > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > > > > connection options. > > > > > > > > Oh, this patch missed the straightforward case: > > > > > > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > > > > -> dbname would not be appeared in primary_conninfo. > > > > > > > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > > > > > > > > > > Can you please share the patch that can be considered for review? > > > > Here is a patch with few changes: a) Removed the version check based > > on discussion from [1] b) updated the documentation. > > I have tested various scenarios with the attached patch, the dbname > > that is written in postgresql.auto.conf for each of the cases with > > pg_basebackup is given below: > > 1) ./pg_basebackup -U vignesh -R > > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo > > will have dbname as username specified) > > 2) ./pg_basebackup -D data -R > > -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo > > will have the dbname as username (which is the same as the OS user > > while setting defaults)) > > 3) ./pg_basebackup -d "user=vignesh" -D data -R > > -> primary_conninfo = "dbname=replication" (In this case > > primary_conninfo will have dbname as replication which is the default > > value from GetConnection as connection string is specified) > > 4) ./pg_basebackup -d "user=vignesh dbname=postgres" -D data -R > > -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo > > will have the dbname as the dbname specified) > > 5) ./pg_basebackup -d "" -D data -R > > -> primary_conninfo = "dbname=replication" (In this case > > primary_conninfo will have dbname as replication which is the default > > value from GetConnection as connection string is specified) > > > > I have mentioned the reasons as to why dbname is being set to > > replication or username in each of the cases. > > > > IIUC, the dbname is already allowed in connstring for pg_basebackup by > commit cca97ce6a6 and with this patch we are just writing it in > postgresql.auto.conf if -R option is used to write recovery info. This > will help slot syncworker to automatically connect with database > without user manually specifying the dbname and replication > connection, the same will still be ignored. I don't see any problem > with this. Does anyone else see any problem? > > The <filename>postgresql.auto.conf</filename> file will record the connection > settings and, if specified, the replication slot > that <application>pg_basebackup</application> is using, so that > - streaming replication will use the same settings later on. > + a) synchronization of logical replication slots on the primary to the > + hot_standby from <link linkend="pg-sync-replication-slots"> > + <function>pg_sync_replication_slots</function></link> or slot > sync worker > + and b) streaming replication will use the same settings later on. > > We can simply modify the last line as: ".. streaming replication or > logical replication slots synchronization will use the same settings > later on." Additionally, you can give a link reference to [1]. Thanks for the comments, the attached v4 version patch has the changes for the same. Regards, Vignesh
Вложения
On Fri, Feb 23, 2024 at 3:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 21, 2024 at 7:46 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > > > Just FYI - here is an extreme case. And note that I have applied proposed patch. > > > > > > > > When `pg_basebackup -D data_N2 -R` is used: > > > > ``` > > > > primary_conninfo = 'user=hayato ... dbname=hayato ... > > > > ``` > > > > > > > > But when `pg_basebackup -d "" -D data_N2 -R` is used: > > > > ``` > > > > primary_conninfo = 'user=hayato ... dbname=replication > > > > ``` > > > > > > It seems like maybe somebody should look into why this is happening, > > > and perhaps fix it. > > > > I think this caused from below part [1] in GetConnection(). > > > > If both dbname and connection_string are the NULL, we will enter the else part > > and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated > > only here. > > > > Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(), > > the strange part would be found and replaced to the username [2]. > > > > I think if both the connection string and the dbname are NULL, it should be > > considered as the physical replication connection. here is a patch to fix it. > > > > When dbname is NULL or not given, it defaults to username. This > follows the specs of the connection string. This fact makes me think that the slotsync worker might be able to accept the primary_conninfo value even if there is no dbname in the value. That is, if there is no dbname in the primary_conninfo, it uses the username in accordance with the specs of the connection string. Currently, the slotsync worker connects to the local database first and then establishes the connection to the primary server. But if we can reverse the two steps, it can get the dbname that has actually been used to establish the remote connection and use it for the local connection too. That way, the primary_conninfo generated by pg_basebackup could work even without the patch. For example, if the OS user executing pg_basebackup is 'postgres', the slotsync worker would connect to the postgres database. Given the 'postgres' database is created by default and 'postgres' OS user is used in common, I guess it could cover many cases in practice actually. Having said that, even with (or without) the above change, we might want to change the pg_basebackup so that it writes the dbname to the primary_conninfo if -R option is specified. Since the database where the slotsync worker connects cannot be dropped while the slotsync worker is running, the user might want to change the database to connect, and it would be useful if they can do that using pg_basebackup instead of modifying the configuration file manually. While the current approach makes sense to me, I'm a bit concerned that we might end up having the pg_basebackup search the actual database name (e.g. 'dbname=template1') from the .pgpass file instead of 'dbname=replication'. As far as I tested on my environment, suppose that I execute: pg_basebackup -D tmp -d "dbname=testdb" -R The pg_basebackup established a replication connection but looked for the password of the 'testdb' database. This could be another inconvenience for the existing users who want to use the slot synchronization. A random idea I came up with is, we add a new option to the pg_basebackup to overwrite the full or some portion of the connection string that is eventually written in the primary_conninfo in postgresql.auto.conf. For example, the command: pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R --primary-coninfo-ext "host=2.2.2.2 dbname=postgres" will produce the connection string that is based on -d option value but is overwritten by --primary-conninfo-ext option value, which will be like: host=2.2.2.2 dbname=postgres port=5555 This option might help not only for users who want to use the slotsync worker but also for users who want to take a basebackup from a standby but have the new standby connect to the primary. But it's still just an idea and I might be missing something. And given we're getting closer to the feature freeze, it would be a PG18 item. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > This fact makes me think that the slotsync worker might be able to > accept the primary_conninfo value even if there is no dbname in the > value. That is, if there is no dbname in the primary_conninfo, it uses > the username in accordance with the specs of the connection string. > Currently, the slotsync worker connects to the local database first > and then establishes the connection to the primary server. But if we > can reverse the two steps, it can get the dbname that has actually > been used to establish the remote connection and use it for the local > connection too. That way, the primary_conninfo generated by > pg_basebackup could work even without the patch. For example, if the > OS user executing pg_basebackup is 'postgres', the slotsync worker > would connect to the postgres database. Given the 'postgres' database > is created by default and 'postgres' OS user is used in common, I > guess it could cover many cases in practice actually. > I think this is worth investigating but I suspect that in most cases users will end up using a replication connection without specifying the user name and we may not be able to give a meaningful error message when slotsync worker won't be able to connect. The same will be true even when the dbname same as the username would be used. > Having said that, even with (or without) the above change, we might > want to change the pg_basebackup so that it writes the dbname to the > primary_conninfo if -R option is specified. Since the database where > the slotsync worker connects cannot be dropped while the slotsync > worker is running, the user might want to change the database to > connect, and it would be useful if they can do that using > pg_basebackup instead of modifying the configuration file manually. > > While the current approach makes sense to me, I'm a bit concerned that > we might end up having the pg_basebackup search the actual database > name (e.g. 'dbname=template1') from the .pgpass file instead of > 'dbname=replication'. As far as I tested on my environment, suppose > that I execute: > > pg_basebackup -D tmp -d "dbname=testdb" -R > > The pg_basebackup established a replication connection but looked for > the password of the 'testdb' database. This could be another > inconvenience for the existing users who want to use the slot > synchronization. > This is true because it is internally using logical replication connection (as we will set set replication=database). I feel the mentioned behavior is an expected one with or without slotsync worker usage. Anyway, this is an optional feature, so users can still choose to ignore specifying dbname in the connection string. The point is that commit cca97ce6a6 allowed using dbname in the connection string in pg_basebackup and we are just extending to write that dbname along with other things in connection_info. > A random idea I came up with is, we add a new option to the > pg_basebackup to overwrite the full or some portion of the connection > string that is eventually written in the primary_conninfo in > postgresql.auto.conf. For example, the command: > > pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R > --primary-coninfo-ext "host=2.2.2.2 dbname=postgres" > > will produce the connection string that is based on -d option value > but is overwritten by --primary-conninfo-ext option value, which will > be like: > > host=2.2.2.2 dbname=postgres port=5555 > > This option might help not only for users who want to use the slotsync > worker but also for users who want to take a basebackup from a standby > but have the new standby connect to the primary. > Agreed, this could be another way though it would be good to get some inputs from users or otherwise about the preferred way to specify dbname. One can also imagine using the Alter System for this purpose. > But it's still just an idea and I might be missing something. And > given we're getting closer to the feature freeze, it would be a PG18 > item. > +1. At this stage, it is important to discuss whether we should allow pg_baseback to write dbname (either a specified one or a default one) along with other parameters in primary_conninfo? -- With Regards, Amit Kapila.
On Thu, Mar 14, 2024 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > This fact makes me think that the slotsync worker might be able to > > accept the primary_conninfo value even if there is no dbname in the > > value. That is, if there is no dbname in the primary_conninfo, it uses > > the username in accordance with the specs of the connection string. > > Currently, the slotsync worker connects to the local database first > > and then establishes the connection to the primary server. But if we > > can reverse the two steps, it can get the dbname that has actually > > been used to establish the remote connection and use it for the local > > connection too. That way, the primary_conninfo generated by > > pg_basebackup could work even without the patch. For example, if the > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > would connect to the postgres database. Given the 'postgres' database > > is created by default and 'postgres' OS user is used in common, I > > guess it could cover many cases in practice actually. > > > > I think this is worth investigating but I suspect that in most cases > users will end up using a replication connection without specifying > the user name and we may not be able to give a meaningful error > message when slotsync worker won't be able to connect. The same will > be true even when the dbname same as the username would be used. > I attempted the change as suggested by Swada-San. Attached the PoC patch .For it to work, I have to expose a new get api in libpq-fe which gets dbname from stream-connection. Please have a look. Without this PoC patch, the errors in slot-sync worker: ----------------- a) If dbname is missing: [1230932] LOG: slot sync worker started [1230932] ERROR: slot synchronization requires dbname to be specified in primary_conninfo b) If specified db does not exist [1230913] LOG: slot sync worker started [1230913] FATAL: database "postgres1" does not exist ----------------- Now with this patch: ----------------- a) If the dbname same as user does not exist: [1232473] LOG: slot sync worker started [1232473] ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not exist b) If user itself is removed from primary_conninfo, libpq takes user who has authenticated the system by default and gives error if db of same name does not exist ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 5433 failed: FATAL: database "shveta" does not exist ----------------- The errors in second case look slightly confusing to me. thanks Shveta
Вложения
On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > This fact makes me think that the slotsync worker might be able to > > accept the primary_conninfo value even if there is no dbname in the > > value. That is, if there is no dbname in the primary_conninfo, it uses > > the username in accordance with the specs of the connection string. > > Currently, the slotsync worker connects to the local database first > > and then establishes the connection to the primary server. But if we > > can reverse the two steps, it can get the dbname that has actually > > been used to establish the remote connection and use it for the local > > connection too. That way, the primary_conninfo generated by > > pg_basebackup could work even without the patch. For example, if the > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > would connect to the postgres database. Given the 'postgres' database > > is created by default and 'postgres' OS user is used in common, I > > guess it could cover many cases in practice actually. > > > > I think this is worth investigating but I suspect that in most cases > users will end up using a replication connection without specifying > the user name and we may not be able to give a meaningful error > message when slotsync worker won't be able to connect. The same will > be true even when the dbname same as the username would be used. What do you mean by not being able to give a meaningful error message? If the slotsync worker uses the user name as the dbname, and such a database doesn't exist, the error message the user will get is "database "test_user" does not exist". ISTM the same is true when the user specifies the wrong database in the primary_conninfo. > > > Having said that, even with (or without) the above change, we might > > want to change the pg_basebackup so that it writes the dbname to the > > primary_conninfo if -R option is specified. Since the database where > > the slotsync worker connects cannot be dropped while the slotsync > > worker is running, the user might want to change the database to > > connect, and it would be useful if they can do that using > > pg_basebackup instead of modifying the configuration file manually. > > > > While the current approach makes sense to me, I'm a bit concerned that > > we might end up having the pg_basebackup search the actual database > > name (e.g. 'dbname=template1') from the .pgpass file instead of > > 'dbname=replication'. As far as I tested on my environment, suppose > > that I execute: > > > > pg_basebackup -D tmp -d "dbname=testdb" -R > > > > The pg_basebackup established a replication connection but looked for > > the password of the 'testdb' database. This could be another > > inconvenience for the existing users who want to use the slot > > synchronization. > > > > This is true because it is internally using logical replication > connection (as we will set set replication=database). Did you mean the pg_basebackup is using a logical replication connection in this case? As far as I tested, even if we specify dbname to the -d option of pg_basebackup, it uses a physical replication connection. For example, it can take a backup even if I specify a non-existing database name. > > A random idea I came up with is, we add a new option to the > > pg_basebackup to overwrite the full or some portion of the connection > > string that is eventually written in the primary_conninfo in > > postgresql.auto.conf. For example, the command: > > > > pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R > > --primary-coninfo-ext "host=2.2.2.2 dbname=postgres" > > > > will produce the connection string that is based on -d option value > > but is overwritten by --primary-conninfo-ext option value, which will > > be like: > > > > host=2.2.2.2 dbname=postgres port=5555 > > > > This option might help not only for users who want to use the slotsync > > worker but also for users who want to take a basebackup from a standby > > but have the new standby connect to the primary. > > > > Agreed, this could be another way though it would be good to get some > inputs from users or otherwise about the preferred way to specify > dbname. One can also imagine using the Alter System for this purpose. Agreed. > > > But it's still just an idea and I might be missing something. And > > given we're getting closer to the feature freeze, it would be a PG18 > > item. > > > > +1. At this stage, it is important to discuss whether we should allow > pg_baseback to write dbname (either a specified one or a default one) > along with other parameters in primary_conninfo? > True. While I basically agree that pg_basebackup writes dbname in primary_conninfo, I'm concerned that writing "dbname=replication" could be problematic. Quoting the case 3) Vignesh summarized before: 3) ./pg_basebackup -d "user=vignesh" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) The primary_conninfo generated by pg_basebackup -R is now used by either a walreceiver (for physical replication connection) or a slotsync worker (for normal connection). The "dbname=replication" is okay for walreceiver. On the other hand, as for the slotsync worker, it can pass the CheckAndGetDbnameFromConninfo() check but it's very likely that it cannot connect to the primary since most users won't create a database with "replication" name. The user will end up getting an error message like 'database "replication" does not exist' but I'm not sure it would be informative for users. Rather, the error message "slot synchronization requires dbname to be specified in primary_conninfo" might be more informative for users. So I personally like to omit the dbname if "dbname=replication", at this point. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > This fact makes me think that the slotsync worker might be able to > > > accept the primary_conninfo value even if there is no dbname in the > > > value. That is, if there is no dbname in the primary_conninfo, it uses > > > the username in accordance with the specs of the connection string. > > > Currently, the slotsync worker connects to the local database first > > > and then establishes the connection to the primary server. But if we > > > can reverse the two steps, it can get the dbname that has actually > > > been used to establish the remote connection and use it for the local > > > connection too. That way, the primary_conninfo generated by > > > pg_basebackup could work even without the patch. For example, if the > > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > > would connect to the postgres database. Given the 'postgres' database > > > is created by default and 'postgres' OS user is used in common, I > > > guess it could cover many cases in practice actually. > > > > > > > I think this is worth investigating but I suspect that in most cases > > users will end up using a replication connection without specifying > > the user name and we may not be able to give a meaningful error > > message when slotsync worker won't be able to connect. The same will > > be true even when the dbname same as the username would be used. > > What do you mean by not being able to give a meaningful error message? > > If the slotsync worker uses the user name as the dbname, and such a > database doesn't exist, the error message the user will get is > "database "test_user" does not exist". ISTM the same is true when the > user specifies the wrong database in the primary_conninfo. > Right, the exact error message as mentioned by Shveta will be: ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not exist Now, without this idea, the ERROR message will be: ERROR: slot synchronization requires dbname to be specified in primary_conninfo I am not sure how much this matters but the second message sounds more useful. > > > > > Having said that, even with (or without) the above change, we might > > > want to change the pg_basebackup so that it writes the dbname to the > > > primary_conninfo if -R option is specified. Since the database where > > > the slotsync worker connects cannot be dropped while the slotsync > > > worker is running, the user might want to change the database to > > > connect, and it would be useful if they can do that using > > > pg_basebackup instead of modifying the configuration file manually. > > > > > > While the current approach makes sense to me, I'm a bit concerned that > > > we might end up having the pg_basebackup search the actual database > > > name (e.g. 'dbname=template1') from the .pgpass file instead of > > > 'dbname=replication'. As far as I tested on my environment, suppose > > > that I execute: > > > > > > pg_basebackup -D tmp -d "dbname=testdb" -R > > > > > > The pg_basebackup established a replication connection but looked for > > > the password of the 'testdb' database. This could be another > > > inconvenience for the existing users who want to use the slot > > > synchronization. > > > > > > > This is true because it is internally using logical replication > > connection (as we will set set replication=database). > > Did you mean the pg_basebackup is using a logical replication > connection in this case? As far as I tested, even if we specify dbname > to the -d option of pg_basebackup, it uses a physical replication > connection. For example, it can take a backup even if I specify a > non-existing database name. > You are right. I misunderstood some part of the code in GetConnection. However, I think my point is still valid that if the user has provided dbname in the connection string it means that she wants that database entry to be looked upon not "replication" entry. > > > > > > But it's still just an idea and I might be missing something. And > > > given we're getting closer to the feature freeze, it would be a PG18 > > > item. > > > > > > > +1. At this stage, it is important to discuss whether we should allow > > pg_baseback to write dbname (either a specified one or a default one) > > along with other parameters in primary_conninfo? > > > > True. While I basically agree that pg_basebackup writes dbname in > primary_conninfo, I'm concerned that writing "dbname=replication" > could be problematic. Quoting the case 3) Vignesh summarized before: > > 3) ./pg_basebackup -d "user=vignesh" -D data -R > -> primary_conninfo = "dbname=replication" (In this case > primary_conninfo will have dbname as replication which is the default > value from GetConnection as connection string is specified) > > The primary_conninfo generated by pg_basebackup -R is now used by > either a walreceiver (for physical replication connection) or a > slotsync worker (for normal connection). The "dbname=replication" is > okay for walreceiver. On the other hand, as for the slotsync worker, > it can pass the CheckAndGetDbnameFromConninfo() check but it's very > likely that it cannot connect to the primary since most users won't > create a database with "replication" name. The user will end up > getting an error message like 'database "replication" does not exist' > but I'm not sure it would be informative for users. Rather, the error > message "slot synchronization requires dbname to be specified in > primary_conninfo" might be more informative for users. So I personally > like to omit the dbname if "dbname=replication", at this point. > How about if we write dbname in primary_conninfo to postgresql.auto.conf file only when the user has explicitly specified dbname in the connection string? To achieve this we need to somehow pass this information via PGconn (say by having a new bool variable dbname_specified) from GetConnection() or something like that? -- With Regards, Amit Kapila.
On Thu, 14 Mar 2024 at 15:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > This fact makes me think that the slotsync worker might be able to > > > > accept the primary_conninfo value even if there is no dbname in the > > > > value. That is, if there is no dbname in the primary_conninfo, it uses > > > > the username in accordance with the specs of the connection string. > > > > Currently, the slotsync worker connects to the local database first > > > > and then establishes the connection to the primary server. But if we > > > > can reverse the two steps, it can get the dbname that has actually > > > > been used to establish the remote connection and use it for the local > > > > connection too. That way, the primary_conninfo generated by > > > > pg_basebackup could work even without the patch. For example, if the > > > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > > > would connect to the postgres database. Given the 'postgres' database > > > > is created by default and 'postgres' OS user is used in common, I > > > > guess it could cover many cases in practice actually. > > > > > > > > > > I think this is worth investigating but I suspect that in most cases > > > users will end up using a replication connection without specifying > > > the user name and we may not be able to give a meaningful error > > > message when slotsync worker won't be able to connect. The same will > > > be true even when the dbname same as the username would be used. > > > > What do you mean by not being able to give a meaningful error message? > > > > If the slotsync worker uses the user name as the dbname, and such a > > database doesn't exist, the error message the user will get is > > "database "test_user" does not exist". ISTM the same is true when the > > user specifies the wrong database in the primary_conninfo. > > > > Right, the exact error message as mentioned by Shveta will be: > ERROR: could not connect to the primary server: connection to server > at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not > exist > > Now, without this idea, the ERROR message will be: > ERROR: slot synchronization requires dbname to be specified in > primary_conninfo > > I am not sure how much this matters but the second message sounds more useful. > > > > > > > > Having said that, even with (or without) the above change, we might > > > > want to change the pg_basebackup so that it writes the dbname to the > > > > primary_conninfo if -R option is specified. Since the database where > > > > the slotsync worker connects cannot be dropped while the slotsync > > > > worker is running, the user might want to change the database to > > > > connect, and it would be useful if they can do that using > > > > pg_basebackup instead of modifying the configuration file manually. > > > > > > > > While the current approach makes sense to me, I'm a bit concerned that > > > > we might end up having the pg_basebackup search the actual database > > > > name (e.g. 'dbname=template1') from the .pgpass file instead of > > > > 'dbname=replication'. As far as I tested on my environment, suppose > > > > that I execute: > > > > > > > > pg_basebackup -D tmp -d "dbname=testdb" -R > > > > > > > > The pg_basebackup established a replication connection but looked for > > > > the password of the 'testdb' database. This could be another > > > > inconvenience for the existing users who want to use the slot > > > > synchronization. > > > > > > > > > > This is true because it is internally using logical replication > > > connection (as we will set set replication=database). > > > > Did you mean the pg_basebackup is using a logical replication > > connection in this case? As far as I tested, even if we specify dbname > > to the -d option of pg_basebackup, it uses a physical replication > > connection. For example, it can take a backup even if I specify a > > non-existing database name. > > > > You are right. I misunderstood some part of the code in GetConnection. > However, I think my point is still valid that if the user has provided > dbname in the connection string it means that she wants that database > entry to be looked upon not "replication" entry. > > > > > > > > > > But it's still just an idea and I might be missing something. And > > > > given we're getting closer to the feature freeze, it would be a PG18 > > > > item. > > > > > > > > > > +1. At this stage, it is important to discuss whether we should allow > > > pg_baseback to write dbname (either a specified one or a default one) > > > along with other parameters in primary_conninfo? > > > > > > > True. While I basically agree that pg_basebackup writes dbname in > > primary_conninfo, I'm concerned that writing "dbname=replication" > > could be problematic. Quoting the case 3) Vignesh summarized before: > > > > 3) ./pg_basebackup -d "user=vignesh" -D data -R > > -> primary_conninfo = "dbname=replication" (In this case > > primary_conninfo will have dbname as replication which is the default > > value from GetConnection as connection string is specified) > > > > The primary_conninfo generated by pg_basebackup -R is now used by > > either a walreceiver (for physical replication connection) or a > > slotsync worker (for normal connection). The "dbname=replication" is > > okay for walreceiver. On the other hand, as for the slotsync worker, > > it can pass the CheckAndGetDbnameFromConninfo() check but it's very > > likely that it cannot connect to the primary since most users won't > > create a database with "replication" name. The user will end up > > getting an error message like 'database "replication" does not exist' > > but I'm not sure it would be informative for users. Rather, the error > > message "slot synchronization requires dbname to be specified in > > primary_conninfo" might be more informative for users. So I personally > > like to omit the dbname if "dbname=replication", at this point. > > > > How about if we write dbname in primary_conninfo to > postgresql.auto.conf file only when the user has explicitly specified > dbname in the connection string? To achieve this we need to somehow > pass this information via PGconn (say by having a new bool variable > dbname_specified) from GetConnection() or something like that? Here is a patch which will write dbname in the primary_conninfo only if the database name is specified explicitly. I have added a new function GetDbnameFromConnectionString which will return the dbname specified in the connection and GenerateRecoveryConfig will append this database name. Here are the test results with the patch: case 1: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=postgres" primary_conninfo will have dbname=postgres case 2: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=replication" primary_conninfo will have dbname=replication case 3: pg_basebackup -D test10 -p 5431 -X s -P -R -U vignesh primary_conninfo will not have dbname case 4: pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will not have dbname case 5: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh" primary_conninfo will not have dbname case 6: pg_basebackup -D test10 -p 5431 -X s -P -R -d "" primary_conninfo will not have dbname Thoughts? Regards, Vignesh
Вложения
On Thu, Mar 14, 2024 at 11:46 PM vignesh C <vignesh21@gmail.com> wrote: > > On Thu, 14 Mar 2024 at 15:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 14, 2024 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Thu, Mar 14, 2024 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > This fact makes me think that the slotsync worker might be able to > > > > > accept the primary_conninfo value even if there is no dbname in the > > > > > value. That is, if there is no dbname in the primary_conninfo, it uses > > > > > the username in accordance with the specs of the connection string. > > > > > Currently, the slotsync worker connects to the local database first > > > > > and then establishes the connection to the primary server. But if we > > > > > can reverse the two steps, it can get the dbname that has actually > > > > > been used to establish the remote connection and use it for the local > > > > > connection too. That way, the primary_conninfo generated by > > > > > pg_basebackup could work even without the patch. For example, if the > > > > > OS user executing pg_basebackup is 'postgres', the slotsync worker > > > > > would connect to the postgres database. Given the 'postgres' database > > > > > is created by default and 'postgres' OS user is used in common, I > > > > > guess it could cover many cases in practice actually. > > > > > > > > > > > > > I think this is worth investigating but I suspect that in most cases > > > > users will end up using a replication connection without specifying > > > > the user name and we may not be able to give a meaningful error > > > > message when slotsync worker won't be able to connect. The same will > > > > be true even when the dbname same as the username would be used. > > > > > > What do you mean by not being able to give a meaningful error message? > > > > > > If the slotsync worker uses the user name as the dbname, and such a > > > database doesn't exist, the error message the user will get is > > > "database "test_user" does not exist". ISTM the same is true when the > > > user specifies the wrong database in the primary_conninfo. > > > > > > > Right, the exact error message as mentioned by Shveta will be: > > ERROR: could not connect to the primary server: connection to server > > at "127.0.0.1", port 5433 failed: FATAL: database "bckp_user" does not > > exist > > > > Now, without this idea, the ERROR message will be: > > ERROR: slot synchronization requires dbname to be specified in > > primary_conninfo > > > > I am not sure how much this matters but the second message sounds more useful. > > > > > > > > > > > Having said that, even with (or without) the above change, we might > > > > > want to change the pg_basebackup so that it writes the dbname to the > > > > > primary_conninfo if -R option is specified. Since the database where > > > > > the slotsync worker connects cannot be dropped while the slotsync > > > > > worker is running, the user might want to change the database to > > > > > connect, and it would be useful if they can do that using > > > > > pg_basebackup instead of modifying the configuration file manually. > > > > > > > > > > While the current approach makes sense to me, I'm a bit concerned that > > > > > we might end up having the pg_basebackup search the actual database > > > > > name (e.g. 'dbname=template1') from the .pgpass file instead of > > > > > 'dbname=replication'. As far as I tested on my environment, suppose > > > > > that I execute: > > > > > > > > > > pg_basebackup -D tmp -d "dbname=testdb" -R > > > > > > > > > > The pg_basebackup established a replication connection but looked for > > > > > the password of the 'testdb' database. This could be another > > > > > inconvenience for the existing users who want to use the slot > > > > > synchronization. > > > > > > > > > > > > > This is true because it is internally using logical replication > > > > connection (as we will set set replication=database). > > > > > > Did you mean the pg_basebackup is using a logical replication > > > connection in this case? As far as I tested, even if we specify dbname > > > to the -d option of pg_basebackup, it uses a physical replication > > > connection. For example, it can take a backup even if I specify a > > > non-existing database name. > > > > > > > You are right. I misunderstood some part of the code in GetConnection. > > However, I think my point is still valid that if the user has provided > > dbname in the connection string it means that she wants that database > > entry to be looked upon not "replication" entry. > > > > > > > > > > > > > > But it's still just an idea and I might be missing something. And > > > > > given we're getting closer to the feature freeze, it would be a PG18 > > > > > item. > > > > > > > > > > > > > +1. At this stage, it is important to discuss whether we should allow > > > > pg_baseback to write dbname (either a specified one or a default one) > > > > along with other parameters in primary_conninfo? > > > > > > > > > > True. While I basically agree that pg_basebackup writes dbname in > > > primary_conninfo, I'm concerned that writing "dbname=replication" > > > could be problematic. Quoting the case 3) Vignesh summarized before: > > > > > > 3) ./pg_basebackup -d "user=vignesh" -D data -R > > > -> primary_conninfo = "dbname=replication" (In this case > > > primary_conninfo will have dbname as replication which is the default > > > value from GetConnection as connection string is specified) > > > > > > The primary_conninfo generated by pg_basebackup -R is now used by > > > either a walreceiver (for physical replication connection) or a > > > slotsync worker (for normal connection). The "dbname=replication" is > > > okay for walreceiver. On the other hand, as for the slotsync worker, > > > it can pass the CheckAndGetDbnameFromConninfo() check but it's very > > > likely that it cannot connect to the primary since most users won't > > > create a database with "replication" name. The user will end up > > > getting an error message like 'database "replication" does not exist' > > > but I'm not sure it would be informative for users. Rather, the error > > > message "slot synchronization requires dbname to be specified in > > > primary_conninfo" might be more informative for users. So I personally > > > like to omit the dbname if "dbname=replication", at this point. > > > > > > > How about if we write dbname in primary_conninfo to > > postgresql.auto.conf file only when the user has explicitly specified > > dbname in the connection string? To achieve this we need to somehow > > pass this information via PGconn (say by having a new bool variable > > dbname_specified) from GetConnection() or something like that? > > Here is a patch which will write dbname in the primary_conninfo only > if the database name is specified explicitly. I have added a new > function GetDbnameFromConnectionString which will return the dbname > specified in the connection and GenerateRecoveryConfig will append > this database name. > Here are the test results with the patch: > case 1: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=postgres" > primary_conninfo will have dbname=postgres > > case 2: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=replication" > primary_conninfo will have dbname=replication > > case 3: > pg_basebackup -D test10 -p 5431 -X s -P -R -U vignesh > primary_conninfo will not have dbname > > case 4: > pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will not have dbname > > case 5: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh" > primary_conninfo will not have dbname > > case 6: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "" > primary_conninfo will not have dbname > > Thoughts? Thank you for updating the patch! This behavior makes sense to me. But do we want to handle the case of using environment variables too? IIUC, pg_basebackup -D tmp -d "user=masahiko dbname=test_db" is equivalent to: PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Dear Sawada-san, Thanks for giving comments! > This behavior makes sense to me. But do we want to handle the case of > using environment variables too? Yeah, v5 does not consider which libpq parameters are specified by environment variables. Such a variable should be used when the dbname is not expressly written in the connection string. Such a path was added in the v6 patch. If the dbname is not determined after parsing the connection string, we call PQconndefaults() to get settings from environment variables and service files [1], then start to search dbname again. Below shows an example. ``` PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v -> primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... ' ``` > IIUC, > > pg_basebackup -D tmp -d "user=masahiko dbname=test_db" > > is equivalent to: > > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp The case won't work. I think You assumed that expanded_dbname like PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct - it won't parse as connection string again. In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse(). When expand_dbname is specified, the entry "dbname" is firstly checked and parsed its value. They are done at fe-connect.c:5846. The environment variables are checked and parsed in conninfo_add_defaults(), which is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the expand_dbname has already been done at that time. This means there is no chance that PGDATABASE is parsed as an expanded style. For example, if the pg_basebackup runs like below: PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v The primary_conninfo written in the file will be: primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres''' [1]: https://www.postgresql.org/docs/devel/libpq-pgservice.html [2]: https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Вложения
On Tue, 19 Mar 2024 at 17:18, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Sawada-san, > > Thanks for giving comments! > > > This behavior makes sense to me. But do we want to handle the case of > > using environment variables too? > > Yeah, v5 does not consider which libpq parameters are specified by environment > variables. Such a variable should be used when the dbname is not expressly written > in the connection string. > Such a path was added in the v6 patch. If the dbname is not determined after > parsing the connection string, we call PQconndefaults() to get settings from > environment variables and service files [1], then start to search dbname again. > Below shows an example. > > ``` > PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v > -> > primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... ' > ``` > > > IIUC, > > > > pg_basebackup -D tmp -d "user=masahiko dbname=test_db" > > > > is equivalent to: > > > > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp > > The case won't work. I think You assumed that expanded_dbname like > PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct > - it won't parse as connection string again. > > In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse(). > When expand_dbname is specified, the entry "dbname" is firstly checked and > parsed its value. They are done at fe-connect.c:5846. > > The environment variables are checked and parsed in conninfo_add_defaults(), which > is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the > expand_dbname has already been done at that time. This means there is no chance > that PGDATABASE is parsed as an expanded style. > > For example, if the pg_basebackup runs like below: > > PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v > > The primary_conninfo written in the file will be: > > primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres''' Thanks for the patch. Here are the test results for various tests by specifying connection string, environment variable, service file, and connection URIs with the patch: case 1: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=db1" primary_conninfo will have dbname=db1 case 2: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=replication" primary_conninfo will have dbname=replication case 3: pg_basebackup -D test10 -p 5431 -X s -P -R -U vignesh primary_conninfo will not have dbname case 4: pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will not have dbname case 5: pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh" primary_conninfo will not have dbname case 6: pg_basebackup -D test10 -p 5431 -X s -P -R -d "" primary_conninfo will not have dbname --- Testing through PGDATABASE environment variable case 7: export PGDATABASE="user=postgres dbname=test" ./pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will have dbname=''user=postgres dbname=test'' like below: primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' channel_binding=prefer port=5431 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=''user=postgres dbname=test''' case 8: export PGDATABASE=db1 ./pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will have dbname=db1 --- Testing through pg_service case 9: Create .pg_service.conf with the following info: [conn1] dbname=db2 export PGSERVICE=conn1 ./pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will have dbname=db2 case 10: Create .pg_service.conf with the following info, i.e. there is no database specified: [conn1] ./pg_basebackup -D test10 -p 5431 -X s -P -R primary_conninfo will not have dbname --- Testing through Connection URIs case 11: ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431" primary_conninfo will not have dbname case 12: ./pg_basebackup -D test10 -p 5431 -X s -P -R -d "postgresql://localhost/db3:5431" primary_conninfo will have dbname=''db3:5431'' like below: primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' channel_binding=prefer host=localhost port=5431 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=''db3:5431''' case 13: ./pg_basebackup -D test10 -p 5431 -X s -P -R -d "postgresql://localhost/db3" primary_conninfo will have dbname=db3 case 14: ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431/db3" primary_conninfo will have dbname=db3 case 15: ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431/db4,127.0.0.1:5431/db5" primary_conninfo will have dbname=''db4,127.0.0.1:5431/db5'' like below: primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' channel_binding=prefer host=localhost port=5431 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=''db4,127.0.0.1:5431/db5''' case 16: ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431,127.0.0.1:5431/db5" primary_conninfo will have dbname=db5 case 17: ./pg_basebackup -D test10 -X s -P -R -d "postgresql:///db6?host=localhost&port=5431" primary_conninfo will have dbname=db6 case 18: ./pg_basebackup -D test10 -p 5431 -X s -P -R -d "postgresql:///db7?host=/home/vignesh/postgres/inst/bin" primary_conninfo will have dbname=db7 case 19: ./pg_basebackup -D test10 -p 5431 -X s -P -R -d "postgresql:///db8?host=%2Fhome%2Fvignesh%2Fpostgres%2Finst%2Fbin" primary_conninfo will have dbname=db8 In these cases, the database name specified will be written to the conf file. The test results look good to me. Regards, Vignesh
On Tue, Mar 19, 2024 at 5:18 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Thanks for giving comments! > > > This behavior makes sense to me. But do we want to handle the case of > > using environment variables too? > > Yeah, v5 does not consider which libpq parameters are specified by environment > variables. Such a variable should be used when the dbname is not expressly written > in the connection string. > Such a path was added in the v6 patch. If the dbname is not determined after > parsing the connection string, we call PQconndefaults() to get settings from > environment variables and service files [1], then start to search dbname again. > The functionality implemented by the patch looks good to me. I have made minor modifications in the function names, error handling, comments, and doc updates in the attached patch. Let me know what you think of the attached. -- With Regards, Amit Kapila.
Вложения
On Tue, Mar 19, 2024 at 8:48 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Sawada-san, > > Thanks for giving comments! > > > This behavior makes sense to me. But do we want to handle the case of > > using environment variables too? > > Yeah, v5 does not consider which libpq parameters are specified by environment > variables. Such a variable should be used when the dbname is not expressly written > in the connection string. > Such a path was added in the v6 patch. If the dbname is not determined after > parsing the connection string, we call PQconndefaults() to get settings from > environment variables and service files [1], then start to search dbname again. > Below shows an example. Thank you for updating the patch! > > ``` > PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v > -> > primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... ' > ``` > > > IIUC, > > > > pg_basebackup -D tmp -d "user=masahiko dbname=test_db" > > > > is equivalent to: > > > > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp > > The case won't work. I think You assumed that expanded_dbname like > PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct > - it won't parse as connection string again. > > In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse(). > When expand_dbname is specified, the entry "dbname" is firstly checked and > parsed its value. They are done at fe-connect.c:5846. > > The environment variables are checked and parsed in conninfo_add_defaults(), which > is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the > expand_dbname has already been done at that time. This means there is no chance > that PGDATABASE is parsed as an expanded style. > Thank you for pointing it out. I tested the use of PGDATABASE with pg_basebackup and somehow missed the fact you explained. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Mar 20, 2024 at 2:24 PM vignesh C <vignesh21@gmail.com> wrote: > > On Tue, 19 Mar 2024 at 17:18, Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Sawada-san, > > > > Thanks for giving comments! > > > > > This behavior makes sense to me. But do we want to handle the case of > > > using environment variables too? > > > > Yeah, v5 does not consider which libpq parameters are specified by environment > > variables. Such a variable should be used when the dbname is not expressly written > > in the connection string. > > Such a path was added in the v6 patch. If the dbname is not determined after > > parsing the connection string, we call PQconndefaults() to get settings from > > environment variables and service files [1], then start to search dbname again. > > Below shows an example. > > > > ``` > > PGPORT=5431 PGUSER=kuroda PGDATABASE=postgres pg_basebackup -D data_N2 -R -v > > -> > > primary_conninfo = 'user=kuroda ... port=5431 ... dbname=postgres ... ' > > ``` > > > > > IIUC, > > > > > > pg_basebackup -D tmp -d "user=masahiko dbname=test_db" > > > > > > is equivalent to: > > > > > > PGDATABASE="user=masahiko dbname=test_db" pg_basebackup -D tmp > > > > The case won't work. I think You assumed that expanded_dbname like > > PQconnectdbParams() [2] can be used for enviroment variables, but it is not correct > > - it won't parse as connection string again. > > > > In the libpq layer, connection parameters are parsed in PQconnectStartParams()->conninfo_array_parse(). > > When expand_dbname is specified, the entry "dbname" is firstly checked and > > parsed its value. They are done at fe-connect.c:5846. > > > > The environment variables are checked and parsed in conninfo_add_defaults(), which > > is called from conninfo_array_parse(). However, it is done at fe-connect.c:5956 - the > > expand_dbname has already been done at that time. This means there is no chance > > that PGDATABASE is parsed as an expanded style. > > > > For example, if the pg_basebackup runs like below: > > > > PGDATABASE="user=kuroda dbname=postgres" pg_basebackup -D data_N2 -R -v > > > > The primary_conninfo written in the file will be: > > > > primary_conninfo = 'user=hayato ... dbname=''user=kuroda dbname=postgres''' > > Thanks for the patch. > Here are the test results for various tests by specifying connection > string, environment variable, service file, and connection URIs with > the patch: > case 1: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=db1" > primary_conninfo will have dbname=db1 > > case 2: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh dbname=replication" > primary_conninfo will have dbname=replication > > case 3: > pg_basebackup -D test10 -p 5431 -X s -P -R -U vignesh > primary_conninfo will not have dbname > > case 4: > pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will not have dbname > > case 5: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "user=vignesh" > primary_conninfo will not have dbname > > case 6: > pg_basebackup -D test10 -p 5431 -X s -P -R -d "" > primary_conninfo will not have dbname > > --- Testing through PGDATABASE environment variable > case 7: > export PGDATABASE="user=postgres dbname=test" > ./pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will have dbname=''user=postgres dbname=test'' like below: > primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' > channel_binding=prefer port=5431 sslmode=prefer sslcompression=0 > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > gssencmode=disable krbsrvname=postgres gssdelegation=0 > target_session_attrs=any load_balance_hosts=disable > dbname=''user=postgres dbname=test''' > > case 8: > export PGDATABASE=db1 > ./pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will have dbname=db1 > > --- Testing through pg_service > case 9: > Create .pg_service.conf with the following info: > [conn1] > dbname=db2 > > export PGSERVICE=conn1 > > ./pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will have dbname=db2 > > case 10: > Create .pg_service.conf with the following info, i.e. there is no > database specified: > [conn1] > > ./pg_basebackup -D test10 -p 5431 -X s -P -R > primary_conninfo will not have dbname > > --- Testing through Connection URIs > case 11: > ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431" > primary_conninfo will not have dbname > > case 12: > ./pg_basebackup -D test10 -p 5431 -X s -P -R -d > "postgresql://localhost/db3:5431" > primary_conninfo will have dbname=''db3:5431'' like below: > primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' > channel_binding=prefer host=localhost port=5431 sslmode=prefer > sslcompression=0 sslcertmode=allow sslsni=1 > ssl_min_protocol_version=TLSv1.2 gssencmode=disable > krbsrvname=postgres gssdelegation=0 target_session_attrs=any > load_balance_hosts=disable dbname=''db3:5431''' > > case 13: > ./pg_basebackup -D test10 -p 5431 -X s -P -R -d "postgresql://localhost/db3" > primary_conninfo will have dbname=db3 > > case 14: > ./pg_basebackup -D test10 -X s -P -R -d "postgresql://localhost:5431/db3" > primary_conninfo will have dbname=db3 > > case 15: > ./pg_basebackup -D test10 -X s -P -R -d > "postgresql://localhost:5431/db4,127.0.0.1:5431/db5" > primary_conninfo will have dbname=''db4,127.0.0.1:5431/db5'' like below: > primary_conninfo = 'user=vignesh passfile=''/home/vignesh/.pgpass'' > channel_binding=prefer host=localhost port=5431 sslmode=prefer > sslcompression=0 sslcertmode=allow sslsni=1 > ssl_min_protocol_version=TLSv1.2 gssencmode=disable > krbsrvname=postgres gssdelegation=0 target_session_attrs=any > load_balance_hosts=disable dbname=''db4,127.0.0.1:5431/db5''' > > case 16: > ./pg_basebackup -D test10 -X s -P -R -d > "postgresql://localhost:5431,127.0.0.1:5431/db5" > primary_conninfo will have dbname=db5 > > case 17: > ./pg_basebackup -D test10 -X s -P -R -d > "postgresql:///db6?host=localhost&port=5431" > primary_conninfo will have dbname=db6 > > case 18: > ./pg_basebackup -D test10 -p 5431 -X s -P -R -d > "postgresql:///db7?host=/home/vignesh/postgres/inst/bin" > primary_conninfo will have dbname=db7 > > case 19: > ./pg_basebackup -D test10 -p 5431 -X s -P -R -d > "postgresql:///db8?host=%2Fhome%2Fvignesh%2Fpostgres%2Finst%2Fbin" > primary_conninfo will have dbname=db8 > > In these cases, the database name specified will be written to the > conf file. The test results look good to me. Thank you for the tests! These results look good to me too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, 20 Mar 2024 at 17:09, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 19, 2024 at 5:18 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Thanks for giving comments! > > > > > This behavior makes sense to me. But do we want to handle the case of > > > using environment variables too? > > > > Yeah, v5 does not consider which libpq parameters are specified by environment > > variables. Such a variable should be used when the dbname is not expressly written > > in the connection string. > > Such a path was added in the v6 patch. If the dbname is not determined after > > parsing the connection string, we call PQconndefaults() to get settings from > > environment variables and service files [1], then start to search dbname again. > > > > The functionality implemented by the patch looks good to me. I have > made minor modifications in the function names, error handling, > comments, and doc updates in the attached patch. Let me know what you > think of the attached. While reviewing, I found the following changes could be done: a) we can add one test in 010_pg_basebackup.pl to verify the change b) Here two different styles of linking is used in the document, we can try to keep it same: + streaming replication and <link linkend="logicaldecoding-replication-slots-synchronization"> + logical replication slot synchronization</link> will use the same + settings later on. The dbname will be recorded only if the dbname was + specified explicitly in the connection string or environment variable + (see <xref linkend="libpq-envars"/>). The updated patch has the changes for the same. Regards, Vignesh