Обсуждение: Report search_path value back to the client.
Hi,
As of now postgres is reporting only "really" important variables, but among themthere are also not so important, like 'application_name'. The only reason to report
it, was: "help pgbouncer, and perhaps other connection poolers".
Change was introduced by commit: 59ed94ad0c9f74a3f057f359316c845cedc4461e
This fact makes me wonder, why 'search_path' value is not reported back to the
client? Use-case is absolutely the same as with 'application_name' but a little bit
client? Use-case is absolutely the same as with 'application_name' but a little bit
more important.
Our databases provides different version of stored procedures which are located
in a different schemas: 'api_version1', 'api_version2', 'api_version5', etc...
When application establish connection to the database it set search_path value
for example to api_version1. At the same time, new version of the same application
will set search_path value to api_version2. Sometimes we have hundreds of
instances of applications which may use different versions of stored procedures
which are located in different schemas.
which are located in different schemas.
It's really crazy to keep so many (hundreds) connections to the database and
it would be much better to have something like pgbouncer in front of postgres.
Right now it's not possible, because pgbouncer is not aware of search_path
and it's not really possible to fix pgbouncer, because there is no easy way to
get current value of search_path.
get current value of search_path.
I would like to mark 'search_path' as GUC_REPORT:
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
{"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the schema search order for names that are not schema-qualified."),
NULL,
- GUC_LIST_INPUT | GUC_LIST_QUOTE
+ GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
},
&namespace_search_path,
"\"$user\",public",
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
{"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the schema search order for names that are not schema-qualified."),
NULL,
- GUC_LIST_INPUT | GUC_LIST_QUOTE
+ GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
},
&namespace_search_path,
"\"$user\",public",
What do you think?
Regards,
--
Alexander Kukushkin
Hi, On Tue, Dec 2, 2014 at 5:59 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote: > It's really crazy to keep so many (hundreds) connections to the database and > it would be much better to have something like pgbouncer in front of > postgres. > Right now it's not possible, because pgbouncer is not aware of search_path > and it's not really possible to fix pgbouncer, because there is no easy way > to > get current value of search_path. > > I would like to mark 'search_path' as GUC_REPORT: > --- a/src/backend/utils/misc/guc.c > +++ b/src/backend/utils/misc/guc.c > @@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] = > {"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT, > gettext_noop("Sets the schema search order for names > that are not schema-qualified."), > NULL, > - GUC_LIST_INPUT | GUC_LIST_QUOTE > + GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT > }, > &namespace_search_path, > "\"$user\",public", > I know quite a lot of companies using search_path to switch to the version of sprocs in the database without changes to the application. Not being able to use pgbouncer in such scenario is rather annoying. It would be really great to have this functionality in the upcoming 9.5 and not wait for another year for it. Given this is a one-liner, which doesn't introduce any new code, but one flag to the function call, would it be possible to review it as a part of the current commitfest? Kind regards, -- Alexey Klyukin
Alexey Klyukin <alexk@hintbits.com> writes: > On Tue, Dec 2, 2014 at 5:59 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote: >> I would like to mark 'search_path' as GUC_REPORT: > Given this is a one-liner, which doesn't introduce any new code, but > one flag to the function call, would it be > possible to review it as a part of the current commitfest? I'm against this on a couple of different grounds: 1. Performance. search_path is something that many applications change quite a lot, so reporting changes in it would create enormous network overhead. Consider for example that a SQL function might set it as part of a function SET clause, and that could be invoked thousands of times per query. 2. Semantics. The existing GUC_REPORT variables are all things that directly relate to client-visible behavior, eg how values of type timestamp will be interpreted and printed. search_path is no such thing, so it's hard to make a principled argument for reporting it that doesn't lead to the conclusion that you want *everything* reported. (In particular, I don't believe at all your argument that this would help pgbouncer significantly.) We could possibly alleviate problem #1 by changing the behavior of guc.c so it doesn't report every single transition of flagged variables, but only (say) once just before ReadyForQuery if the variable changed in the just-finished command. That's not exactly a one-line fix though. regards, tom lane
Hi,
2015-02-20 16:19 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
2. Semantics. The existing GUC_REPORT variables are all things that
directly relate to client-visible behavior, eg how values of type
timestamp will be interpreted and printed. search_path is no such thing,
so it's hard to make a principled argument for reporting it that doesn't
lead to the conclusion that you want *everything* reported. (In
particular, I don't believe at all your argument that this would help
pgbouncer significantly.)
Well, I will try to explain in details why do we want to use pgbouncer in front of postgres and what stops us from doing this.
Usually our applications don't access data directly but via stored procedures. One database can be accessed by different applications in different versions. Different versions of applications may use different sets and versions of stored procedures. It achieved by keeping stored procedures in different schemas (we call them API schema). For example at the same time 'app-v1' is working with stored procedures located in schema 'app_api_v1', 'app-v2' with 'app_api_v2' and 'other-app-v3' is working with 'other_app_api_v3'. This is very convenient. Before deploying new version of application we create new 'api schema' for it and then create all stored procedures in this schema. Directly after connecting to database application executes 'set search_path to app_api_vX, public' and after this point it always call stored procedures without specifying schema name. This way we are able to keep different versions of the same stored procedure in database.
And here the problem come. Sometimes we have dozens of application instances. Each application instance might keep more than one connection to database. In total there can be hundreds connections but most of the time they are not in state 'active'. For such cases connection pools have been invented. It's cheep to keep even thousands open connections to event-based connection pool comparing with the real database. The first idea that comes into mind - lets put pgbouncer with pool_mode = transaction in front of database. But here our good old friend 'set search_path to app_api_vX, public' does weird things with pgbouncer. The same server connection might be reused by different applications in different versions and soon or later application will get: 'ERROR: function foo() does not exist' because search_path for this connection between pgbouncer and postgres has been changed by another application.
My first idea was - it should be possible to fix pgbouncer and make it aware of current search_path value of connections between application <-> pgbouncer and pgbouncer <-> postgres. I started to look into it's source code and yea, this can be done relatively easy, because pgbouncer already does similar job for "client_encoding", "DateStyle", "TimeZone", "standard_conforming_strings" and "application_name", which doesn't relate to the 'application visible behaviour' more than the search_path. Pgbouncer always keeps current values of these variables for each connection and when it takes a connection from the pool, it compares the values of client_connection and server_connection parameters. If they don't match - it changes the value on the server side by calling 'set variable to client_value'.
Execution of statement 'set some_variable to some_value' from the client side is not the unique way to change value of variable. As you already mentioned such statement can be executed from stored procedure as well. So scanning of client statements is not an option. Only postgres is able to tell at any time current value of search_path and asking it every time for current value also is not an option, because this will double amount of queries executed by database. Luckily, postgres already has this wonderful mechanism as GUC_REPORT...
If the postgres will report the value of search_path on connect and on change - pgbouncer could be really easy patched:
diff --git a/include/varcache.h b/include/varcache.h
index 4984b01..916fa01 100644
--- a/include/varcache.h
+++ b/include/varcache.h
@@ -5,6 +5,7 @@ enum VarCacheIdx {
VTimeZone,
VStdStr,
VAppName,
+ VSearchPath,
NumVars
};
diff --git a/lib b/lib
index fe4dd02..c0111d4 160000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-Subproject commit fe4dd0255ea796e64599c0fc28e6900fbc07e6aa
+Subproject commit c0111d42712202df5ccb01c4e07e88ab5c8b94b8
diff --git a/src/varcache.c b/src/varcache.c
index 6321dc5..cb3f559 100644
--- a/src/varcache.c
+++ b/src/varcache.c
@@ -35,6 +35,7 @@ static const struct var_lookup lookup [] = {
{"TimeZone", VTimeZone },
{"standard_conforming_strings", VStdStr },
{"application_name", VAppName },
+ {"search_path", VSearchPath },
{NULL},
};
diff --git a/include/varcache.h b/include/varcache.h
index 4984b01..916fa01 100644
--- a/include/varcache.h
+++ b/include/varcache.h
@@ -5,6 +5,7 @@ enum VarCacheIdx {
VTimeZone,
VStdStr,
VAppName,
+ VSearchPath,
NumVars
};
diff --git a/lib b/lib
index fe4dd02..c0111d4 160000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-Subproject commit fe4dd0255ea796e64599c0fc28e6900fbc07e6aa
+Subproject commit c0111d42712202df5ccb01c4e07e88ab5c8b94b8
diff --git a/src/varcache.c b/src/varcache.c
index 6321dc5..cb3f559 100644
--- a/src/varcache.c
+++ b/src/varcache.c
@@ -35,6 +35,7 @@ static const struct var_lookup lookup [] = {
{"TimeZone", VTimeZone },
{"standard_conforming_strings", VStdStr },
{"application_name", VAppName },
+ {"search_path", VSearchPath },
{NULL},
};
We could possibly alleviate problem #1 by changing the behavior of guc.c
so it doesn't report every single transition of flagged variables, but
only (say) once just before ReadyForQuery if the variable changed in
the just-finished command. That's not exactly a one-line fix though.
Probably for some variables this really make sense. Inside stored procedures any of GUC_REPORT variables can be changed the same way as search_path (thousands of times), but not all of these variables directly relate to the application visible behaviour.
Regards,
Alexander Kukushkin
I wanted to revive this thread, since it's by far one of the most common foot guns that people run into with PgBouncer. Almost all session level SET commands leak across transactions, but SET search_path is by far the one with the biggest impact when it is not the setting that you expect. As well as being a very common setting to change. In the Citus extension we actually change search_path to be GUC_REPORT so that PgBouncer can track it, because otherwise Citus its schema based sharding starts breaking completely when using PgBouncer. [1] On Fri, 3 Nov 2023 at 09:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm against this on a couple of different grounds: > > 1. Performance. ... > > 2. Semantics. The existing GUC_REPORT variables are all things that > directly relate to client-visible behavior, eg how values of type > timestamp will be interpreted and printed. search_path is no such thing, > > We could possibly alleviate problem #1 by changing the behavior of guc.c > so it doesn't report every single transition of flagged variables, but > only (say) once just before ReadyForQuery if the variable changed in > the just-finished command. That's not exactly a one-line fix though. The proposed fix for #1 has been committed since PG14 in 2432b1a04087edc2fd9536c7c9aa4ca03fd1b363 So that only leaves #2. I think search_path can arguably be considered to be client visible, because it changes how the client and its queries are parsed by Postgres. And even if you don't agree with that argument, it's simply not true that the only GUCs that are GUC_REPORT are related to client-visible behaviour. Things like application_name and is_superuser are also GUC_REPORT [2]. > so it's hard to make a principled argument for reporting it that doesn't > lead to the conclusion that you want *everything* reported. (In > particular, I don't believe at all your argument that this would help > pgbouncer significantly.) To be clear, I would like it to be configurable which settings are GUC_REPORT. Because there are others that are useful for PgBouncer to track (e.g. statement_timeout and lock_timeout) That's why I've been active on the thread proposing such a change [3]. But that thread has not been getting a lot of attention, I guess because it's a large change that includes protocol changes. So that's why I'm reviving this thread again. Since search_path is by far the most painful setting for PgBouncer users. A common foot-gun is that running pg_dump causes breakage for other clients, because its "SET search_path" is leaked to the next transaction [4]. [1]: https://github.com/citusdata/citus/blob/21646ca1e96175370be1472a14d5ab1baa55b471/src/backend/distributed/shared_library_init.c#L2686-L2695 [2]: https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC [3]: https://www.postgresql.org/message-id/flat/0a6f5e6b-65b3-4272-260d-a17ce8f5b3a4%40eisentraut.org#bd6d06db7db9e4ded8100e4160050b17 [4]: https://github.com/pgbouncer/pgbouncer/issues/452
For completeness attached is a tiny patch implementing this, so this thread can be added to the commit fest.