Обсуждение: Report search_path value back to the client.

Поиск
Список
Период
Сортировка

Report search_path value back to the client.

От
Alexander Kukushkin
Дата:
Hi,

As of now postgres is reporting only "really" important variables, but among them
there are also not so important, like 'application_name'. The only reason to report
it, was: "help pgbouncer, and perhaps other connection poolers".
Change was introduced by commit: 59ed94ad0c9f74a3f057f359316c845cedc4461e

This fact makes me wonder, why 'search_path' value is not reported back to the
client? Use-case is absolutely the same as with 'application_name' but a little bit
more important.

Our databases provides different version of stored procedures which are located
in a different schemas: 'api_version1', 'api_version2', 'api_version5', etc...
When application establish connection to the database it set search_path value
for example to api_version1. At the same time, new version of the same application
will set search_path value to api_version2. Sometimes we have hundreds of
instances of applications which may use different versions of stored procedures
which are located in different schemas.

It's really crazy to keep so many (hundreds) connections to the database and
it would be much better to have something like pgbouncer in front of postgres.
Right now it's not possible, because pgbouncer is not aware of search_path
and it's not really possible to fix pgbouncer, because there is no easy way to
get current value of search_path.

I would like to mark 'search_path' as GUC_REPORT:
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
                {"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
                        gettext_noop("Sets the schema search order for names that are not schema-qualified."),
                        NULL,
-                       GUC_LIST_INPUT | GUC_LIST_QUOTE
+                       GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
                },
                &namespace_search_path,
                "\"$user\",public",

What do you think?

Regards,
--
Alexander Kukushkin

Re: Report search_path value back to the client.

От
Alexey Klyukin
Дата:
Hi,

On Tue, Dec 2, 2014 at 5:59 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote:

> It's really crazy to keep so many (hundreds) connections to the database and
> it would be much better to have something like pgbouncer in front of
> postgres.
> Right now it's not possible, because pgbouncer is not aware of search_path
> and it's not really possible to fix pgbouncer, because there is no easy way
> to
> get current value of search_path.
>
> I would like to mark 'search_path' as GUC_REPORT:
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
>                 {"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
>                         gettext_noop("Sets the schema search order for names
> that are not schema-qualified."),
>                         NULL,
> -                       GUC_LIST_INPUT | GUC_LIST_QUOTE
> +                       GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
>                 },
>                 &namespace_search_path,
>                 "\"$user\",public",
>

I know quite a lot of companies using search_path to switch to the
version of sprocs in the database
without changes to the application. Not being able to use pgbouncer in
such scenario is rather annoying.

It would be really great to have this functionality in the upcoming
9.5 and not wait for another year for it.

Given this is a one-liner, which doesn't introduce any new code, but
one flag to the function call, would it be
possible to review it as a part of the current commitfest?

Kind regards,
-- 
Alexey Klyukin



Re: Report search_path value back to the client.

От
Tom Lane
Дата:
Alexey Klyukin <alexk@hintbits.com> writes:
> On Tue, Dec 2, 2014 at 5:59 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>> I would like to mark 'search_path' as GUC_REPORT:

> Given this is a one-liner, which doesn't introduce any new code, but
> one flag to the function call, would it be
> possible to review it as a part of the current commitfest?

I'm against this on a couple of different grounds:

1. Performance.  search_path is something that many applications change
quite a lot, so reporting changes in it would create enormous network
overhead.  Consider for example that a SQL function might set it as
part of a function SET clause, and that could be invoked thousands of
times per query.

2. Semantics.  The existing GUC_REPORT variables are all things that
directly relate to client-visible behavior, eg how values of type
timestamp will be interpreted and printed.  search_path is no such thing,
so it's hard to make a principled argument for reporting it that doesn't
lead to the conclusion that you want *everything* reported.  (In
particular, I don't believe at all your argument that this would help
pgbouncer significantly.)

We could possibly alleviate problem #1 by changing the behavior of guc.c
so it doesn't report every single transition of flagged variables, but
only (say) once just before ReadyForQuery if the variable changed in
the just-finished command.  That's not exactly a one-line fix though.
        regards, tom lane



Re: Report search_path value back to the client.

От
Alexander Kukushkin
Дата:
Hi,

2015-02-20 16:19 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
2. Semantics.  The existing GUC_REPORT variables are all things that
directly relate to client-visible behavior, eg how values of type
timestamp will be interpreted and printed.  search_path is no such thing,
so it's hard to make a principled argument for reporting it that doesn't
lead to the conclusion that you want *everything* reported.  (In
particular, I don't believe at all your argument that this would help
pgbouncer significantly.)
 
Well, I will try to explain in details why do we want to use pgbouncer in front of postgres and what stops us from doing this.
Usually our applications don't access data directly but via stored procedures. One database can be accessed by different applications in different versions. Different versions of applications may use different sets and versions of stored procedures. It achieved by keeping stored procedures in different schemas (we call them API schema). For example at the same time 'app-v1' is working with stored procedures located in schema 'app_api_v1', 'app-v2' with 'app_api_v2' and 'other-app-v3' is working with 'other_app_api_v3'. This is very convenient. Before deploying new version of application we create new 'api schema' for it and then create all stored procedures in this schema. Directly after connecting to database application executes 'set search_path to app_api_vX, public' and after this point it always call stored procedures without specifying schema name. This way we are able to keep different versions of the same stored procedure in database.
And here the problem come. Sometimes we have dozens of application instances. Each application instance might keep more than one connection to database. In total there can be hundreds connections but most of the time they are not in state 'active'. For such cases connection pools have been invented. It's cheep to keep even thousands open connections to event-based connection pool comparing with the real database. The first idea that comes into mind - lets put pgbouncer with pool_mode = transaction in front of database. But here our good old friend 'set search_path to app_api_vX, public' does weird things with pgbouncer. The same server connection might be reused by different applications in different versions and soon or later application will get: 'ERROR:  function foo() does not exist' because search_path for this connection between pgbouncer and postgres has been changed by another application.

My first idea was - it should be possible to fix pgbouncer and make it aware of current search_path value of connections between application <-> pgbouncer and pgbouncer <-> postgres. I started to look into it's source code and yea, this can be done relatively easy, because pgbouncer already does similar job for "client_encoding", "DateStyle", "TimeZone", "standard_conforming_strings" and "application_name", which doesn't relate to the 'application visible behaviour' more than the search_path. Pgbouncer always keeps current values of these variables for each connection and when it takes a connection from the pool, it compares the values of client_connection and server_connection parameters. If they don't match - it changes the value on the server side by calling 'set variable to client_value'.

Execution of statement 'set some_variable to some_value' from the client side is not the unique way to change value of variable. As you already mentioned such statement can be executed from stored procedure as well. So scanning of client statements is not an option. Only postgres is able to tell at any time current value of search_path and asking it every time for current value also is not an option, because this will double amount of queries executed by database. Luckily, postgres already has this wonderful mechanism as GUC_REPORT...

If the postgres will report the value of search_path on connect and on change - pgbouncer could be really easy patched:

diff --git a/include/varcache.h b/include/varcache.h
index 4984b01..916fa01 100644
--- a/include/varcache.h
+++ b/include/varcache.h
@@ -5,6 +5,7 @@ enum VarCacheIdx {
        VTimeZone,
        VStdStr,
        VAppName,
+       VSearchPath,
        NumVars
 };
 
diff --git a/lib b/lib
index fe4dd02..c0111d4 160000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-Subproject commit fe4dd0255ea796e64599c0fc28e6900fbc07e6aa
+Subproject commit c0111d42712202df5ccb01c4e07e88ab5c8b94b8
diff --git a/src/varcache.c b/src/varcache.c
index 6321dc5..cb3f559 100644
--- a/src/varcache.c
+++ b/src/varcache.c
@@ -35,6 +35,7 @@ static const struct var_lookup lookup [] = {
  {"TimeZone",                    VTimeZone },
  {"standard_conforming_strings", VStdStr },
  {"application_name",            VAppName },
+ {"search_path",                 VSearchPath },
  {NULL},
 };

We could possibly alleviate problem #1 by changing the behavior of guc.c
so it doesn't report every single transition of flagged variables, but
only (say) once just before ReadyForQuery if the variable changed in
the just-finished command.  That's not exactly a one-line fix though.

Probably for some variables this really make sense. Inside stored procedures any of GUC_REPORT variables can be changed the same way as search_path (thousands of times), but not all of these variables directly relate to the application visible behaviour.

Regards,
Alexander Kukushkin

Re: Report search_path value back to the client.

От
Jelte Fennema-Nio
Дата:
I wanted to revive this thread, since it's by far one of the most
common foot guns that people run into with PgBouncer. Almost all
session level SET commands leak across transactions, but SET
search_path is by far the one with the biggest impact when it is not
the setting that you expect. As well as being a very common setting to
change. In the Citus extension we actually change search_path to be
GUC_REPORT so that PgBouncer can track it, because otherwise Citus its
schema based sharding starts breaking completely when using PgBouncer.
[1]

On Fri, 3 Nov 2023 at 09:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm against this on a couple of different grounds:
>
> 1. Performance.  ...
>
> 2. Semantics.  The existing GUC_REPORT variables are all things that
> directly relate to client-visible behavior, eg how values of type
> timestamp will be interpreted and printed.  search_path is no such thing,

>
> We could possibly alleviate problem #1 by changing the behavior of guc.c
> so it doesn't report every single transition of flagged variables, but
> only (say) once just before ReadyForQuery if the variable changed in
> the just-finished command.  That's not exactly a one-line fix though.

The proposed fix for #1 has been committed since PG14 in
2432b1a04087edc2fd9536c7c9aa4ca03fd1b363

So that only leaves #2. I think search_path can arguably be considered
to be client visible, because it changes how the client and its
queries are parsed by Postgres. And even if you don't agree with that
argument, it's simply not true that the only GUCs that are GUC_REPORT
are related to client-visible behaviour. Things like application_name
and is_superuser are also GUC_REPORT [2].

> so it's hard to make a principled argument for reporting it that doesn't
> lead to the conclusion that you want *everything* reported.  (In
> particular, I don't believe at all your argument that this would help
> pgbouncer significantly.)

To be clear, I would like it to be configurable which settings are
GUC_REPORT. Because there are others that are useful for PgBouncer to
track (e.g. statement_timeout and lock_timeout) That's why I've been
active on the thread proposing such a change [3]. But that thread has
not been getting a lot of attention, I guess because it's a large
change that includes protocol changes. So that's why I'm reviving this
thread again. Since search_path is by far the most painful setting for
PgBouncer users. A common foot-gun is that running pg_dump causes
breakage for other clients, because its "SET search_path" is leaked to
the next transaction [4].

[1]:
https://github.com/citusdata/citus/blob/21646ca1e96175370be1472a14d5ab1baa55b471/src/backend/distributed/shared_library_init.c#L2686-L2695
[2]: https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC
[3]:
https://www.postgresql.org/message-id/flat/0a6f5e6b-65b3-4272-260d-a17ce8f5b3a4%40eisentraut.org#bd6d06db7db9e4ded8100e4160050b17
[4]: https://github.com/pgbouncer/pgbouncer/issues/452



Re: Report search_path value back to the client.

От
Jelte Fennema-Nio
Дата:
For completeness attached is a tiny patch implementing this, so this
thread can be added to the commit fest.

Вложения