Обсуждение: jsonb, collection & postgres_fdw
Hi hackers, Right now jsonb functions are treated as non-shippable by postgres_fdw and so predicates with them are not pushed down to foreign server: create table jt(content jsonb); create extension postgres_fdw; create server pg_fdw FOREIGN DATA WRAPPER postgres_fdw options(host '127.0.0.1', dbname 'postgres'); create user mapping for current_user server pg_fdw options (user 'postgres'); create foreign table fjt(content jsonb) server pg_fdw options (table_name 'jt'); postgres=# explain select * from fjt where jsonb_exists(content, 'some'); QUERY PLAN -------------------------------------------------------------- Foreign Scan on fjt (cost=100.00..157.50 rows=487 width=32) Filter: jsonb_exists(content, 'some'::text) It is because of the following check in postgres_fdw: /* * If function's input collation is not derived from a foreign * Var, it can't be sent to remote. */ if (fe->inputcollid == InvalidOid) /* OK, inputs are all noncollatable */ ; else if (inner_cxt.state != FDW_COLLATE_SAFE || fe->inputcollid != inner_cxt.collation) return false; In my case (gdb) p fe->inputcollid $1 = 100 (gdb) p inner_cxt.collation $3 = 0 (gdb) p inner_cxt.state $4 = FDW_COLLATE_NONE I wonder if there is some way of making postgres_fdw to push this this function to foreign server? May be this check should be changed to: if (fe->inputcollid == InvalidOid || inner_cxt.state == FDW_COLLATE_NONE) /* OK, inputs are all noncollatable */ ;
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > Right now jsonb functions are treated as non-shippable by postgres_fdw > and so predicates with them are not pushed down to foreign server: Yeah, that's kind of annoying, but breaking the collation check is not an acceptable fix. And what you're proposing *does* break it. The issue here is that the function's input collation is coming from the default collation applied to the text constant, and we can't assume that that will be the same on the remote side. In reality, of course, jsonb_exists doesn't care about its input collation --- but postgres_fdw has no way to know that. I don't see any easy way around that. One idea that would probably work in a lot of postgres_fdw usage scenarios is to have a foreign-server-level flag that says "all the collations on that server behave the same as the local ones, and the default collation is the same too", and then we just skip the collation checking altogether. But I'm a bit worried that if someone mistakenly sets that flag, the misbehavior will be very hard to detect. regards, tom lane
On 13.08.2020 20:00, Tom Lane wrote: > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: >> Right now jsonb functions are treated as non-shippable by postgres_fdw >> and so predicates with them are not pushed down to foreign server: > Yeah, that's kind of annoying, but breaking the collation check > is not an acceptable fix. And what you're proposing *does* break it. > The issue here is that the function's input collation is coming from > the default collation applied to the text constant, and we can't assume > that that will be the same on the remote side. > > In reality, of course, jsonb_exists doesn't care about its input collation > --- but postgres_fdw has no way to know that. I don't see any easy way > around that. > > One idea that would probably work in a lot of postgres_fdw usage scenarios > is to have a foreign-server-level flag that says "all the collations on > that server behave the same as the local ones, and the default collation > is the same too", and then we just skip the collation checking altogether. > But I'm a bit worried that if someone mistakenly sets that flag, the > misbehavior will be very hard to detect. > > regards, tom lane Thank you for clarification. And sorry for mistyping in topic (there should be "collation" instead of "collection"). Actually I do not know much about handling collations in Postgres and particularly in postgres_fdw. Can you (or somebody else) provide more information about this fragment of code: /* * If function's input collation is not derived from a foreign * Var, it can't be sent to remote. */ if (fe->inputcollid == InvalidOid) /* OK, inputs are all noncollatable */ ; else if (inner_cxt.state != FDW_COLLATE_SAFE || fe->inputcollid != inner_cxt.collation) return false; So we have function call expression which arguments have associated collation, but function itself is collection-neutral: funccollid = 0 Why it is not safe to push this function call to the remote server? Why it breaks collation check? If there are some unsafe operations with collations during argument evaluation, then we detect it while recursive processing of arguments. I agree that my proposed fix is not correct. But what about this check: if (fe->inputcollid == InvalidOid) /* OK, inputs are all noncollatable */ ; else if (fe->funccollid == InvalidOid) /* OK, function is noncollatable */ ; Or funccollid=0 doesn't mean that collations of function arguments do not affect function behavior?
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > Or funccollid=0 doesn't mean that collations of function arguments do > not affect function behavior? No, it does not. As I said already, there is no way to tell from outside a function whether it pays attention to collation or not. funccollid is the collation to ascribe to the function's *output*, but that's always zero for a non-collatable output type such as boolean. An example is text_lt(), which returns boolean but surely does depend on the input collation. We don't really have any way to distinguish between that and jsonb_exists(). In hindsight, it was probably a bad idea not to have a way to mark whether functions care about collation. I don't know if it'd be practical to retrofit such a marker now. regards, tom lane
On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > Right now jsonb functions are treated as non-shippable by postgres_fdw > and so predicates with them are not pushed down to foreign server: > > I wonder if there is some way of making postgres_fdw to push this this > function to foreign server? > May be this check should be changed to: > > if (fe->inputcollid == InvalidOid || inner_cxt.state == > FDW_COLLATE_NONE) > /* OK, inputs are all noncollatable */ ; > I think, in general, we may want to push the some of the local functions that may filter out tuples/rows to remote backend to reduce the data transfer(assuming collation and other settings are similar to that of the local backend), but definitely, not this way. One possible issue could be that, what if these functions are supported/installed on the local server, but not on the remote? May be because the remote postgres server version is different than that of the local? Is there a version check between local and remote servers in postgres_fdw? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On 14.08.2020 09:40, Bharath Rupireddy wrote: > On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> Right now jsonb functions are treated as non-shippable by postgres_fdw >> and so predicates with them are not pushed down to foreign server: >> >> I wonder if there is some way of making postgres_fdw to push this this >> function to foreign server? >> May be this check should be changed to: >> >> if (fe->inputcollid == InvalidOid || inner_cxt.state == >> FDW_COLLATE_NONE) >> /* OK, inputs are all noncollatable */ ; >> > I think, in general, we may want to push the some of the local > functions that may filter out tuples/rows to remote backend to reduce > the data transfer(assuming collation and other settings are similar to > that of the local backend), but definitely, not this way. One possible > issue could be that, what if these functions are supported/installed > on the local server, but not on the remote? May be because the remote > postgres server version is different than that of the local? Is there > a version check between local and remote servers in postgres_fdw? > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com Right now postgres_fdw treat as shippable only builtin functions or functions from extensions explicitly specified as shippable extensions in parameters of this FDW server. So I do no see a problem here. Yes, foreign server may have different version of Postgres which doesn't have this built-in function or its profile is different. It can happen if postgres_fdw is used to connect two different servers which are maintained independently. But in most cases I think, postgres_fdw is used to organize some kind of cluster. In this case all nodes are identical (hardware, OS, postgres version) and performance is very critical (because scalability - of one of the goal of replacing single node with cluster). This is why push down of predicates is very critical in this case. I still do not completely understand current criteria of shippable functions. I understood Tom's explanation, but: postgres=# create table t1(t text collate "C"); CREATE TABLE postgres=# create foreign table ft1(t text collate "ru_RU") server pg_fdw options (table_name 't1'); CREATE FOREIGN TABLE postgres=# explain select * from ft1 where lower(t)='some'; QUERY PLAN ------------------------------------------------------------ Foreign Scan on ft1 (cost=100.00..132.07 rows=7 width=32) (1 row) lower(t) is pushed to remote server despite to the fact that "t" has different collations at local and remote servers. Also when initialize postgres database, you can specify default collation. I have not found any place in postgres_fdw which tries to check if default collation of remote and local servers are the same or specify collation explicitly when them are different. From my point of view, it will be nice to have flag in postgres_fdw server indicating that foreign and remote servers are identical and treat all functions as shippable in this case (not only built-in ones are belonging to explicitly specified shippable extensions). It will simplify using postres_fdw in clusters and makes it more efficient.
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > I still do not completely understand current criteria of shippable > functions. > I understood Tom's explanation, but: > postgres=# create table t1(t text collate "C"); > CREATE TABLE > postgres=# create foreign table ft1(t text collate "ru_RU") server > pg_fdw options (table_name 't1'); > CREATE FOREIGN TABLE > postgres=# explain select * from ft1 where lower(t)='some'; > QUERY PLAN > ------------------------------------------------------------ > Foreign Scan on ft1 (cost=100.00..132.07 rows=7 width=32) > (1 row) > lower(t) is pushed to remote server despite to the fact that "t" has > different collations at local and remote servers. Well, that's the case because you lied while creating the foreign table. We have no practical way to cross-check whether the foreign table's declaration is an accurate representation of the remote table, so we just take it on faith that it is. The problem that the collation check is trying to solve is that we can't safely push COLLATE clauses to the remote server, because it may not have the same set of collation names as the local server. So we can only push clauses whose collation is entirely derivable from the table column(s) they use. And then, per the above, we rely on the user to make sure that the local and remote columns have equivalent collations. (Which conceivably would have different names.) > From my point of view, it will be nice to have flag in postgres_fdw > server indicating that foreign and remote servers are identical > and treat all functions as shippable in this case (not only built-in > ones are belonging to explicitly specified shippable extensions). Perhaps, but not everyone has that use-case. I'd even argue that it's a minority use-case. regards, tom lane
On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>
> Right now postgres_fdw treat as shippable only builtin functions or
> functions from extensions explicitly specified as shippable extensions
> in parameters of this FDW server. So I do no see a problem here. Yes,
> foreign server may have different version of Postgres which doesn't have
> this built-in function or its profile is different. It can happen if
> postgres_fdw is used to connect two different servers which are
> maintained independently. But in most cases I think, postgres_fdw is
> used to organize some kind of cluster. In this case all nodes are
> identical (hardware, OS, postgres version) and performance is very
> critical (because scalability - of one of the goal of replacing single
> node with cluster).
> This is why push down of predicates is very critical in this case.
>
Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions? For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether we will have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be with different versions), we must have a mechanism to know the compatibility.
>
> From my point of view, it will be nice to have flag in postgres_fdw
> server indicating that foreign and remote servers are identical
> and treat all functions as shippable in this case (not only built-in
> ones are belonging to explicitly specified shippable extensions).
> It will simplify using postres_fdw in clusters and makes it more efficient.
>
I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server version level, but also at each function level. We could have something like a configuration file which allows the user to specify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdw refer this configuration file, while checking the pushability of the functions to remote. This way, the user has some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatory checks happening currently such as remote backend configuration settings such as collations etc.
Feel free to correct me.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
>
> Right now postgres_fdw treat as shippable only builtin functions or
> functions from extensions explicitly specified as shippable extensions
> in parameters of this FDW server. So I do no see a problem here. Yes,
> foreign server may have different version of Postgres which doesn't have
> this built-in function or its profile is different. It can happen if
> postgres_fdw is used to connect two different servers which are
> maintained independently. But in most cases I think, postgres_fdw is
> used to organize some kind of cluster. In this case all nodes are
> identical (hardware, OS, postgres version) and performance is very
> critical (because scalability - of one of the goal of replacing single
> node with cluster).
> This is why push down of predicates is very critical in this case.
>
Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions? For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether we will have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be with different versions), we must have a mechanism to know the compatibility.
>
> From my point of view, it will be nice to have flag in postgres_fdw
> server indicating that foreign and remote servers are identical
> and treat all functions as shippable in this case (not only built-in
> ones are belonging to explicitly specified shippable extensions).
> It will simplify using postres_fdw in clusters and makes it more efficient.
>
I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server version level, but also at each function level. We could have something like a configuration file which allows the user to specify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdw refer this configuration file, while checking the pushability of the functions to remote. This way, the user has some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatory checks happening currently such as remote backend configuration settings such as collations etc.
Feel free to correct me.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 17, 2020 at 7:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > > Right now postgres_fdw treat as shippable only builtin functions or > > functions from extensions explicitly specified as shippable extensions > > in parameters of this FDW server. So I do no see a problem here. Yes, > > foreign server may have different version of Postgres which doesn't have > > this built-in function or its profile is different. It can happen if > > postgres_fdw is used to connect two different servers which are > > maintained independently. But in most cases I think, postgres_fdw is > > used to organize some kind of cluster. In this case all nodes are > > identical (hardware, OS, postgres version) and performance is very > > critical (because scalability - of one of the goal of replacing single > > node with cluster). > > This is why push down of predicates is very critical in this case. > > > > Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions?For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether wewill have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be withdifferent versions), we must have a mechanism to know the compatibility. > > > > > From my point of view, it will be nice to have flag in postgres_fdw > > server indicating that foreign and remote servers are identical > > and treat all functions as shippable in this case (not only built-in > > ones are belonging to explicitly specified shippable extensions). > > It will simplify using postres_fdw in clusters and makes it more efficient. > > > > I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server versionlevel, but also at each function level. We could have something like a configuration file which allows the user tospecify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdwrefer this configuration file, while checking the pushability of the functions to remote. This way, the userhas some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatorychecks happening currently such as remote backend configuration settings such as collations etc. I agree with most of this. We need a way for a user to tell us which function is safe to be executed on the foreign server (not just postgres_fdw, but other kinds of FDWs as well). But maintaining that as a configurable file and associating safety with an FDW isn't sufficient. We should maintain that as a catalog. A function may be safe to push down based on the FDW (a given function always behaves in the same way on any of the servers of an FDW as its peer locally), or may be associated with a server (a function is available and behaves same as its local peer on certain server/s but not all). Going further a local function may map to a function with a different name on the remote server/fdw, so that same catalog may maintain the function mapping. An FDW may decide to cache relevant information, update the catalog using IMPORT FOREIGN SCHEMA(or ROUTINE), or add some defaults when installing the extension. More details are required to be worked out but here my initial thoughts on this. -- Best Wishes, Ashutosh Bapat
On Tue, 18 Aug 2020 at 17:36, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Aug 17, 2020 at 7:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> >
> > Right now postgres_fdw treat as shippable only builtin functions or
> > functions from extensions explicitly specified as shippable extensions
> > in parameters of this FDW server. So I do no see a problem here. Yes,
> > foreign server may have different version of Postgres which doesn't have
> > this built-in function or its profile is different. It can happen if
> > postgres_fdw is used to connect two different servers which are
> > maintained independently. But in most cases I think, postgres_fdw is
> > used to organize some kind of cluster. In this case all nodes are
> > identical (hardware, OS, postgres version) and performance is very
> > critical (because scalability - of one of the goal of replacing single
> > node with cluster).
> > This is why push down of predicates is very critical in this case.
> >
>
> Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions? For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether we will have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be with different versions), we must have a mechanism to know the compatibility.
>
> >
> > From my point of view, it will be nice to have flag in postgres_fdw
> > server indicating that foreign and remote servers are identical
> > and treat all functions as shippable in this case (not only built-in
> > ones are belonging to explicitly specified shippable extensions).
> > It will simplify using postres_fdw in clusters and makes it more efficient.
> >
>
> I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server version level, but also at each function level. We could have something like a configuration file which allows the user to specify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdw refer this configuration file, while checking the pushability of the functions to remote. This way, the user has some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatory checks happening currently such as remote backend configuration settings such as collations etc.
I agree with most of this. We need a way for a user to tell us which
function is safe to be executed on the foreign server (not just
postgres_fdw, but other kinds of FDWs as well). But maintaining that
as a configurable file and associating safety with an FDW isn't
sufficient. We should maintain that as a catalog. A function may be
safe to push down based on the FDW (a given function always behaves in
the same way on any of the servers of an FDW as its peer locally), or
may be associated with a server (a function is available and behaves
same as its local peer on certain server/s but not all). Going further
a local function may map to a function with a different name on the
remote server/fdw, so that same catalog may maintain the function
mapping. An FDW may decide to cache relevant information, update the
catalog using IMPORT FOREIGN SCHEMA(or ROUTINE), or add some defaults
when installing the extension.
While looking at something else in postgres_fdw, I came across an old feature which I had completely forgotten about. We allow extensions to be added to server options. Any object belonging to these extensions, including functions, can be shipped to the foreign server. See postres_fdw/sql/postgres_fdw.sql for examples. This is an awkward way since there is no way to control individual functions and a UDF has to be part of an extension to be shippable. It doesn't provide flexibility to map a local function to a remote one if their names differ. But we have something. May be we could dig past conversations to understand why it was done this way.
Best Wishes,
Ashutosh