Обсуждение: BUG #5611: SQL Function STABLE promoting to VOLATILE
The following bug has been logged online:

Bug reference:      5611
Logged by:          Brian Ceccarelli
Email address:      bceccarelli@net32.com
PostgreSQL version: 8.4.4
Operating system:   Windows XP 32 bit and Red Hat 5.4 64 bit
Description:        SQL Function STABLE promoting to VOLATILE
Details:

---------------------------------------------------------------------------------
--
-- Demonstration of a PL/pgSQL stable-scoped function not working.
--
-- To see the problem, run this entire script from pgAdmin.  Bring up Task
-- Manager too.  After you run the entire script, you can reproduce the
-- specific problem by running:
--
--     select * from f_pass_4();
--
-- Note that "I am in f_return_ver_id_4()" repeats once for every row
-- returned from f_get_table_4().  Postgres should only call
-- f_return_ver_id_4() once.
--
-- The Problems:
--
-- 1. It seems that STABLE functions called within a SQL-language function
--    get promoted to VOLATILE.  Even though I declare a function STABLE,
--    Postgres calls it multiple times within a transaction.
--
-- 2. The raise notice in f_return_ver_id_4() also causes a memory leak in
--    pgAdmin (on Windows).
--
-- Related Problems:
--
-- Even the now() function gets called repeatedly within a stable SQL
-- function.
--
-- Postgres Version:
--
-- I am running Postgres 8.4.4-1 on Windows (Windows XP 32 bit).
-- The same problem occurs on Postgres 8.4.4 on Linux-64, Red Hat 5.4.
--
-- This problem does not happen in Postgres 8.2.
---------------------------------------------------------------------------------

drop type if exists type_pass_test cascade;

create type type_pass_test as (ver_id int8);

CREATE OR REPLACE FUNCTION f_get_table_4()
RETURNS setof type_pass_test AS $BODY$
declare
    r type_pass_test;
    i int8;
begin
    for i in 1..5 loop
        r.ver_id := i;
        return next r;
    end loop;
    return;
end;
$BODY$ language 'plpgsql' volatile;

CREATE OR REPLACE FUNCTION f_return_ver_id_4()
RETURNS int8 AS $BODY$
begin
    raise notice 'I am in f_return_ver_id_4()';
    return 1;
end;
$BODY$ language 'plpgsql' stable;

CREATE OR REPLACE FUNCTION f_do_4(ver_id_in int8)
RETURNS setof type_pass_test AS $BODY$
---------------------------------------------------------------------------------
--
-- When you run f_pass_4(), f_pass_4() calls f_do_4(), passing
-- f_return_ver_id_4() as ver_id_in.
--
-- The error:
--
-- Even though f_return_ver_id_4() is a STABLE function, the select
-- statement below calls f_return_ver_id_4() once for every row coming back
-- from f_get_table_4().
--
-- The repeat appears when I write the function in the SQL language.
-- The repeat disappears when I write the function in the PL/pgSQL language.
-- If I add now() to the where clause, you will even see that Postgres calls
-- now() multiple times.
--
---------------------------------------------------------------------------------
select * from f_get_table_4() where ver_id = $1;
$BODY$ language 'SQL' stable;

/*
CREATE OR REPLACE FUNCTION f_do_4(ver_id_in int8)
RETURNS setof type_pass_test AS $BODY$
---------------------------------------------------------------------------------
--
-- If I make f_do_4(ver_id_in int8) a PL/pgSQL function, the problem goes away.
--
---------------------------------------------------------------------------------
begin
    return query select * from f_get_table_4() where ver_id = $1;
    return;
end;
$BODY$ language 'plpgsql' stable;
*/

CREATE OR REPLACE FUNCTION f_pass_4()
RETURNS int4 AS $BODY$
---------------------------------------------------------------------------------
--
-- Example:
--
--     select * from f_pass_4();
--
---------------------------------------------------------------------------------
declare
    rows_affected_w int4;
begin
    select into rows_affected_w count(*) from f_do_4(f_return_ver_id_4());
    return rows_affected_w;
end;
$BODY$ language 'plpgsql' stable;

----------------------------------------
select * from f_pass_4();
"Brian Ceccarelli" <bceccarelli@net32.com> writes:
> -- 1. It seems that STABLE functions called within a SQL language
> get promoted to VOLATILE.

That has got nothing to do with it.  The change in behavior from 8.2 is
due to the fact that set-returning SQL functions can now be inlined.
The statement in f_pass_4(),

	select into rows_affected_w count(*) from f_do_4(f_return_ver_id_4());

gets expanded (by inlining of f_do_4) into

	select into rows_affected_w count(*) from f_get_table_4()
	where ver_id = f_return_ver_id_4();

and then since f_get_table_4() returns multiple rows, the WHERE clause
gets evaluated multiple times.

As near as I can tell, your real complaint is that the side-effects of
f_return_ver_id_4 (ie, the RAISE NOTICE) happen more than once.
However, a function declared STABLE really shouldn't *have* any side
effects, because that marking authorizes the optimizer to assume that it
doesn't.  If you marked it VOLATILE then this optimization wouldn't
happen.

> -- 2. The raise notice in f_return_ver_id_4() also causes a memory
> leak in PGAdmin (on Windows).

Hm, you probably ought to mention that part on the pgadmin mailing
lists.  I don't know whether the appropriate people will notice it here.

			regards, tom lane
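Tom's last point can be sketched directly against the demo script (a sketch, assuming the objects from the original report are loaded; the effect is as Tom describes it, not verified here):

```sql
-- Sketch of Tom's suggestion: once the argument function is VOLATILE,
-- the planner will not inline f_do_4 around it, so the argument
-- expression is evaluated once to produce the value passed in --
-- at the cost of losing the optimizations the STABLE marking allows.
ALTER FUNCTION f_return_ver_id_4() VOLATILE;

select * from f_pass_4();
```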
Dear Tom,

Thanks for getting back to me so soon.  I see that SQL functions seem to
inline their calling arguments.

My complaint remains.  That inlined function f_return_ver_id_4() is a
STABLE function, inlined or not.  Postgres now calls it multiple times
during the transaction, even though the arguments to f_return_ver_id_4()
have not changed.

STABLE no longer means STABLE.  This behavior is killing my performance.
I am getting a 500% to 30000% increase in latency.

The problem does not stop with just this.  Functions that I have no
control over, like now(), behave like this too.  The function now() gets
called multiple times in a transaction.  See the example below.

All this behavior seems only to happen when I call the SQL function from
a PL/pgSQL function.  It doesn't happen when I call the SQL function
directly from the psql command line.

For example, below, look what I had to do with the now() function.  I
had to change this SQL function:

<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>

create or replace function f_get_current_price_all_keys_by_mp_rank(
    culture_in text, ver_id_min_in int8, ver_id_max_in int8, query_in tsquery)
returns setof type_product_price_key as $BODY$
---------------------------------------------------------------------------------
--
-- This function retrieves all the primary keys to the product_price
-- table which point to the current prices of vendor products for a
-- specific tsearch query.
---------------------------------------------------------------------------------
select pp.vp_id,
       pp.ver_id,
       pp.pm_id,
       max(pp.pp_price_eff_time) as pp_price_eff_time,
       pp.pp_min_qty
  from product_price pp
       inner join vendor_product vp            on vp.vp_id = pp.vp_id
       inner join manufacturer_product_tsv tsv on tsv.mp_id = vp.mp_id
                                              and tsv.ver_id = pp.ver_id
       inner join manufacturer_product mp      on mp.mp_id = vp.mp_id
       inner join brand b                      on b.brand_id = mp.brand_id
       inner join manufacturer m               on m.man_id = b.man_id
       inner join vendor_vertical vv           on vv.vdr_id = vp.vdr_id
                                              and vv.ver_id = pp.ver_id
       inner join vendor v                     on v.vdr_id = vv.vdr_id
       inner join promotion pm                 on pm.pm_id = pp.pm_id
       left join promo_vendor_behavior vb      on vb.pm_id = pp.pm_id
                                              and vb.vdr_id = vp.vdr_id
       left join promo_type_attrs pa           on pa.pmta_id = vb.pmta_id
       inner join time_zone t                  on t.tz_id = pp.tz_id
 where pp.active_cd = 1
   and tsv.search_vector @@@ $4
   and tsv.culture = $1
   and pp.ver_id between $2 and $3
   and vp.active_cd = 1
   and vv.active_cd = 1
   and v.active_cd = 1
   and mp.active_cd = 1
   and b.active_cd = 1
   and m.active_cd = 1
   and pm.active_cd = 1
   and ((pa.pmta_id is null) or (pa.pmta_id is not null and pa.active_cd = 1))
   and ((pp.pp_end_time is null) or (pp.pp_end_time > now()))
   and pp.pp_price_eff_time <= now()
   and (pp.days_of_week   & (1 << (extract(dow   from now() at time zone t.name)::int4))) <> 0
   and (pp.days_of_month  & (1 << (extract(day   from now() at time zone t.name)::int4))) <> 0
   and (pp.months_of_year & (1 << (extract(month from now() at time zone t.name)::int4))) <> 0
 group by pp.vp_id, pp.ver_id, pp.pm_id, pp.pp_min_qty
$BODY$ language 'SQL' STABLE;

<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>

To this PL/pgSQL function, because now() gets called multiple times.  The
function above runs in 1.8 seconds.  The function below runs in 0.25
seconds.
create or replace function f_get_current_price_all_keys_by_mp_rank(
    culture_in text, ver_id_min_in int8, ver_id_max_in int8, query_in tsquery)
returns setof type_product_price_key as $BODY$
---------------------------------------------------------------------------------
--
-- This function retrieves all the primary keys to the product_price
-- table which point to the current prices of vendor products for a
-- specific tsearch query.
--
---------------------------------------------------------------------------------
declare
    now_w timestamp with time zone;
begin
    now_w := now();

    return query
    select pp.vp_id,
           pp.ver_id,
           pp.pm_id,
           max(pp.pp_price_eff_time) as pp_price_eff_time,
           pp.pp_min_qty
      from product_price pp
           inner join vendor_product vp            on vp.vp_id = pp.vp_id
           inner join manufacturer_product_tsv tsv on tsv.mp_id = vp.mp_id
                                                  and tsv.ver_id = pp.ver_id
           inner join manufacturer_product mp      on mp.mp_id = vp.mp_id
           inner join brand b                      on b.brand_id = mp.brand_id
           inner join manufacturer m               on m.man_id = b.man_id
           inner join vendor_vertical vv           on vv.vdr_id = vp.vdr_id
                                                  and vv.ver_id = pp.ver_id
           inner join vendor v                     on v.vdr_id = vv.vdr_id
           inner join promotion pm                 on pm.pm_id = pp.pm_id
           left join promo_vendor_behavior vb      on vb.pm_id = pp.pm_id
                                                  and vb.vdr_id = vp.vdr_id
           left join promo_type_attrs pa           on pa.pmta_id = vb.pmta_id
           inner join time_zone t                  on t.tz_id = pp.tz_id
     where pp.active_cd = 1
       and tsv.search_vector @@@ $4
       and tsv.culture = $1
       and pp.ver_id between $2 and $3
       and vp.active_cd = 1
       and vv.active_cd = 1
       and v.active_cd = 1
       and mp.active_cd = 1
       and b.active_cd = 1
       and m.active_cd = 1
       and pm.active_cd = 1
       and ((pa.pmta_id is null) or (pa.pmta_id is not null and pa.active_cd = 1))
       and ((pp.pp_end_time is null) or (pp.pp_end_time > now_w))
       and pp.pp_price_eff_time <= now()
       and (pp.days_of_week   & (1 << (extract(dow   from now_w at time zone t.name)::int4))) <> 0
       and (pp.days_of_month  & (1 << (extract(day   from now_w at time zone t.name)::int4))) <> 0
       and (pp.months_of_year & (1 << (extract(month from now_w at time zone t.name)::int4))) <> 0
     group by pp.vp_id, pp.ver_id, pp.pm_id, pp.pp_min_qty;
    return;
end;
$BODY$ language 'plpgsql' STABLE rows 1000;

But that is not all that's going on.  It turns out that in a SQL
function, now() also gets called multiple times.

The function f_return_ver_id_4() is a STABLE function.  Even when
inlined, the behavior should still be that the function gets called only
once during a transaction.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, August 10, 2010 10:55 PM
To: Brian Ceccarelli
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE
Brian Ceccarelli <bceccarelli@net32.com> writes:
> STABLE no longer means STABLE.  This behavior is killing my
> performance.  I am getting 500% to 30000% increase in latency.

You seem to be under the illusion that "stable" is a control knob for a
function result cache.  It is not, and never has been.  Postgres doesn't
do function result caching.

If you've constructed your app in such a way that it depends on not
inlining SQL set-returning functions, it's fairly easy to defeat that.
From memory, marking a SRF as either strict or volatile will do it
(although volatile might cause you problems elsewhere --- I suspect your
design is pretty brittle in this area).

			regards, tom lane
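The strict variant Tom mentions, written out against the function from the demo script (a sketch; the function name and body are from the original report, the STRICT marking is the change under discussion):

```sql
-- Declaring the SQL SRF STRICT should defeat inlining, per Tom's note:
-- an inlined body could not preserve strict (null-in, null-out)
-- semantics, so the planner leaves the call as a black box and
-- evaluates its argument expression once.
CREATE OR REPLACE FUNCTION f_do_4(ver_id_in int8)
RETURNS setof type_pass_test AS $BODY$
    select * from f_get_table_4() where ver_id = $1;
$BODY$ language 'SQL' STABLE STRICT;
```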
On Wed, Aug 11, 2010 at 11:01 AM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> My complaint remains.  That inlined function f_return_ver_id_4() is a
> STABLE function, inlined or not.  Postgres now calls it multiple times
> during the transaction, even though the arguments to
> f_return_ver_id_4() have not changed.
>
> STABLE no longer means STABLE.  This behavior is killing my
> performance.  I am getting 500% to 30000% increase in latency.

We've never guaranteed that, and almost certainly never will.  Marking
a function STABLE means that the planner is *allowed to assume* that
the results won't change for a given set of arguments, not that it is
*required to prevent* it from being called multiple times with the
same set of arguments.

You can certainly prevent the function from being inlined, though
(perhaps, by writing it in PL/pgsql).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
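Robert's parenthetical suggestion, written out; this mirrors the commented-out variant already in the original bug report (a sketch, assuming the demo objects are loaded):

```sql
-- A PL/pgSQL body is never inlined, so here f_return_ver_id_4() is
-- evaluated once to form ver_id_in, rather than being expanded into
-- the outer query's WHERE clause and re-evaluated per row.
CREATE OR REPLACE FUNCTION f_do_4(ver_id_in int8)
RETURNS setof type_pass_test AS $BODY$
begin
    return query select * from f_get_table_4() where ver_id = ver_id_in;
    return;
end;
$BODY$ language 'plpgsql' stable;
```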
OK.  The documentation says "allows the optimizer to optimize . . . ."
But then the example guarantees the one-time-only for an index scan
condition.

From the documentation: 8.4.4 Chapter 32 and 8.2.17 Chapter 33.

    A STABLE function cannot modify the database and is guaranteed to
    return the same results given the same arguments for all rows within
    a single statement.  This category allows the optimizer to optimize
    multiple calls of the function to a single call.  In particular, it
    is safe to use an expression containing such a function in an index
    scan condition.  (Since an index scan will evaluate the comparison
    value only once, not once at each row, it is not valid to use a
    VOLATILE function in an index scan condition.)

The behavior of the optimizers <= 8.2 certainly fit the description.
The 8.4 behavior is vastly different.

I recommend that somebody change the documentation to say, "This
category allows, but does not guarantee, the optimizer to optimize
multiple calls . . . ."  That would be more clear.  And then mention the
inlining deal, if you haven't already.

There remains the problem with the now() function.  A SQL function
repetitively calls now().  Is that what you intended?

There remains the problem with the pgAdmin memory leak.

I will change my SQL functions to PL/pgSQL functions.  I am glad that
there is a solution.  Thank you for your help.

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Wednesday, August 11, 2010 11:33 AM
To: Brian Ceccarelli
Cc: Tom Lane; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE
On Wed, Aug 11, 2010 at 11:50 AM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> OK.  The documentation says "allows the optimizer to optimize . . . ."
> But then the example guarantees the one-time-only for an index scan
> condition.
>
> From the documentation: 8.4.4 Chapter 32 and 8.2.17 Chapter 33.
>
>   A STABLE function cannot modify the database and is guaranteed to
> return the same results given the same arguments for all rows within a
> single statement.  This category allows the optimizer to optimize
> multiple calls of the function to a single call.  In particular, it is
> safe to use an expression containing such a function in an index scan
> condition.  (Since an index scan will evaluate the comparison value
> only once, not once at each row, it is not valid to use a VOLATILE
> function in an index scan condition.)
>
> The behavior of the optimizers <= 8.2 certainly fit the description.
> The 8.4 behavior is vastly different.

Reading between the lines, I think I sense that this has got you
pretty frustrated, so in defense of the new behavior, let me just
mention that, in general, inlining SQL queries results in a HUGE
performance benefit.  It's sort of unfortunate that it doesn't work
out that way for you in this case, but I don't think it's a bad idea
in general.

*thinks*

In theory, the optimization Brian wants is possible here, right?  I
mean, you could replace the function call with a Param, and pull the
Param out and make it an InitPlan.  Seems like that would generally be
a win, if you figure to loop more than once and the execution cost of
the function is not too tiny.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Brian Ceccarelli <bceccarelli@net32.com> writes:
> OK.  The documentation says "allows the optimizer to optimize . . . ."
> But then the example guarantees the one-time-only for an index scan
> condition.

No, the documentation states that *if* an index scan is used, functions
involved in the indexscan's qual condition will be evaluated just once,
rather than once per row.  There is no "guarantee" of any sort that such
a plan will be chosen.

The point of the STABLE marking is to inform the optimizer that it is
safe to choose an index scan because the function's results will not
change compared to the naive SQL semantics wherein the WHERE condition
is evaluated for each row.  Thus, the guarantee actually runs the other
way: you are promising the optimizer that your function doesn't have
side effects or change its results intra-query.

			regards, tom lane
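Tom's distinction can be made concrete with a toy case (table and function names are illustrative, not from the thread; whether the index scan is actually chosen is up to the planner):

```sql
-- If the planner picks an index scan here, f_stable() is evaluated
-- once to produce the comparison value; the STABLE marking is what
-- makes that substitution safe.  Nothing forces the index scan to be
-- chosen -- that is the "guarantee" running the other way.
CREATE TABLE t (a int PRIMARY KEY);
CREATE FUNCTION f_stable() RETURNS int AS $$ select 42 $$ LANGUAGE sql STABLE;
EXPLAIN SELECT * FROM t WHERE a = f_stable();
```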
1. Basically, I was assuming that "STABLE" was more than just a optimizer = "hint". 2. Is it not better to evaluate a STABLE function once, and pass the retur= ned constant to a function, than pass the function itself and inline it? = Is not passing a constant more efficient than even an inline query? Please show me an example where an inline query gets a performance boost. Thanks! -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20 Sent: Wednesday, August 11, 2010 4:34 PM To: Brian Ceccarelli Cc: Robert Haas; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE=20 Brian Ceccarelli <bceccarelli@net32.com> writes: > OK. The documentation says "allows the optimizer to optimize . . . ." = But then the example guarantees the one-time-only for a index scan condit= ion.=20=20=20 No, the documentation states that *if* an index scan is used, functions involved in the indexscan's qual condition will be evaluated just once, rather than once per row. There is no "guarantee" of any sort that such a plan will be chosen. The point of the STABLE marking is to inform the optimizer that it is safe to choose an index scan because the function's results will not change compared to the naive SQL semantics wherein the WHERE condition is evaluated for each row. Thus, the guarantee actually runs the other way: you are promising the optimizer that your function doesn't have side effects or change its results intra-query. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
> In theory, the optimization Brian wants is possible here, right?  I
> mean, you could replace the function call with a Param, and pull the
> Param out and make it an InitPlan.  Seems like that would generally be
> a win, if you figure to loop more than once and the execution cost of
> the function is not too tiny.

Yeah, possibly.  It would probably be difficult for the planner to
figure out where the cutover point is to make that worthwhile, though;
the point where you'd need to make the transformation is long before we
have any rowcount estimates.

I was about to suggest that Brian could make that happen manually by
writing

	... FROM srf((SELECT expensive_func())) ...

but I think actually that will just amount to another way of disabling
inlining.

			regards, tom lane
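Spelled out against the demo functions from the report, the form Tom sketches would look like this (untested; as he notes, on 8.4 it likely just amounts to another way of disabling inlining, because of the sub-select limitation in the inliner):

```sql
-- Wrapping the stable argument in a scalar sub-select is intended to
-- turn it into a Param computed once (an InitPlan) rather than a
-- per-row expression in the inlined WHERE clause.
select count(*) from f_do_4((SELECT f_return_ver_id_4()));
```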
Excerpts from Brian Ceccarelli's message of mié ago 11 16:47:50 -0400 2010:
> Please show me an example where an inline query gets a performance boost.

The reason it's a performance boost is that the query gets to be planned
as a single query, instead of there being a black box that needs to be
planned separately.  I don't have any example handy to share.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Aug 11, 2010 at 4:47 PM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> Please show me an example where an inline query gets a performance boost.

Sure.

rhaas=# create table example as select a from generate_series(1,100000) a;
SELECT 100000
rhaas=# alter table example add primary key (a);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"example_pkey" for table "example"
ALTER TABLE
rhaas=# create function f() returns setof int as $$select a from
example$$ language sql stable;
rhaas=# explain analyze select * from f() where f = 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using example_pkey on example  (cost=0.00..8.28 rows=1 width=4) (actual time=0.102..0.103 rows=1 loops=1)
   Index Cond: (a = 1)
 Total runtime: 0.149 ms
(3 rows)

rhaas=# alter function f() volatile;
ALTER FUNCTION
rhaas=# explain analyze select * from f() where f = 1;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Function Scan on f  (cost=0.25..12.75 rows=5 width=4) (actual time=34.585..51.972 rows=1 loops=1)
   Filter: (f = 1)
 Total runtime: 63.277 ms
(3 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In theory, the optimization Brian wants is possible here, right?  I
>> mean, you could replace the function call with a Param, and pull the
>> Param out and make it an InitPlan.  Seems like that would generally be
>> a win, if you figure to loop more than once and the execution cost of
>> the function is not too tiny.
>
> Yeah, possibly.  It would probably be difficult for the planner to
> figure out where the cutover point is to make that worthwhile, though;
> the point where you'd need to make the transformation is long before we
> have any rowcount estimates.

This may be a stupid question, but why does the transformation have to
be done before we have the row count estimates?  I think we're just
looking for a scan node with a filter condition that contains a stable
subexpression that's expensive enough to be worth factoring out, so I
feel like we have the necessary information when we're constructing
the RelOptInfo.  The startup cost is so trivial that I can't see
generating multiple paths for it; I think you could just make a local
decision whether to apply the optimization or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, possibly.  It would probably be difficult for the planner to
>> figure out where the cutover point is to make that worthwhile, though;
>> the point where you'd need to make the transformation is long before we
>> have any rowcount estimates.

> This may be a stupid question, but why does the transformation have to
> be done before we have the row count estimates?

Well, I was thinking in terms of doing it when we do the SRF inlining.
It might be that we could get away with just having an arbitrary cost
limit like 100*cpu_operator_cost, and not think about how many rows
would actually be involved.

> I think we're just
> looking for a scan node with a filter condition that contains a stable
> subexpression that's expensive enough to be worth factoring out,

I do *not* want to grovel over every subexpression (and
sub-sub-expression, etc) in a query thinking about whether to do this.
That gets O(expensive) pretty quickly.  My idea of the appropriate
scope of a hack like this is just to prevent any performance loss from
SRF inlining.

Another approach we could take is to fix the implementation limitation
in inline_set_returning_function() about punting when there's a
sub-select in the arguments.  Then users could make this happen for
themselves when it matters.

			regards, tom lane
On Thu, Aug 12, 2010 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, possibly.  It would probably be difficult for the planner to
>>> figure out where the cutover point is to make that worthwhile, though;
>>> the point where you'd need to make the transformation is long before we
>>> have any rowcount estimates.
>
>> This may be a stupid question, but why does the transformation have to
>> be done before we have the row count estimates?
>
> Well, I was thinking in terms of doing it when we do the SRF inlining.
> It might be that we could get away with just having an arbitrary cost
> limit like 100*cpu_operator_cost, and not think about how many rows
> would actually be involved.

I'm not exactly following this.  My guess is that the breakeven point
is going to be pretty low because I think Param nodes are pretty
cheap.

>> I think we're just
>> looking for a scan node with a filter condition that contains a stable
>> subexpression that's expensive enough to be worth factoring out,
>
> I do *not* want to grovel over every subexpression (and
> sub-sub-expression, etc) in a query thinking about whether to do this.
> That gets O(expensive) pretty quickly.  My idea of the appropriate
> scope of a hack like this is just to prevent any performance loss from
> SRF inlining.

Well, that's certainly a good place to start, but I was thinking that
it would be nice to optimize things like this:

SELECT * FROM foo WHERE somecolumn = somefunc();

This is OK if we choose a straight index scan, but it's probably very
much worth optimizing if we end up doing anything else.  If that's too
hairy, then maybe not, but it's not obvious to me why it would be
expensive.

> Another approach we could take is to fix the implementation limitation
> in inline_set_returning_function() about punting when there's a
> sub-select in the arguments.  Then users could make this happen for
> themselves when it matters.

Hmm.  I'm usually in favor of removing implementation restrictions,
but I'm not too sure about the effects of removing this one.  It seems
like it would be nicer to have a solution that didn't require the user
to write the query a certain way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 12, 2010 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, I was thinking in terms of doing it when we do the SRF inlining.
>> It might be that we could get away with just having an arbitrary cost
>> limit like 100*cpu_operator_cost, and not think about how many rows
>> would actually be involved.

> I'm not exactly following this.  My guess is that the breakeven point
> is going to be pretty low because I think Param nodes are pretty
> cheap.

If you have any significant number of executions of the expression, then
of course converting it to an initplan is a win.  What I'm worried about
is where you have just a small number (like maybe only one, if it gets
converted to an indexqual for instance).  Then the added expense of
setting up the initplan isn't going to be repaid.  As long as the
expression is pretty expensive, the percentage overhead of wrapping it
in an initplan will probably be tolerable anyway, but I'm not sure where
the threshold of "pretty expensive" is for that.

> Well, that's certainly a good place to start, but I was thinking that
> it would be nice to optimize things like this:

> SELECT * FROM foo WHERE somecolumn = somefunc();

> This is OK if we choose a straight index scan, but it's probably very
> much worth optimizing if we end up doing anything else.  If that's too
> hairy, then maybe not, but it's not obvious to me why it would be
> expensive.

Because you have to look at every subexpression of every subexpression
to figure out if it's (a) stable and (b) expensive.  Each of those
checks is un-cheap in itself, and if you blindly apply them at every
node of an expression tree the cost will be exponential.

			regards, tom lane
On Thu, Aug 12, 2010 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not exactly following this.  My guess is that the breakeven point
>> is going to be pretty low because I think Param nodes are pretty
>> cheap.
>
> If you have any significant number of executions of the expression, then
> of course converting it to an initplan is a win.  What I'm worried about
> is where you have just a small number (like maybe only one, if it gets
> converted to an indexqual for instance).  Then the added expense of
> setting up the initplan isn't going to be repaid.  As long as the
> expression is pretty expensive, the percentage overhead of wrapping it
> in an initplan will probably be tolerable anyway, but I'm not sure where
> the threshold of "pretty expensive" is for that.

Oh, I see.  It seems a lot more elegant if we can start by determining
whether the expression is in a context where it's likely to be
executed more than once.

*thinks*

I wonder if we could make this decision mostly based on node types.
Maybe it's reasonable to say that if we're doing, say, a Seq Scan, we
always assume it's going to get evaluated more than once.  Yeah, the
table could have <2 rows in it, but mostly it won't, and I don't know
that it's wise to optimize for that case even if we *think* that's
what the statistics are telling us.  Similarly for a Function Scan,
CTE Scan, Worktable Scan, etc.  We *could* look at the row estimates,
but I bet it isn't necessary.  On the other hand, for an index qual,
it's probably pointless.  I guess the hard case is a filter qual on an
index scan... it's not too clear to me what the right thing to do is
in that case.

>> Well, that's certainly a good place to start, but I was thinking that
>> it would be nice to optimize things like this:
>
>> SELECT * FROM foo WHERE somecolumn = somefunc();
>
>> This is OK if we choose a straight index scan, but it's probably very
>> much worth optimizing if we end up doing anything else.  If that's too
>> hairy, then maybe not, but it's not obvious to me why it would be
>> expensive.
>
> Because you have to look at every subexpression of every subexpression
> to figure out if it's (a) stable and (b) expensive.  Each of those
> checks is un-cheap in itself, and if you blindly apply them at every
> node of an expression tree the cost will be exponential.

I think you'd need some kind of expression tree walker that builds up
a list of maximal stable subexpression trees.  It would be nice to
figure this out at some point in the process where we already have to
check volatility anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company