Обсуждение: BUG #5611: SQL Function STABLE promoting to VOLATILE

Поиск
Список
Период
Сортировка

BUG #5611: SQL Function STABLE promoting to VOLATILE

От
"Brian Ceccarelli"
Дата:
The following bug has been logged online:

Bug reference:      5611
Logged by:          Brian Ceccarelli
Email address:      bceccarelli@net32.com
PostgreSQL version: 8.4.4
Operating system:   Windows XP 32 bit and Red Hat 5.4 64 bit
Description:        SQL Function STABLE promoting to VOLATILE
Details:

----------------------------------------------------------------------------
-----
--
--    Demonstration of a PL/PGSQL stable-scoped function not working.
--
--        To see the problem, run this entire script from PGAdmin.   Bring
up Task Manager too.
--        After you the run this entire script, you can repeat the specific
problem by running:
--
--                select * from f_pass_4();
--
--        Note that "I am in f_return_ver_id_4() repeats once for every row
returned from f_get_table_4().
--        Postgres should only call f_return_ver_id_4() once.

--    The Problems:
--
--        1.  It seems that STABLE functions called within a SQL language
get promoted to VOLATILE.
--            Even though I declare a function STABLE, Postgres calls it
multiple times within a tranasaction.
--
--        2.  The raise notice in f_return_ver_id_4() also causes a memory
leak in PGAdmin (on Windows).
--
--    Related Problems:
--
--         Even the now() function gets called repeatedly within a stable
SQL function.
--
--    Postgres Version:
--
--        I am running Postgres 8.4.4-1 on Windows.   (Windows XP 32 bit)
--        Same problem occurs on Postgres 8.4.4 on Linux-64.   Red Hat 5.4.
--
--    Problem NOT in Postgres 8.2.
--
--         This problem does not happen in Postgres 8.2.
--
----------------------------------------------------------------------------
-----

drop type if exists type_pass_test cascade;
create type type_pass_test as (ver_id int8);

CREATE OR REPLACE FUNCTION f_get_table_4()
   RETURNS setof type_pass_test AS
$BODY$
----------------------------------------------------------------------------
-----
--
--
----------------------------------------------------------------------------
-----
declare
     r          type_pass_test;
     i          int8;
begin

   for i in 1..5
   loop
      r.ver_id := i;
      return next r;
   end loop;

   return;
end;
$BODY$
  language 'plpgsql' volatile;



CREATE OR REPLACE FUNCTION f_return_ver_id_4()
   RETURNS int8 AS
$BODY$
----------------------------------------------------------------------------
-----
--
--
----------------------------------------------------------------------------
-----
declare
begin
   raise notice 'I am in f_return_ver_id_4()';
   return 1;
end;
$BODY$
  language 'plpgsql' stable;



CREATE OR REPLACE FUNCTION f_do_4(ver_id_in  int8)
   RETURNS setof type_pass_test AS
$BODY$
----------------------------------------------------------------------------
-----
--
--   When you run f_pass_4(), f_pass_4() calls f_do_4() passing ver_id_in as
f_return_ver_id_4().
--
--   The error:
--
--       Even though f_return_ver_id_4() is a STABLE function, the select
--   statement below calls f_return_ver_id_4() once for every row coming
back from
--   f_get_table_4().
--
--       The repeat appears when I write the function in the SQL language.
--       The repeat disappears when I write the function in the PL/PGSQL
language.
--       If I add now() to the where clause, you will even see that Postgres
calls now() multiple times.
--
----------------------------------------------------------------------------
-----

 select *
   from f_get_table_4()
  where ver_id = $1;

$BODY$
  language 'SQL' stable;


/*
CREATE OR REPLACE FUNCTION f_do_4(ver_id_in  int8)
   RETURNS setof type_pass_test AS
$BODY$
----------------------------------------------------------------------------
-----
--
--   If I make the f_do_4(ver_id_in int8) a PL/PGSQL function, the problem
goes away.
--
----------------------------------------------------------------------------
-----
begin

   return query
     select *
       from f_get_table_4()
      where ver_id = $1;

   return;

end;
$BODY$
  language 'plpgsql' stable;

*/

CREATE OR REPLACE FUNCTION f_pass_4()
   RETURNS int4 AS
$BODY$
----------------------------------------------------------------------------
-----
--
--    Example:
--
--    select * from f_pass_4();
--
----------------------------------------------------------------------------
-----
declare
   rows_affected_w  int4;
begin



   select into rows_affected_w
               count(*)
          from f_do_4(f_return_ver_id_4());

   return rows_affected_w;
end;
$BODY$
  language 'plpgsql' stable;

----------------------------------------

select * from f_pass_4();

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
"Brian Ceccarelli" <bceccarelli@net32.com> writes:
> --        1.  It seems that STABLE functions called within a SQL language
> get promoted to VOLATILE.

That has got nothing to do with it.  The change in behavior from 8.2
is due to the fact that set-returning SQL functions can now be inlined.
The statement in f_pass_4(),

   select into rows_affected_w count(*)
          from f_do_4(f_return_ver_id_4());

gets expanded (by inlining of f_do_4) into

   select into rows_affected_w count(*)
          from f_get_table_4()
          where ver_id = f_return_ver_id_4();

and then since f_get_table_4() returns multiple rows, the WHERE clause
gets evaluated multiple times.

As near as I can tell, your real complaint is that the side-effects of
f_return_ver_id_4 (ie, the RAISE NOTICE) happen more than once.
However, a function declared STABLE really shouldn't *have* any side
effects, because that marking authorizes the optimizer to assume that it
doesn't.  If you marked it VOLATILE then this optimization wouldn't
happen.

> --        2.  The raise notice in f_return_ver_id_4() also causes a memory
> leak in PGAdmin (on Windows).

Hm, you probably ought to mention that part on the pgadmin mailing
lists.  I don't know whether the appropriate people will notice it here.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Brian Ceccarelli
Дата:
Dear Tom,

     Thanks for getting back to me so soon.

     I see that SQL functions seem to inline their calling arguments.=20

     My complaint remains.  That inlined function f_return_ver_id_4() is a =
STABLE function, inlined or not.  Postgres now calls it multiple times duri=
ng the transaction, even though the arguments to f_return_ver_id_4() have n=
ot changed.=20=20

     STABLE no longer means STABLE.  This behavior is killing my performanc=
e.  I am getting 500% to 30000% increase in latency.

     The problem does not stop with just this. Functions that I have no con=
trol over, like now(), behave like this too.  The function now() gets calle=
d multiple times in a transaction.    See the example below.

     All this behavior seems only to happen when I call the SQL function fr=
om a PL/PGSQL function.   It doesn't happen when I call the SQL function di=
rectly from the psql command line.


     For example, below, look what I had to do with the now() function.   I=
 had to change:


<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>

create or replace function f_get_current_price_all_keys_by_mp_rank(culture_=
in text, ver_id_min_in int8, ver_id_max_in int8, query_in tsquery)
   returns setof type_product_price_key
as
$BODY$
---------------------------------------------------------------------------=
------
--
--   This function retrieves the all primary keys to the product_price table
--   which point to the current prices of vendor products for a specific=20
--   tsearch query.
---------------------------------------------------------------------------=
------
   select pp.vp_id,
          pp.ver_id,
          pp.pm_id,
          max(pp.pp_price_eff_time) as pp_price_eff_time,
          pp.pp_min_qty
     from product_price pp
          inner join vendor_product           vp  on vp.vp_id   =3D pp.vp_i=
d=20
          inner join manufacturer_product_tsv tsv on tsv.mp_id  =3D vp.mp_i=
d and tsv.ver_id =3D pp.ver_id=20
          inner join manufacturer_product     mp  on mp.mp_id   =3D vp.mp_id
          inner join brand                    b   on b.brand_id =3D mp.bran=
d_id
          inner join manufacturer             m   on m.man_id   =3D b.man_id
          inner join vendor_vertical          vv  on vv.vdr_id  =3D vp.vdr_=
id and vv.ver_id =3D pp.ver_id
          inner join vendor                   v   on v.vdr_id   =3D vv.vdr_=
id
          inner join promotion                pm  on pm.pm_id   =3D pp.pm_i=
d=20=20=20=20=20=20=20
          left  join promo_vendor_behavior    vb  on vb.pm_id   =3D pp.pm_i=
d and vb.vdr_id =3D vp.vdr_id
          left  join promo_type_attrs         pa  on pa.pmta_id =3D vb.pmta=
_id
          inner join time_zone                t   on t.tz_id    =3D pp.tz_i=
d=20=20
    where pp.active_cd  =3D 1
      and tsv.search_vector @@@ $4
      and tsv.culture   =3D $1
      and pp.ver_id between $2 and $3
      and vp.active_cd  =3D 1
      and vv.active_cd  =3D 1
      and v.active_cd   =3D 1
      and mp.active_cd  =3D 1
      and b.active_cd   =3D 1
      and m.active_cd   =3D 1
      and pm.active_cd  =3D 1
      and ((pa.pmta_id is null) or (pa.pmta_id is not null and pa.active_cd=
 =3D 1))=20=20=20=20=20=20=20=20
      and ((pp.pp_end_time is null) or (pp.pp_end_time > now()))
      and pp.pp_price_eff_time <=3D now()
      and (pp.days_of_week   & (1 << (extract(dow from now()   at time zone=
 t.name)::int4))) <> 0
      and (pp.days_of_month  & (1 << (extract(day from now()   at time zone=
 t.name)::int4))) <> 0
      and (pp.months_of_year & (1 << (extract(month from now() at time zone=
 t.name)::int4))) <> 0
 group by pp.vp_id, pp.ver_id, pp.pm_id, pp.pp_min_qty
$BODY$
  language 'SQL' STABLE;
=20=20

<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>

To this PL/PGSQL function because now() gets called multiple times.  The fu=
nction above runs in 1.8 seconds.   The function below runs in 0.25 seconds.


create or replace function f_get_current_price_all_keys_by_mp_rank(culture_=
in text, ver_id_min_in int8, ver_id_max_in int8, query_in tsquery)
   returns setof type_product_price_key
as
$BODY$
---------------------------------------------------------------------------=
------
--
--   This function retrieves the all primary keys to the product_price table
--   which point to the current prices of vendor products for a specific=20
--   tsearch query.
---------------------------------------------------------------------------=
------
declare
    now_w    timestamp with time zone;
begin

   now_w :=3D now();
=20=20=20
   return query
   select pp.vp_id,
          pp.ver_id,
          pp.pm_id,
          max(pp.pp_price_eff_time) as pp_price_eff_time,
          pp.pp_min_qty
     from product_price pp
          inner join vendor_product           vp  on vp.vp_id   =3D pp.vp_i=
d=20
          inner join manufacturer_product_tsv tsv on tsv.mp_id  =3D vp.mp_i=
d and tsv.ver_id =3D pp.ver_id=20
          inner join manufacturer_product     mp  on mp.mp_id   =3D vp.mp_id
          inner join brand                    b   on b.brand_id =3D mp.bran=
d_id
          inner join manufacturer             m   on m.man_id   =3D b.man_id
          inner join vendor_vertical          vv  on vv.vdr_id  =3D vp.vdr_=
id and vv.ver_id =3D pp.ver_id
          inner join vendor                   v   on v.vdr_id   =3D vv.vdr_=
id
          inner join promotion                pm  on pm.pm_id   =3D pp.pm_i=
d=20=20=20=20=20=20=20
          left  join promo_vendor_behavior    vb  on vb.pm_id   =3D pp.pm_i=
d and vb.vdr_id =3D vp.vdr_id
          left  join promo_type_attrs         pa  on pa.pmta_id =3D vb.pmta=
_id
          inner join time_zone                t   on t.tz_id    =3D pp.tz_i=
d=20=20
    where pp.active_cd  =3D 1
      and tsv.search_vector @@@ $4
      and tsv.culture   =3D $1
      and pp.ver_id between $2 and $3
      and vp.active_cd  =3D 1
      and vv.active_cd  =3D 1
      and v.active_cd   =3D 1
      and mp.active_cd  =3D 1
      and b.active_cd   =3D 1
      and m.active_cd   =3D 1
      and pm.active_cd  =3D 1
      and ((pa.pmta_id is null) or (pa.pmta_id is not null and pa.active_cd=
 =3D 1))=20=20=20=20=20=20=20=20
      and ((pp.pp_end_time is null) or (pp.pp_end_time > now_w))
      and pp.pp_price_eff_time <=3D now()
      and (pp.days_of_week   & (1 << (extract(dow from now_w   at time zone=
 t.name)::int4))) <> 0
      and (pp.days_of_month  & (1 << (extract(day from now_w   at time zone=
 t.name)::int4))) <> 0
      and (pp.months_of_year & (1 << (extract(month from now_w at time zone=
 t.name)::int4))) <> 0
 group by pp.vp_id, pp.ver_id, pp.pm_id, pp.pp_min_qty;
   return;
end;
$BODY$
  language 'plpgsql' STABLE
  rows 1000;

=20

     But that is not all that's going on.   It turns out that in a SQL func=
tion, now() also gets called multiple times.




     The function f_return_ver_id_4() is a STABLE function.   Even when inl=
ined, the behavior should still be that the function gets called only once =
during a transaction


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Tuesday, August 10, 2010 10:55 PM
To: Brian Ceccarelli
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE=20

"Brian Ceccarelli" <bceccarelli@net32.com> writes:
> --        1.  It seems that STABLE functions called within a SQL language
> get promoted to VOLATILE.=20

That has got nothing to do with it.  The change in behavior from 8.2
is due to the fact that set-returning SQL functions can now be inlined.
The statement in f_pass_4(),

   select into rows_affected_w count(*)
          from f_do_4(f_return_ver_id_4());

gets expanded (by inlining of f_do_4) into

   select into rows_affected_w count(*)
          from f_get_table_4()
          where ver_id =3D f_return_ver_id_4();

and then since f_get_table_4() returns multiple rows, the WHERE clause
gets evaluated multiple times.

As near as I can tell, your real complaint is that the side-effects of
f_return_ver_id_4 (ie, the RAISE NOTICE) happen more than once.
However, a function declared STABLE really shouldn't *have* any side
effects, because that marking authorizes the optimizer to assume that it
doesn't.  If you marked it VOLATILE then this optimization wouldn't
happen.

> --        2.  The raise notice in f_return_ver_id_4() also causes a memory
> leak in PGAdmin (on Windows).

Hm, you probably ought to mention that part on the pgadmin mailing
lists.  I don't know whether the appropriate people will notice it here.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
Brian Ceccarelli <bceccarelli@net32.com> writes:
>      STABLE no longer means STABLE.  This behavior is killing my performance.  I am getting 500% to 30000% increase
inlatency. 

You seem to be under the illusion that "stable" is a control knob for a
function result cache.  It is not, and never has been.  Postgres doesn't
do function result caching.

If you've constructed your app in such a way that it depends on not
inlining SQL set-returning functions, it's fairly easy to defeat that.
From memory, marking a SRF as either strict or volatile will do it
(although volatile might cause you problems elsewhere --- I suspect
your design is pretty brittle in this area).

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Wed, Aug 11, 2010 at 11:01 AM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> =A0 =A0 My complaint remains. =A0That inlined function f_return_ver_id_4(=
) is a STABLE function, inlined or not. =A0Postgres now calls it multiple t=
imes during the transaction, even though the arguments to f_return_ver_id_4=
() have not changed.
>
> =A0 =A0 STABLE no longer means STABLE. =A0This behavior is killing my per=
formance. =A0I am getting 500% to 30000% increase in latency.

We've never guaranteed that, and almost certainly never will.  Marking
a function STABLE means that the planner is *allowed to assume* that
the results won't change for a given set of arguments, not that it is
*required to prevent* it from being called multiple times with the
same set of arguments.

You can certainly prevent the function from being inlined, though
(perhaps, by writing it in PL/pgsql).

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Brian Ceccarelli
Дата:
OK.   The documentation says "allows the optimizer to optimize . . . ."    =
But then the example guarantees the one-time-only for a index scan conditio=
n.=20=20=20

=46rom the documentation:    8.4.4 Chapter 32 and 8.2.17 Chapter 33.

   .A STABLE function cannot modify the database and is guaranteed to retur=
n the same results given the same arguments for all rows within a  single s=
tatement. This category allows the optimizer to optimize multiple calls of =
the function to a single call. In particular, it is safe to use an expressi=
on containing such a function in an index scan condition. (Since an index s=
can will evaluate the comparison value only once, not once at each row, it =
is not valid to use a VOLATILE function in an index scan condition.)


The behavior of the optimizers <=3D 8.2 certainly fit the description.   Th=
e 8.4 behavior is vastly different.=20=20


I recommend that somebody change the documentation to say, "This category a=
llows, but does not guarantee, the optimizer to optimize multiple calls . .=
 . ."   That would be more clear.    And then mention the inlining deal, if=
 you haven't already.=20=20=20=20

There remains the problem with the now() function.   A SQL function repetit=
ively calls now().   Is that what you intended?
There remains the problem with PGAdmin memory leak.

I will change my SQL functions to PL/PGSQL functions.  I am glad that there=
 is a solution.

Thank you for your help.




-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]=20
Sent: Wednesday, August 11, 2010 11:33 AM
To: Brian Ceccarelli
Cc: Tom Lane; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE

On Wed, Aug 11, 2010 at 11:01 AM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> =A0 =A0 My complaint remains. =A0That inlined function f_return_ver_id_4(=
) is a STABLE function, inlined or not. =A0Postgres now calls it multiple t=
imes during the transaction, even though the arguments to f_return_ver_id_4=
() have not changed.
>
> =A0 =A0 STABLE no longer means STABLE. =A0This behavior is killing my per=
formance. =A0I am getting 500% to 30000% increase in latency.

We've never guaranteed that, and almost certainly never will.  Marking
a function STABLE means that the planner is *allowed to assume* that
the results won't change for a given set of arguments, not that it is
*required to prevent* it from being called multiple times with the
same set of arguments.

You can certainly prevent the function from being inlined, though
(perhaps, by writing it in PL/pgsql).

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Wed, Aug 11, 2010 at 11:50 AM, Brian Ceccarelli
<bceccarelli@net32.com> wrote:
> OK. =A0 The documentation says "allows the optimizer to optimize . . . ."=
 =A0 =A0But then the example guarantees the one-time-only for a index scan =
condition.
>
> From the documentation: =A0 =A08.4.4 Chapter 32 and 8.2.17 Chapter 33.
>
> =A0 .A STABLE function cannot modify the database and is guaranteed to re=
turn the same results given the same arguments for all rows within a =A0sin=
gle statement. This category allows the optimizer to optimize multiple call=
s of the function to a single call. In particular, it is safe to use an exp=
ression containing such a function in an index scan condition. (Since an in=
dex scan will evaluate the comparison value only once, not once at each row=
, it is not valid to use a VOLATILE function in an index scan condition.)
>
> The behavior of the optimizers <=3D 8.2 certainly fit the description. =
=A0 The 8.4 behavior is vastly different.

Reading between the lines, I think I sense that this has got you
pretty frustrated, so in defense of the new behavior, let me just
mention that, in general, inlining SQL queries results in a HUGE
performance benefit.  It's sort of unfortunate that it doesn't work
out that way for you in this case, but I don't think it's a bad idea
in general.

*thinks*

In theory, the optimization Brian wants is possible here, right?  I
mean, you could replace the functional call with a Param, and pull the
Param out and make it an InitPlan.  Seems like that would generally be
a win, if you figure to loop more than once and the execution cost of
the function is not too tiny.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
Brian Ceccarelli <bceccarelli@net32.com> writes:
> OK.   The documentation says "allows the optimizer to optimize . . . ."    But then the example guarantees the
one-time-onlyfor a index scan condition.    

No, the documentation states that *if* an index scan is used, functions
involved in the indexscan's qual condition will be evaluated just once,
rather than once per row.  There is no "guarantee" of any sort that such
a plan will be chosen.  The point of the STABLE marking is to inform the
optimizer that it is safe to choose an index scan because the function's
results will not change compared to the naive SQL semantics wherein the
WHERE condition is evaluated for each row.  Thus, the guarantee actually
runs the other way: you are promising the optimizer that your function
doesn't have side effects or change its results intra-query.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Brian Ceccarelli
Дата:
1.  Basically, I was assuming that "STABLE" was more than just a optimizer =
"hint".

2.  Is it not better to evaluate a STABLE function once, and pass the retur=
ned constant to a function, than pass the function itself and inline it?   =
Is not passing a constant more efficient than even an inline query?

Please show me an example where an inline query gets a performance boost.

Thanks!


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Wednesday, August 11, 2010 4:34 PM
To: Brian Ceccarelli
Cc: Robert Haas; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5611: SQL Function STABLE promoting to VOLATILE=20

Brian Ceccarelli <bceccarelli@net32.com> writes:
> OK.   The documentation says "allows the optimizer to optimize . . . ."  =
  But then the example guarantees the one-time-only for a index scan condit=
ion.=20=20=20

No, the documentation states that *if* an index scan is used, functions
involved in the indexscan's qual condition will be evaluated just once,
rather than once per row.  There is no "guarantee" of any sort that such
a plan will be chosen.  The point of the STABLE marking is to inform the
optimizer that it is safe to choose an index scan because the function's
results will not change compared to the naive SQL semantics wherein the
WHERE condition is evaluated for each row.  Thus, the guarantee actually
runs the other way: you are promising the optimizer that your function
doesn't have side effects or change its results intra-query.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> In theory, the optimization Brian wants is possible here, right?  I
> mean, you could replace the functional call with a Param, and pull the
> Param out and make it an InitPlan.  Seems like that would generally be
> a win, if you figure to loop more than once and the execution cost of
> the function is not too tiny.

Yeah, possibly.  It would probably be difficult for the planner to
figure out where the cutover point is to make that worthwhile, though;
the point where you'd need to make the transformation is long before we
have any rowcount estimates.

I was about to suggest that Brian could make that happen manually
by writing ... FROM srf((SELECT expensive_func())) ... but I think
actually that will just amount to another way of disabling inlining.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Alvaro Herrera
Дата:
Excerpts from Brian Ceccarelli's message of mié ago 11 16:47:50 -0400 2010:

> Please show me an example where an inline query gets a performance boost.

The reason it's a performance boost is that the query gets to be planned
as a single query, instead of there being a black-box that needs to be
planned separately.  I don't have any example handy to share.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Wed, Aug 11, 2010 at 4:47 PM, Brian Ceccarelli <bceccarelli@net32.com> wrote:
> Please show me an example where an inline query gets a performance boost.

Sure.

rhaas=# create table example as select a from
generate_series(1,100000) a;      SELECT 100000
rhaas=# alter table example add primary key (a);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"example_pkey" for table "example"
ALTER TABLE
rhaas=# create function f() returns setof int as $$select a from
example$$ language sql stable;
rhaas=# explain analyze select * from f() where f = 1;
                                                               QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using example_pkey on example  (cost=0.00..8.28 rows=1
width=4) (actual time=0.102..0.103 rows=1 loops=1)
   Index Cond: (a = 1)
 Total runtime: 0.149 ms
(3 rows)
rhaas=# alter function f() volatile;
ALTER FUNCTION
rhaas=# explain analyze select * from f() where f = 1;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Function Scan on f  (cost=0.25..12.75 rows=5 width=4) (actual
time=34.585..51.972 rows=1 loops=1)
   Filter: (f = 1)
 Total runtime: 63.277 ms
(3 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In theory, the optimization Brian wants is possible here, right? =A0I
>> mean, you could replace the functional call with a Param, and pull the
>> Param out and make it an InitPlan. =A0Seems like that would generally be
>> a win, if you figure to loop more than once and the execution cost of
>> the function is not too tiny.
>
> Yeah, possibly. =A0It would probably be difficult for the planner to
> figure out where the cutover point is to make that worthwhile, though;
> the point where you'd need to make the transformation is long before we
> have any rowcount estimates.

This may be a stupid question, but why does the transformation have to
be done before we have the row count estimates?  I think we're just
looking for a scan node with a filter condition that contains a stable
subexpression that's expensive enough to be worth factoring out, so I
feel like we have the necessary information when we're constructing
the RelOptInfo.  The startup cost is so trivial that I can't see
generating mutiple paths for it; I think you could just make a local
decision whether to apply the optimization or not.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, possibly.  It would probably be difficult for the planner to
>> figure out where the cutover point is to make that worthwhile, though;
>> the point where you'd need to make the transformation is long before we
>> have any rowcount estimates.

> This may be a stupid question, but why does the transformation have to
> be done before we have the row count estimates?

Well, I was thinking in terms of doing it when we do the SRF inlining.
It might be that we could get away with just having an arbitrary cost
limit like 100*cpu_operator_cost, and not think about how many rows
would actually be involved.

> I think we're just
> looking for a scan node with a filter condition that contains a stable
> subexpression that's expensive enough to be worth factoring out,

I do *not* want to grovel over every subexpression (and
sub-sub-expression, etc) in a query thinking about whether to do this.
That gets O(expensive) pretty quickly.  My idea of the appropriate scope
of a hack like this is just to prevent any performance loss from SRF
inlining.

Another approach we could take is to fix the implementation limitation
in inline_set_returning_function() about punting when there's a
sub-select in the arguments.  Then users could make this happen for
themselves when it matters.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Thu, Aug 12, 2010 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 11, 2010 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, possibly. =A0It would probably be difficult for the planner to
>>> figure out where the cutover point is to make that worthwhile, though;
>>> the point where you'd need to make the transformation is long before we
>>> have any rowcount estimates.
>
>> This may be a stupid question, but why does the transformation have to
>> be done before we have the row count estimates?
>
> Well, I was thinking in terms of doing it when we do the SRF inlining.
> It might be that we could get away with just having an arbitrary cost
> limit like 100*cpu_operator_cost, and not think about how many rows
> would actually be involved.

I'm not exactly following this.  My guess is that the breakeven point
is going to be pretty low because I think Param nodes are pretty
cheap.

>> I think we're just
>> looking for a scan node with a filter condition that contains a stable
>> subexpression that's expensive enough to be worth factoring out,
>
> I do *not* want to grovel over every subexpression (and
> sub-sub-expression, etc) in a query thinking about whether to do this.
> That gets O(expensive) pretty quickly. =A0My idea of the appropriate scope
> of a hack like this is just to prevent any performance loss from SRF
> inlining.

Well, that's certainly a good place to start, but I was thinking that
it would be nice to optimize things like this:

SELECT * FROM foo WHERE somecolumn =3D somefunc();

This is OK if we choose a straight index scan, but it's probably very
much worth optimizing if we end up doing anything else.  If that's too
hairy, then maybe not, but it's not obvious to me why it would be
expensive.

> Another approach we could take is to fix the implementation limitation
> in inline_set_returning_function() about punting when there's a
> sub-select in the arguments. =A0Then users could make this happen for
> themselves when it matters.

Hmm.  I'm usually in favor of removing implementation restrictions,
but I'm not too sure about the effects of removing this one.  It seems
like it would be nicer to have a solution that didn't require the user
to write the query a certain way.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 12, 2010 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, I was thinking in terms of doing it when we do the SRF inlining.
>> It might be that we could get away with just having an arbitrary cost
>> limit like 100*cpu_operator_cost, and not think about how many rows
>> would actually be involved.

> I'm not exactly following this.  My guess is that the breakeven point
> is going to be pretty low because I think Param nodes are pretty
> cheap.

If you have any significant number of executions of the expression, then
of course converting it to an initplan is a win.  What I'm worried about
is where you have just a small number (like maybe only one, if it gets
converted to an indexqual for instance).  Then the added expense of
setting up the initplan isn't going to be repaid.  As long as the
expression is pretty expensive, the percentage overhead of wrapping it
in an initplan will probably be tolerable anyway, but I'm not sure where
the threshold of "pretty expensive" is for that.

> Well, that's certainly a good place to start, but I was thinking that
> it would be nice to optimize things like this:

> SELECT * FROM foo WHERE somecolumn = somefunc();

> This is OK if we choose a straight index scan, but it's probably very
> much worth optimizing if we end up doing anything else.  If that's too
> hairy, then maybe not, but it's not obvious to me why it would be
> expensive.

Because you have to look at every subexpression of every subexpression
to figure out if it's (a) stable and (b) expensive.  Each of those
checks is un-cheap in itself, and if you blindly apply them at every
node of an expression tree the cost will be exponential.

            regards, tom lane

Re: BUG #5611: SQL Function STABLE promoting to VOLATILE

От
Robert Haas
Дата:
On Thu, Aug 12, 2010 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not exactly following this. =A0My guess is that the breakeven point
>> is going to be pretty low because I think Param nodes are pretty
>> cheap.
>
> If you have any significant number of executions of the expression, then
> of course converting it to an initplan is a win. =A0What I'm worried about
> is where you have just a small number (like maybe only one, if it gets
> converted to an indexqual for instance). =A0Then the added expense of
> setting up the initplan isn't going to be repaid. =A0As long as the
> expression is pretty expensive, the percentage overhead of wrapping it
> in an initplan will probably be tolerable anyway, but I'm not sure where
> the threshold of "pretty expensive" is for that.

Oh, I see.  It seems a lot more elegant if we can start by determining
whether the expression is in a context where it's likely to be
executed more than once.

*thinks*

I wonder if we could make this decision mostly based on node types.
Maybe it's reasonable to say that if we're doing say, a Seq Scan, we
always assume it's going to get evaluated more than once.  Yeah, the
table could have <2 rows in it, but mostly it won't, and I don't know
that it's wise to optimize for that case even if we *think* that's
what the statistics are telling us.  Similarly for a Function Scan,
CTE Scan, Worktable Scan, etc.  We *could* look at the row estimates,
but I bet it isn't necessary.  On the other hand, for an index qual,
it's probably pointless.  I guess the hard case is a filter qual on an
index scan... it's not too clear to me what the right thing to do is
in that case.

>> Well, that's certainly a good place to start, but I was thinking that
>> it would be nice to optimize things like this:
>
>> SELECT * FROM foo WHERE somecolumn =3D somefunc();
>
>> This is OK if we choose a straight index scan, but it's probably very
>> much worth optimizing if we end up doing anything else. =A0If that's too
>> hairy, then maybe not, but it's not obvious to me why it would be
>> expensive.
>
> Because you have to look at every subexpression of every subexpression
> to figure out if it's (a) stable and (b) expensive. =A0Each of those
> checks is un-cheap in itself, and if you blindly apply them at every
> node of an expression tree the cost will be exponential.

I think you'd need some kind of expression tree walker that builds up
a list of maximal stable subexpression trees.  It would be nice to
figure this out at some point in the process where we already have to
check volatility anyway.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company