Обсуждение: Horrible/never returning performance using stable function on WHERE clause

Поиск
Список
Период
Сортировка

Horrible/never returning performance using stable function on WHERE clause

От
Achilleas Mantzios
Дата:
Hello list,
I have written some functions to extract some data from our DB, from an hierarchical structure, the problem is that if
afunction doing lookups is defined as STABLE in the WHERE clause the performance  
is horrible. What I am trying to achieve is given a specific node in an hierarchical structure (a machine definition)
tofind all its equivalent sister nodes and then for a specific instance of this  
hierarchy to find the max RH (running hours) among all sister nodes.I am using some functions/opers from intarray. Here
arethe functions : 

Compares two nodes for sister property:

CREATE OR REPLACE FUNCTION public.is_defid_sister_node(vdefid1 integer, vdefid2 integer)
  RETURNS boolean
  LANGUAGE plpgsql
  STABLE
AS $function$DECLARE
vparents1 INTEGER[];
vparents2 INTEGER[];
descr1 TEXT;
descr2 TEXT;
i INTEGER;
BEGIN

                 SELECT
COALESCE(partid,0)||'__'||regexp_replace(coalesce(description,''),'[nN][oO]([0-9]+)',''),parentsinto descr1,vparents1
FROMmachdefs where defid=vdefid1; 
                 SELECT
COALESCE(partid,0)||'__'||regexp_replace(coalesce(description,''),'[nN][oO]([0-9]+)',''),parentsinto descr2,vparents2
FROMmachdefs where defid=vdefid2; 

         IF (level(vparents1) = 0 AND level(vparents2) = 0) THEN
                 RETURN vdefid1=vdefid2;
         ELSIF (level(vparents1) <> level(vparents2)) THEN
                 RETURN false;
         ELSE
                 RETURN ((descr1=descr2) AND is_defid_sister_node(first(vparents1),first(vparents2)));
         END IF;

END;$function$

Finds the set of sister nodes for a given node:

CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids(vdefid integer)
  RETURNS INTEGER[]
  LANGUAGE plpgsql
  STABLE
AS $function$
DECLARE
tmp INTEGER[];
BEGIN

         select (select array_agg(mdsis.defid) FROM machdefs mdsis WHERE mdsis.machtypeid=md.machtypeid AND
level(mdsis.parents)=level(md.parents)AND last(mdsis.parents)=last(md.parents) AND  
is_defid_sister_node(mdsis.defid,md.defid)  ) INTO tmp from machdefs md where md.defid=vdefid;

         IF (tmp IS NULL) THEN
                 tmp := '{}';
         END IF;
         RETURN tmp;

END;
$function$

Finds max RH for a given tree instance among all sister nodes of a given node :

CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids_maxrh(vvslid INTEGER,vdefid INTEGER)
  RETURNS INTEGER
  LANGUAGE plpgsql
  STABLE
AS $function$
DECLARE
tmp INTEGER;
BEGIN
         select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ get_machdef_sister_defids(vdefid);
         RETURN tmp;
END;
$function$


Query :
select get_machdef_sister_defids_maxrh(479,319435);

never ends (I have waited till 2-3 minutes), however, *doing the wrong thing* and declaring get_machdef_sister_defids
asIMMUTABLE makes the above call return fast : 

# select get_machdef_sister_defids_maxrh(479,319435);
  get_machdef_sister_defids_maxrh
---------------------------------
                            10320
(1 row)

Time: 110.211 ms

We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to
get_machdef_sister_defids in get_machdef_sister_defids_maxrh ??

Defining get_machdef_sister_defids back to STABLE and forcing get_machdef_sister_defids_maxrh to only call
get_machdef_sister_defidsonce makes things work again : 

CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids_maxrh(vvslid integer, vdefid integer)
  RETURNS integer
  LANGUAGE plpgsql
  STABLE
AS $function$
DECLARE
tmp INTEGER;
tmppars INTEGER[];
BEGIN
         tmppars := get_machdef_sister_defids(vdefid);
         select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ tmppars;
         RETURN tmp;
END;
$function$

# select get_machdef_sister_defids_maxrh(479,319435);
  get_machdef_sister_defids_maxrh
---------------------------------
                            10320
(1 row)

Time: 111.318 ms

Is this expected ?

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Horrible/never returning performance using stable function on WHERE clause

От
David Rowley
Дата:
On 29 March 2016 at 20:01, Achilleas Mantzios
<achill@matrix.gatewaynet.com> wrote:
> We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to
> get_machdef_sister_defids in get_machdef_sister_defids_maxrh ??

It shouldn't be up to the optimizer to evaluate a STABLE function.
Only IMMUTABLE functions will be evaluated during planning.

What's not that clear to me is if the planner might be able to work a
bit harder to create an "Initplan" for stable functions with Const
arguments. Right now I can't quite see a reason why that couldn't be
improved upon, after all, the documentation does claim that a STABLE
function during a "single table scan it will consistently return the
same result for the same argument values".

However it would be quite simple just for you to force the STABLE
function to be evaluated once, instead of once per row, just by
modifying your query to become:

select max(rh) into tmp from items where vslwhid=vvslid and
itoar(defid) ~ (select get_machdef_sister_defids(vdefid));

Viewing the EXPLAIN of this, you'll notice the InitPlan, which will
evaluate the function and allow the use the output value as a
parameter in the main query.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Horrible/never returning performance using stable function on WHERE clause

От
Achilleas Mantzios
Дата:
Hello David

On 29/03/2016 14:04, David Rowley wrote:
On 29 March 2016 at 20:01, Achilleas Mantzios
<achill@matrix.gatewaynet.com> wrote:
We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to
get_machdef_sister_defids in get_machdef_sister_defids_maxrh ??

It shouldn't be up to the optimizer to evaluate a STABLE function.
Only IMMUTABLE functions will be evaluated during planning.
What's not that clear to me is if the planner might be able to work a
bit harder to create an "Initplan" for stable functions with Const
arguments. Right now I can't quite see a reason why that couldn't be
improved upon, after all, the documentation does claim that a STABLE
function during a "single table scan it will consistently return the
same result for the same argument values".
And to add here the docs (http://www.postgresql.org/docs/9.3/static/xfunc-volatility.html) also say :
"A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition."
However it would be quite simple just for you to force the STABLE
function to be evaluated once, instead of once per row, just by
modifying your query to become:

select max(rh) into tmp from items where vslwhid=vvslid and
itoar(defid) ~ (select get_machdef_sister_defids(vdefid));

Viewing the EXPLAIN of this, you'll notice the InitPlan, which will
evaluate the function and allow the use the output value as a
parameter in the main query.
That's true, this worked indeed. But still cannot understand why the distinction between ~ get_machdef_sister_defids(...) and  ~ (SELECT get_machdef_sister_defids(...)).
Why is the planner forced in the second case and not in the first, since clearly the input argument is not dependent on any query result? (judging by the docs).


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt