Обсуждение: Mystery: functions are slow with group by, but...

Поиск
Список
Период
Сортировка

Mystery: functions are slow with group by, but...

От
Allan Kelly
Дата:
Hi there. I appear to have much quicker performance by asking for more work!

First of all, though: can I see the definition of a function in psql?
\df total_live_a4_properties( text )
just gives me the args, return type etc. Hox can I see the AS '...' clause?

Q2:
I have a big performance problem using a function:

create function total_live_a4_properties( text )
returns int4
as
'select count(*) from properties where
status<>''sold'' and status<>''suspended''
and a4 = $1'
language 'sql';

This takes 5-10 _minutes_ :

select a4, total_live_a4_properties( a4 )
from properties group by a4;

There are 5132 records in the properties table, and 64 distinct a4 values.

Explain gives

NOTICE: QUERY PLAN:

Group ( cost=411.70 rows=5112 width=12) -> Sort ( cost=411.70 rows=5112 width=12)   -> Seq Scan on Properties (
cost=411.70rows=5112 width=12)
 

However this next one is < 3 _seconds_!

select a4, count(*), (100*count(*)/total_live_a4_properties( a4 )) as percentage
from properties group by a4;

Explain gives

NOTICE: QUERY PLAN:

Aggregate ( cost=411.70 rows=5112 width=12) Group ( cost=411.70 rows=5112 width=12)   -> Sort ( cost=411.70 rows=5112
width=12)    -> Seq Scan on Properties ( cost=411.70 rows=5112 width=12)
 

Can anyone tell me what's going on?
thanks, al.

-- 
# Allan Kelly                                http://www.plotsearch.co.uk# (+44) (0)131 524 8500#
allan.kelly@buildstore.co.uk...    ..# /Software Engineer/i            . .    .    . .# ------------------------------
*      . .     .    . .# "If you are a Visual Basic programmer,   *       . .     .#  these details are none of your
business."       *       .  . .# Mr Bunny's Guide to Active X, by Carlton Egremont III      *     . .#
------------------------------     vi: set noet tw=80 sts=4 ts=8  : .
 


Re: Mystery: functions are slow with group by, but...

От
Tom Lane
Дата:
Allan Kelly <allan.kelly@buildstore.co.uk> writes:
> I have a big performance problem using a function:

> This takes 5-10 _minutes_ :
> select a4, total_live_a4_properties( a4 )
> from properties group by a4;

> There are 5132 records in the properties table, and 64 distinct a4 values.

> However this next one is < 3 _seconds_!
> select a4, count(*), (100*count(*)/total_live_a4_properties( a4 )) as percentage
> from properties group by a4;

> Can anyone tell me what's going on?

You were right to look to EXPLAIN for a hint about the problem, but
unfortunately EXPLAIN doesn't show you anything about where a function
call is evaluated in the plan tree.  If you dig into the EXPLAIN VERBOSE
output, you can see where it's evaluated.  It turns out that in the
first query, 6.5.* evaluates the function call in the seqscan's target
list, which means that the function is evaluated separately at each row
of the table :-(.  Then, all the duplicate rows are thrown away by the
group step, so you just did 5000+ useless function evaluations, each of
which needed a scan of the table.  In the second case, the behavior is
different because of the presence of an aggregate function --- the
planner knows it cannot evaluate that until the top level, so the whole
expression including the user function call is done at the output of the
aggregate step.  Net result, only one function evaluation per group.

This is, of course, pretty braindead.  It's fixed in 7.0: functions on
group columns are not evaluated until after the group step, with or
without aggregates.
        regards, tom lane