Обсуждение: Mystery: functions are slow with group by, but...
Hi there. I appear to have much quicker performance by asking for more work! First of all, though: can I see the definition of a function in psql? \df total_live_a4_properties( text ) just gives me the args, return type etc. Hox can I see the AS '...' clause? Q2: I have a big performance problem using a function: create function total_live_a4_properties( text ) returns int4 as 'select count(*) from properties where status<>''sold'' and status<>''suspended'' and a4 = $1' language 'sql'; This takes 5-10 _minutes_ : select a4, total_live_a4_properties( a4 ) from properties group by a4; There are 5132 records in the properties table, and 64 distinct a4 values. Explain gives NOTICE: QUERY PLAN: Group ( cost=411.70 rows=5112 width=12) -> Sort ( cost=411.70 rows=5112 width=12) -> Seq Scan on Properties ( cost=411.70rows=5112 width=12) However this next one is < 3 _seconds_! select a4, count(*), (100*count(*)/total_live_a4_properties( a4 )) as percentage from properties group by a4; Explain gives NOTICE: QUERY PLAN: Aggregate ( cost=411.70 rows=5112 width=12) Group ( cost=411.70 rows=5112 width=12) -> Sort ( cost=411.70 rows=5112 width=12) -> Seq Scan on Properties ( cost=411.70 rows=5112 width=12) Can anyone tell me what's going on? thanks, al. -- # Allan Kelly http://www.plotsearch.co.uk# (+44) (0)131 524 8500# allan.kelly@buildstore.co.uk... ..# /Software Engineer/i . . . . .# ------------------------------ * . . . . .# "If you are a Visual Basic programmer, * . . .# these details are none of your business." * . . .# Mr Bunny's Guide to Active X, by Carlton Egremont III * . .# ------------------------------ vi: set noet tw=80 sts=4 ts=8 : .
Allan Kelly <allan.kelly@buildstore.co.uk> writes: > I have a big performance problem using a function: > This takes 5-10 _minutes_ : > select a4, total_live_a4_properties( a4 ) > from properties group by a4; > There are 5132 records in the properties table, and 64 distinct a4 values. > However this next one is < 3 _seconds_! > select a4, count(*), (100*count(*)/total_live_a4_properties( a4 )) as percentage > from properties group by a4; > Can anyone tell me what's going on? You were right to look to EXPLAIN for a hint about the problem, but unfortunately EXPLAIN doesn't show you anything about where a function call is evaluated in the plan tree. If you dig into the EXPLAIN VERBOSE output, you can see where it's evaluated. It turns out that in the first query, 6.5.* evaluates the function call in the seqscan's target list, which means that the function is evaluated separately at each row of the table :-(. Then, all the duplicate rows are thrown away by the group step, so you just did 5000+ useless function evaluations, each of which needed a scan of the table. In the second case, the behavior is different because of the presence of an aggregate function --- the planner knows it cannot evaluate that until the top level, so the whole expression including the user function call is done at the output of the aggregate step. Net result, only one function evaluation per group. This is, of course, pretty braindead. It's fixed in 7.0: functions on group columns are not evaluated until after the group step, with or without aggregates. regards, tom lane