Обсуждение: indexed function performance

Поиск
Список
Период
Сортировка

indexed function performance

От
"mikelin"
Дата:
I'm trying to do a complicated ordering of a table with ~40k rows.

I have an IMMUTABLE plpgsql function that returns an integer that I'll
be sorting by, but the function is slow, so I want to cache it somehow.

I found in the docs:
"the index expressions are not recomputed during an indexed search,
since they are already stored in the index."
- http://www.postgresql.org/docs/8.1/static/indexes-expressional.html

which sounds like caching, so I created an index on that function,
expecting stellar performance, but the performance turned out to be
pretty bad:

words=# explain analyse select * from word order by
word_difficulty(word) limit 100;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..90.57 rows=100 width=48) (actual
time=43.718..3891.817 rows=100 loops=1)
   ->  Index Scan using word_word_difficulty_idx on word
(cost=0.00..37989.19 rows=41946 width=48) (actual time=43.711..3891.251
rows=100 loops=1)
 Total runtime: 3892.253 ms
(3 rows)


I wouldn't have expected that Index Scan to be so slow. In comparison,
I added another column to the table, and cached the results there, and
the index scan on the new column is way faster:

words=# alter table word add column difficulty integer;
ALTER TABLE
words=# update word set difficulty=word_difficulty(word);
UPDATE 41946
words=# create index word_difficulty_idx on word(difficulty);
CREATE INDEX

words=# explain analyse select * from word order by difficulty limit
100;
                                                                QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..89.89 rows=100 width=48) (actual time=0.028..0.646
rows=100 loops=1)
   ->  Index Scan using word_difficulty_idx on word
(cost=0.00..37706.32 rows=41946 width=48) (actual time=0.023..0.341
rows=100 loops=1)
 Total runtime: 0.870 ms
(3 rows)


So I'll probably just end up using the latter approach, but I'm
curious, so I ask if anyone can explain why the indexed function is so
slow.

Thanks!
Mikelin


Re: indexed function performance

От
Tom Lane
Дата:
"mikelin" <mikelin.ca@gmail.com> writes:
> which sounds like caching, so I created an index on that function,
> expecting stellar performance, but the performance turned out to be
> pretty bad:

> words=# explain analyse select * from word order by
> word_difficulty(word) limit 100;

> I wouldn't have expected that Index Scan to be so slow.

The index scan actually is not slow; the problem is that the planner
neglects to suppress the computation of the sort-key columns in the
output rows, even though they're not referenced anywhere.  Normally
that doesn't matter a whole lot, but if it's a really expensive function
then it does matter.  We just noticed this problem a few weeks ago:
http://archives.postgresql.org/pgsql-performance/2006-11/msg00054.php

I'm hoping to fix this for 8.3, but suspect that the fix will be too
invasive to consider back-patching to older releases.

            regards, tom lane