Well, I've improved my regex text searches to actually use the indexes properly
now for the basic case, but I have found another 'problem' (or feature, call it
what you will ;) - to demonstrate:
with locale turned on (the default RPMS are like this):
the following takes a LONG time to run on 1.6 million records:
-------------------------------------
explain select isbn, count from inv_word_i where
word~'^foo'
order by count
Sort (cost=35148.70 rows=353 width=16)
-> Index Scan using i3 on inv_word_i (cost=35148.70 rows=353 width=16)
-------------------------------------
the following runs instantly, and does (nearly) the same thing:
-------------------------------------
explain select isbn, count from inv_word_i where
word>='foo' and word<'fop'
order by count
Sort (cost=11716.57 rows=183852 width=16)
-> Index Scan using i3 on inv_word_i (cost=11716.57 rows=183852 width=16)
-------------------------------------
but what about the following? :
-------------------------------------
explain select isbn , sum(count) from inv_word_i where
(word>='window' and word<'windox')
or
(word>='idiot' and word<'idiou')
group by isbn
order by sum(count) desc
Sort (cost=70068.84 rows=605525 width=16)
-> Aggregate (cost=70068.84 rows=605525 width=16)
-> Group (cost=70068.84 rows=605525 width=16)
-> Sort (cost=70068.84 rows=605525 width=16)
-> Seq Scan on inv_word_i (cost=70068.84 rows=605525 width=16)
-------------------------------------
this is the fastest way I've found so far to do a multi-word search (window and
idiot as the root words in this case), you note it does NOT use the indexes,
but falls back to a linear scan?!? it takes well over 30 seconds (much much too
long)
I've tried a LOT of different combinations, and have yet to find a way of
getting the system to use the indexes correctly to do what I want, the closest
I've ffound is using a select intersect select method to find all docs
containing both word (what I really want, although the query above is a ranked
or query), but it gets slow as soon as I select more than one field for the
results (I need to line isbn in this case to another database in the final
application)
I assume there is some reason the system falls back to a linear scan in this
case? it seems two index lookups would be much much more efficient..
am I missing something again?
--
------------------------------------------------------------
Stuart Woolford, stuartw@newmail.net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------