Re: LIKE search and performance

Поиск
Список
Период
Сортировка
От Alexander Staubo
Тема Re: LIKE search and performance
Дата
Msg-id 88daf38c0705230905u1a981102k691b47bd36011a80@mail.gmail.com
обсуждение исходный текст
Ответ на LIKE search and performance  ("Andy" <frum@ar-sd.net>)
Ответы Re: LIKE search and performance  (James Mansion <james@mansionfamily.plus.com>)
Список pgsql-performance
On 5/23/07, Andy <frum@ar-sd.net> wrote:
> An example would be:
> SELECT * FROM table
>                              WHERE name like '%john%' or street like '%srt%'
>
> Anyway, the query planner always does seq scan on the whole table and that
> takes some time. How can this be optimized or made in another way to be
> faster?

There's no algorithm in existence that can "index" arbitrary
substrings the way you think. The only rational way to accomplish this
is to first break the text into substrings using some algorithm (eg.,
words delimited by whitespace and punctuation), and index the
substrings individually.

You can do this using vanilla PostgreSQL, and you can use Tsearch2
and/or its GIN indexes to help speed up the searches. The simplest
solution would be to put all the substrings in a table, one row per
substring, along with an attribute referencing the source table/row.

At this point you have effectively reduced your search space: you can
use a query to isolate the words in your "dictionary" that contain the
substrings. So a query might be:

  select ... from persons where id in (
    select person_id from person_words
    where word like '%john%';
  )

The "like" search, even though it will use a sequential scan, is bound
to be much faster on these small words than searching for substrings
through large gobs of text in the persons table.

Note that PostgreSQL *can* exploit the index for *prefix* matching, if
you tell it to use the right operator class:

  create index persons_name_index on persons (name text_pattern_ops);

or, if you're using varchars (though there is rarely any reason to):

  create index persons_name_index on persons (name varchar_pattern_ops);

(These two may be identical internally. Check the docs.)

Now you can do:

  select ... from persons where name like 'john%';

which will yield a query plan such as this:

 Index Scan using persons_name_index on persons  (cost=0.00..8.27
rows=1 width=29) (actual time=0.184..0.373 rows=51 loops=1)
   Index Cond: ((name ~>=~ 'john'::text) AND (name ~<~ 'joho'::text))
   Filter: (title ~~ 'john%'::text)

Alexander.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Guido Neitzer
Дата:
Сообщение: Re: LIKE search and performance
Следующее
От: Rigmor Ukuhe
Дата:
Сообщение: Re: LIKE search and performance