Обсуждение: Searching a gin index

Поиск
Список
Период
Сортировка

Searching a gin index

От
James Dooley
Дата:
Hi again,

I have set my configuration as default and I have created a GIN index on three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector( 'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed. Also changing the word car to cars returns nothing even though

select * from plainto_tsquery('cars')

returns 'car'

Could you please point me in the right direction ?

James

Re: Searching a gin index

От
Oleg Bartunov
Дата:
James,

you provide us a little useful information. If you really want help
you need to provide all information about your setup, sample queries and
explain analyze, test data illustrated your problem.

I, personally, usually prefer to have copy-pasteable information.
I think we all are quite busy, so don't expect to get help if you can't
give yours attention.

For example, this post would be reduced to the simple test query

test=# select version();
                                 version
------------------------------------------------------------------------
  PostgreSQL 8.3.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.3
(1 row)

test=# select strip(to_tsvector('english', 'cars and vehicles')) @@
        plainto_tsquery('english', 'cars');
  ?column?
----------
  t

If not, you have problem.

Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

> Hi again,
>
> I have set my configuration as default and I have created a GIN index on
> three columns, like this
>
> create index textsearch_index on products using gin(strip( to_tsvector(
> 'my_config', title || '' || description || '' || name)))
>
> Searching these columns the way I have
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
>
> seems not to be correct since it's taking as much time as non-indexed. Also
> changing the word car to cars returns nothing even though
>
> select * from plainto_tsquery('cars')
>
> returns 'car'
>
> Could you please point me in the right direction ?
>
> James
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Searching a gin index

От
Richard Huxton
Дата:
James Dooley wrote:
> Hi again,
>
> I have set my configuration as default and I have created a GIN index on
> three columns, like this
>
> create index textsearch_index on products using gin(strip( to_tsvector(
> 'my_config', title || '' || description || '' || name)))
>
> Searching these columns the way I have
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
>
> seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
  CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
                                                          QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
   Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
   ->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
         Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
 Total runtime: 0.121 ms


Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
  Richard Huxton
  Archonet Ltd

Re: Searching a gin index

От
James Dooley
Дата:
Oleg, but I am only interested in whether or not the syntax of my search-query is correct.

Having created the index as I mentioned above, would the correct way of searching and using that index be

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@ plainto_tsquery('car')

or some other way ?



On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:
James Dooley wrote:
> Hi again,
>
> I have set my configuration as default and I have created a GIN index on
> three columns, like this
>
> create index textsearch_index on products using gin(strip( to_tsvector(
> 'my_config', title || '' || description || '' || name)))
>
> Searching these columns the way I have
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
>
> seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
 CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
                                                         QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
  Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
  ->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
        Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
 Total runtime: 0.121 ms


Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
 Richard Huxton
 Archonet Ltd

Re: Searching a gin index

От
Richard Huxton
Дата:
James - don't repeat the whole of the previous message in your replies,
trim it to the relevant part. The message is already archived on the list.

James Dooley wrote:
> Oleg, but I am only interested in whether or not the syntax of my
> search-query is correct.
>
> Having created the index as I mentioned above, would the correct way of
> searching and using that index be
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

That looks like a correct syntax, but it's unlikely to use your index -
your index was created on to_tsvector('my_config', ...)

> or should it be as Richard just mentioned
>
> ... AND to_tsvector(title || '' || description || '' || name) @@
> plainto_tsquery('car')
>
> or some other way ?

Try playing around with the example I gave, and try variations on it. Do
you understand why you have to supply a config when creating the
functional index? Do you understand the difference between
plainto_tsquery() and to_tsquery()?

--
  Richard Huxton
  Archonet Ltd

Re: Searching a gin index

От
Oleg Bartunov
Дата:
James,

syntax is documented on
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
and in the Introduction
http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING
text-search operator was specified for tsvector @@ tsquery.
You did wrong twice, you didn't specified type tsvector and you
forgot about coalesce.

There is general rule for partial indexes - you should use the same
expression in query as you used in create index command.

Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

> Oleg, but I am only interested in whether or not the syntax of my
> search-query is correct.
>
> Having created the index as I mentioned above, would the correct way of
> searching and using that index be
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
>
> or should it be as Richard just mentioned
>
> ... AND to_tsvector(title || '' || description || '' || name) @@
> plainto_tsquery('car')
>
> or some other way ?
>
>
>
> On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:
>
>> James Dooley wrote:
>>> Hi again,
>>>
>>> I have set my configuration as default and I have created a GIN index on
>>> three columns, like this
>>>
>>> create index textsearch_index on products using gin(strip( to_tsvector(
>>> 'my_config', title || '' || description || '' || name)))
>>>
>>> Searching these columns the way I have
>>>
>>> ... AND (title || '' || description || '' || name) @@
>> plainto_tsquery('car')
>>>
>>> seems not to be correct since it's taking as much time as non-indexed.
>>
>> PG's planner isn't smart enough to transform a complex expression so as
>> to use a functional index (which is what you've got). You need to
>> mention the function explicitly.
>>
>> So, if you had:
>>  CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
>> You then need to search against "lower(mycolumn)" and not just expect
>> the planner to notice that mycolumn="abc" could use the index.
>>
>> The following should work for you as a starting point:
>>
>> CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
>> PRIMARY KEY (id));
>>
>> INSERT INTO tsearch_tbl (title, body)
>> SELECT 'title number ' || n, 'This is body number ' || n
>> FROM generate_series(1,9999) n;
>>
>> ANALYSE tsearch_tbl;
>>
>> CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
>> to_tsvector('english', title || body) );
>>
>> EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
>> title || body) @@ to_tsquery('17');
>>                                                          QUERY PLAN
>>
>>
>>
-------------------------------------------------------------------------------------------------------------------------------
>>  Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
>> (actual time=0.067..0.067 rows=1 loops=1)
>>   Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
>> to_tsquery('17'::text))
>>   ->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
>> rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
>>         Index Cond: (to_tsvector('english'::regconfig, (title || body))
>> @@ to_tsquery('17'::text))
>>  Total runtime: 0.121 ms
>>
>>
>> Note that you'll have problems if any of your text-fields contain nulls
>> (since null || anything = null).
>>
>> Personally, unless I'm dealing with a large table, I like to have a
>> separate tsvector column which I keep up to date with triggers. It makes
>> it easier to debug problems.
>>
>> --
>>  Richard Huxton
>>  Archonet Ltd
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Searching a gin index

От
James Dooley
Дата:


On Fri, Feb 6, 2009 at 4:01 PM, Richard Huxton <dev@archonet.com> wrote:
James - don't repeat the whole of the previous message in your replies,
trim it to the relevant part. The message is already archived on the list.

James Dooley wrote:
> Oleg, but I am only interested in whether or not the syntax of my
> search-query is correct.
>
> Having created the index as I mentioned above, would the correct way of
> searching and using that index be
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

That looks like a correct syntax, but it's unlikely to use your index -
your index was created on to_tsvector('my_config', ...)

Got it, I will use the same function and parameters as the one created on the index. As a matter a fact now the index seems to be used after I added the strip(to_tsvector(...))

Good!!



> or should it be as Richard just mentioned
>
> ... AND to_tsvector(title || '' || description || '' || name) @@
> plainto_tsquery('car')
>
> or some other way ?

Try playing around with the example I gave, and try variations on it. Do
you understand why you have to supply a config when creating the
functional index? Do you understand the difference between
plainto_tsquery() and to_tsquery()?

Yes, I do understand the difference.



--
 Richard Huxton
 Archonet Ltd



Now everything is working, I added my config to

plainto_tsquery('my_config', 'cars')

and I get results back. I find that strange though since I set the default to be by config, but appearantly that was just temporary and for the PID that changed it. Running from my app seemed to keep using the old configuration.

Thanks all for your help.

Now I have a complete and working fulltext search!

PostgreSQL rules!!

Re: Searching a gin index

От
Richard Huxton
Дата:
James Dooley wrote:
>
> Now everything is working, I added my config to
>
> plainto_tsquery('my_config', 'cars')
>
> and I get results back. I find that strange though since I set the default
> to be by config, but appearantly that was just temporary and for the PID
> that changed it. Running from my app seemed to keep using the old
> configuration.

Not quite clear on what you mean here, but you always need to specify
the config to use a functional index. The default config when you run
the query isn't guaranteed to be the same as the default when you create
the index. Actually, you can change it mid-transaction if you like.

If you want to set the default permanently do something like "ALTER
DATABASE mydb SET .... = ..."

--
  Richard Huxton
  Archonet Ltd

Re: Searching a gin index

От
James Dooley
Дата:
BTW, Oleg I don't need colasque since those values can't be null.





On Fri, Feb 6, 2009 at 4:11 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
James,

syntax is documented on
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
and in the Introduction http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING
text-search operator was specified for tsvector @@ tsquery.
You did wrong twice, you didn't specified type tsvector and you forgot about coalesce.

There is general rule for partial indexes - you should use the same expression in query as you used in create index command.


Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?



On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:

James Dooley wrote:
Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@
plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
 CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
                                                        QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
 Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
 ->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
       Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
 Total runtime: 0.121 ms


Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
 Richard Huxton
 Archonet Ltd



       Regards,
               Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Searching a gin index

От
James Dooley
Дата:
Richard, I was refereing to the plainto_tsquery

Oleg, I only need colasque if any of those columns can be null, which is not the case here. Correct?

Re: Searching a gin index

От
Oleg Bartunov
Дата:
On Fri, 6 Feb 2009, James Dooley wrote:

> Richard, I was refereing to the plainto_tsquery
>
> Oleg, I only need colasque if any of those columns can be null, which is not
> the case here. Correct?

yes, but for safety and generality I'd use coalesce.


     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83