Обсуждение: Full Text Index Scanning

Поиск
Список
Период
Сортировка

Full Text Index Scanning

От
Matt Warner
Дата:
I'm in the process of migrating a project from Oracle to Postgres and have run into a feature question. I know that Postgres has a full-text search feature, but it does not allow scanning the index (as opposed to the data). Specifically, in Oracle you can do "select * from table where contains(colname,'%part_of_word%')>1". While this isn't terribly efficient, it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this returns no rows: "select * from table where to_tsvector(colname) @@ to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan the index?

TIA,

Matt

Re: Full Text Index Scanning

От
Oleg Bartunov
Дата:
Matt, I'd try to use prefix search on original string concatenated with reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
             Table "public.spot_toulouse"
        Column        |       Type        | Modifiers
---------------------+-------------------+-----------
  clean_name          | character varying |


1. create index
knn=# create index clean_name_tlz_idx on spot_toulouse using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french', clean_name|| ' ' ||  reverse(clean_name) ) @@
to_tsquery('french','the:*| et:*'); 

Select looks cumbersome, but you can always write wrapper functions.
The only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same,
but again, it's possible to obtain tsvector by custom function, which
aware about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

> I'm in the process of migrating a project from Oracle to Postgres and have
> run into a feature question. I know that Postgres has a full-text search
> feature, but it does not allow scanning the index (as opposed to the data).
> Specifically, in Oracle you can do "select * from table where
> contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
> it's much faster than full-scanning the raw data and is relatively quick.
>
> It doesn't seem that Postgres works this way. Attempting to do this returns
> no rows: "select * from table where to_tsvector(colname) @@
> to_tsquery('%part_of_word%')"
>
> The reason I want to do this is that the partial word search does not
> involve dictionary words (it's scanning names).
>
> Is this something Postgres can do? Or is there a different way to do scan
> the index?
>
> TIA,
>
> Matt
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Full Text Index Scanning

От
Matt Warner
Дата:
Thanks Oleg. I'm going to have to experiment with this so that I understand it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Matt, I'd try to use prefix search on original string concatenated with reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
           Table "public.spot_toulouse"
      Column        |       Type        | Modifiers ---------------------+-------------------+-----------
 clean_name          | character varying |


1. create index knn=# create index clean_name_tlz_idx on spot_toulouse using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french', clean_name|| ' ' ||  reverse(clean_name) ) @@ to_tsquery('french','the:* | et:*');

Select looks cumbersome, but you can always write wrapper functions. The only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but again, it's possible to obtain tsvector by custom function, which aware about reversing.

Good luck and let me know if this help you.

Oleg


On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt


       Regards,
               Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Full Text Index Scanning

От
Matt Warner
Дата:
Reverse isn't a built-in Postgres function, so I found one and installed it. However, attempting to use it in creating an index gets me the message "ERROR:  functions in index expression must be marked IMMUTABLE", even though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com> wrote:
Thanks Oleg. I'm going to have to experiment with this so that I understand it better.

Matt


On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Matt, I'd try to use prefix search on original string concatenated with reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
           Table "public.spot_toulouse"
      Column        |       Type        | Modifiers ---------------------+-------------------+-----------
 clean_name          | character varying |


1. create index knn=# create index clean_name_tlz_idx on spot_toulouse using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french', clean_name|| ' ' ||  reverse(clean_name) ) @@ to_tsquery('french','the:* | et:*');

Select looks cumbersome, but you can always write wrapper functions. The only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but again, it's possible to obtain tsvector by custom function, which aware about reversing.

Good luck and let me know if this help you.

Oleg


On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt


       Regards,
               Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Full Text Index Scanning

От
Oleg Bartunov
Дата:
What version of Pg you run ? Try latest version.

Oleg

On Sat, 29 Jan 2011, Matt Warner wrote:

> Reverse isn't a built-in Postgres function, so I found one and installed it.
> However, attempting to use it in creating an index gets me the message
> "ERROR:  functions in index expression must be marked IMMUTABLE", even
> though the function declaration already has the immutable argument.
>
> Is there a specific version of the reverse function you're using? Or am I
> just missing something obvious? This is Postgres 9, BTW.
>
> Thanks,
>
> Matt
>
> On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:
>
>> Thanks Oleg. I'm going to have to experiment with this so that I understand
>> it better.
>>
>> Matt
>>
>>
>> On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>>> Matt, I'd try to use prefix search on original string concatenated with
>>> reverse string:
>>>
>>> Just tried on some spare table
>>>
>>> knn=# \d spot_toulouse
>>>            Table "public.spot_toulouse"
>>>       Column        |       Type        | Modifiers
>>> ---------------------+-------------------+-----------
>>>  clean_name          | character varying |
>>>
>>>
>>> 1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
>>> using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
>>> 2.
>>> select clean_name from spot_toulouse where to_tsvector('french',
>>> clean_name|| ' ' ||  reverse(clean_name) ) @@ to_tsquery('french','the:* |
>>> et:*');
>>>
>>> Select looks cumbersome, but you can always write wrapper functions. The
>>> only drawback I see for now is that ranking function will a bit confused,
>>> since coordinates of original and reversed words will be not the same, but
>>> again, it's possible to obtain tsvector by custom function, which aware
>>> about reversing.
>>>
>>> Good luck and let me know if this help you.
>>>
>>> Oleg
>>>
>>>
>>> On Fri, 28 Jan 2011, Matt Warner wrote:
>>>
>>>  I'm in the process of migrating a project from Oracle to Postgres and
>>>> have
>>>> run into a feature question. I know that Postgres has a full-text search
>>>> feature, but it does not allow scanning the index (as opposed to the
>>>> data).
>>>> Specifically, in Oracle you can do "select * from table where
>>>> contains(colname,'%part_of_word%')>1". While this isn't terribly
>>>> efficient,
>>>> it's much faster than full-scanning the raw data and is relatively quick.
>>>>
>>>> It doesn't seem that Postgres works this way. Attempting to do this
>>>> returns
>>>> no rows: "select * from table where to_tsvector(colname) @@
>>>> to_tsquery('%part_of_word%')"
>>>>
>>>> The reason I want to do this is that the partial word search does not
>>>> involve dictionary words (it's scanning names).
>>>>
>>>> Is this something Postgres can do? Or is there a different way to do scan
>>>> the index?
>>>>
>>>> TIA,
>>>>
>>>> Matt
>>>>
>>>>
>>>        Regards,
>>>                Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>
>>
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Full Text Index Scanning

От
Matt Warner
Дата:
9.0.2

On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
What version of Pg you run ? Try latest version.

Oleg


On Sat, 29 Jan 2011, Matt Warner wrote:

Reverse isn't a built-in Postgres function, so I found one and installed it.
However, attempting to use it in creating an index gets me the message
"ERROR:  functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:

Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.

Matt


On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Matt, I'd try to use prefix search on original string concatenated with
reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
          Table "public.spot_toulouse"
     Column        |       Type        | Modifiers
---------------------+-------------------+-----------
 clean_name          | character varying |


1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' ||  reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg


On Fri, 28 Jan 2011, Matt Warner wrote:

 I'm in the process of migrating a project from Oracle to Postgres and
have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt


      Regards,
              Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





       Regards,
               Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Full Text Index Scanning

От
Matt Warner
Дата:
Doesn't seem to work either. Maybe something changed in 9.1?

create index test_idx on testtable using gin(to_tsvector(wordcolumn||' '||reverse(wordcolumn)));
ERROR:  functions in index expression must be marked IMMUTABLE


On Sun, Jan 30, 2011 at 3:28 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
I used 9.1dev, but you can try immutable function (from http://andreas.scherbaum.la/blog/archives/10-Reverse-a-text-in-PostgreSQL.html)

create function reverse(text) returns text as $$

select case when length($1)>0

then substring($1, length($1), 1) || reverse(substring($1, 1, length($1)-1))

else '' end $$ language sql immutable strict;



On Sat, 29 Jan 2011, Matt Warner wrote:

9.0.2

On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:

What version of Pg you run ? Try latest version.

Oleg


On Sat, 29 Jan 2011, Matt Warner wrote:

 Reverse isn't a built-in Postgres function, so I found one and installed
it.
However, attempting to use it in creating an index gets me the message
"ERROR:  functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com
wrote:

 Thanks Oleg. I'm going to have to experiment with this so that I
understand
it better.

Matt


On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

 Matt, I'd try to use prefix search on original string concatenated with
reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
         Table "public.spot_toulouse"
    Column        |       Type        | Modifiers
---------------------+-------------------+-----------
 clean_name          | character varying |


1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' ||  reverse(clean_name) ) @@ to_tsquery('french','the:*
|
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit
confused,
since coordinates of original and reversed words will be not the same,
but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg


On Fri, 28 Jan 2011, Matt Warner wrote:

 I'm in the process of migrating a project from Oracle to Postgres and

have
run into a feature question. I know that Postgres has a full-text
search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively
quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do
scan
the index?

TIA,

Matt


      Regards,
             Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





      Regards,
              Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



       Regards,
               Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Full Text Index Scanning

От
Tom Lane
Дата:
Matt Warner <matt@warnertechnology.com> writes:
> Doesn't seem to work either. Maybe something changed in 9.1?
> create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
> '||reverse(wordcolumn)));
> ERROR:  functions in index expression must be marked IMMUTABLE

That's not the same case he tested.  The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter.  It should work, AFAICS, with the
two-parameter form.

            regards, tom lane

Re: Full Text Index Scanning

От
Matt Warner
Дата:
Aha! Thanks for pointing that out. It's indexing now.

Thanks!

Matt

On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Warner <matt@warnertechnology.com> writes:
> Doesn't seem to work either. Maybe something changed in 9.1?
> create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
> '||reverse(wordcolumn)));
> ERROR:  functions in index expression must be marked IMMUTABLE

That's not the same case he tested.  The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter.  It should work, AFAICS, with the
two-parameter form.

                       regards, tom lane

Re: Full Text Index Scanning

От
Matt Warner
Дата:
If I understand this, it looks like this approach allows me to match the beginnings and endings of words, but not the middle sections. Is that correct? That is, if I search for "jag" I will find "jaeger" but not "lobenjager".

Or am I (again) not understanding how this works?

TIA,

Matt

On Sun, Jan 30, 2011 at 9:59 AM, Matt Warner <matt@warnertechnology.com> wrote:
Aha! Thanks for pointing that out. It's indexing now.

Thanks!

Matt


On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Warner <matt@warnertechnology.com> writes:
> Doesn't seem to work either. Maybe something changed in 9.1?
> create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
> '||reverse(wordcolumn)));
> ERROR:  functions in index expression must be marked IMMUTABLE

That's not the same case he tested.  The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter.  It should work, AFAICS, with the
two-parameter form.

                       regards, tom lane


Re: Full Text Index Scanning

От
Tom Lane
Дата:
Matt Warner <matt@warnertechnology.com> writes:
> If I understand this, it looks like this approach allows me to match the
> beginnings and endings of words, but not the middle sections.

Yeah, probably.  You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.

            regards, tom lane

Re: Full Text Index Scanning

От
Matt Warner
Дата:
Thanks. pg_trgm looks interesting, but after installing the pg_trgm.sql, I get error messages when following the documentation.

sggeeorg=> create index test_idx on test using gist(columnname gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method "gist"
STATEMENT:  create index test_idx on test using gist(columnname gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method "gist"



On Sun, Jan 30, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Warner <matt@warnertechnology.com> writes:
> If I understand this, it looks like this approach allows me to match the
> beginnings and endings of words, but not the middle sections.

Yeah, probably.  You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.

                       regards, tom lane

Re: Full Text Index Scanning

От
Robin Coe
Дата:
I was looking for an answer to the same problem posted a while back (sorry, not sure how to join that thread):
Thanks. pg_trgm looks interesting, but after installing the pg_trgm.sql, I
get error messages when following the documentation.

sggeeorg=> create index test_idx on test using gist(columnname
gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method
"gist"
STATEMENT:  create index test_idx on test using
gist(columnname gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method
"gist"



On Sun, Jan 30, 2011 at 10:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Matt Warner <matt(at)warnertechnology(dot)com> writes:
> > If I understand this, it looks like this approach allows me to match the
> > beginnings and endings of words, but not the middle sections.
>
> Yeah, probably.  You might consider using contrib/pg_trgm instead if
> you need arbitrary substrings.
>
>                        regards, tom lane
>

I just ran into the same problem using pgsql 9.1 on both Windows and Linux.  It happened when I tried 
to create an index on a new table in a new schema of a database that already had the pg_trgm extension
loaded. The only way I could get the index to build was to first drop the extension and then re-add it.
Once I re-added the extension, I was then able to create the index.