Discussion: FTS phrase searches


FTS phrase searches

From
Glenn Maynard
Date:
How are adjacent word searches handled with FTS?  tsquery doesn't do
this, so I assume this has to be done as a separate filter step, e.g.:

  -- "large house" sales
  -- (tsvector_contains_phrase is hypothetical; no such function exists)
  SELECT * FROM data
  WHERE fts @@ to_tsquery('large & house & sales')
    AND tsvector_contains_phrase(fts, to_tsvector('large house'));

to do an indexed search for "large & house & sales" and then narrow
the results to rows where "large house" actually appears as a phrase (e.g.
adjacent positions at the same weight).  I can't find any function that
does that, though.  (Presumably, it would return true if all of the
words in the second tsvector exist in the first, with the same
positions relative to each other.)
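
A minimal sketch of the check described above, written in Python rather
than SQL.  It models a tsvector as a mapping from lexeme to a list of
positions; the name contains_phrase and this dictionary model are
assumptions made for illustration, not a PostgreSQL API, and weights are
ignored.

```python
def contains_phrase(doc, phrase):
    """Return True if every lexeme of `phrase` occurs in `doc` at the same
    positions relative to each other.

    Both arguments map lexeme -> list of 1-based positions, a simplified
    stand-in for a tsvector (weights are ignored).
    """
    if not phrase:
        return True
    # Flatten the phrase into (position, lexeme) pairs; the earliest is the anchor.
    items = sorted((pos, lex) for lex, positions in phrase.items() for pos in positions)
    anchor_pos, anchor_lex = items[0]
    # Try to line the phrase up with each occurrence of the anchor lexeme.
    for start in doc.get(anchor_lex, ()):
        shift = start - anchor_pos
        if all(pos + shift in doc.get(lex, ()) for pos, lex in items):
            return True
    return False


doc = {"larg": [2, 7], "hous": [3], "sale": [5], "garden": [8]}
print(contains_phrase(doc, {"larg": [1], "hous": [2]}))  # True: positions 2 and 3
print(contains_phrase(doc, {"larg": [1], "sale": [2]}))  # False: never adjacent
```

Because the comparison uses lexemes and positions rather than raw text,
whatever stemming and tokenization produced the two vectors carries over
to the phrase test.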

"tsvector <@ tsvector" seems logical, but isn't supported.

This isn't as simple as using LIKE, since LIKE ignores stemming,
tokenization rules, etc.  If the language rules allow the query to match
"larger house" or "large-house", then a phrase restriction should,
too.  It's also painful when the FTS column is an aggregate of several
other columns (e.g. title and body), since a LIKE match needs to know
that and check all of them separately.
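
To illustrate the tokenization half of that argument, here is a small
Python sketch.  The tokenize helper is a made-up stand-in for a text
search configuration: it only lowercases and splits on non-alphanumeric
characters and does no stemming, so it is far cruder than what FTS
actually does.

```python
import re


def tokenize(text):
    # Hypothetical stand-in for a text search parser:
    # lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_in_tokens(text, phrase):
    """True if the tokens of `phrase` appear consecutively among the tokens of `text`."""
    doc, want = tokenize(text), tokenize(phrase)
    n = len(want)
    return n > 0 and any(doc[i:i + n] == want for i in range(len(doc) - n + 1))


title = "Large-house sales"
body = "Prices rose this year."

# A plain substring test (roughly what LIKE '%large house%' does) misses the hyphen.
print("large house" in title.lower())                      # False
# Matching on tokens finds the phrase.
print(phrase_in_tokens(title, "large house"))              # True
# One call covers several columns once their text is combined.
print(phrase_in_tokens(title + " " + body, "large house"))  # True
```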

Any hints?  This is pretty important to even simpler search systems.

--
Glenn Maynard

Re: FTS phrase searches

From
Glenn Maynard
Date:
I guess no response means it's not possible.  I ended up doing a
manual substring match for quoted strings, but that's a poor hack.
Maybe I'll take a poke at implementing something like
tsvector_contains_phrase; it seems like a natural extension of what's
in there now.
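
For readers wondering what such a hack might look like, here is one
possible shape of it in Python.  This is a guess for illustration, not
the code actually used: split_query and the variable names are made up,
and the input is assumed to contain only plain words and double-quoted
phrases.

```python
import re


def split_query(query):
    """Split a user query into its quoted phrases and the full list of words."""
    phrases = re.findall(r'"([^"]+)"', query)  # text between double quotes
    words = re.findall(r"\w+", query)          # every word, quoted or not
    return phrases, words


phrases, words = split_query('"large house" sales')

# Text for the indexed to_tsquery() match.
tsquery = " & ".join(words)
# One substring pattern per quoted phrase, for the follow-up filter.
like_patterns = ["%" + p + "%" for p in phrases]

print(tsquery)        # large & house & sales
print(like_patterns)  # ['%large house%']
```

A real version would also need to escape LIKE wildcards in the phrases
and pass all values as query parameters rather than splicing them into
the SQL text.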


On Mon, Nov 1, 2010 at 4:35 PM, Glenn Maynard <glenn@zewt.org> wrote:
> How are adjacent word searches handled with FTS?  tsquery doesn't do
> this, so I assume this has to be done as a separate filter step, eg.:
>
>  -- "large house" sales
>  SELECT * FROM data
>  WHERE fts @@ to_tsquery('large & house & sales')
>    AND tsvector_contains_phrase(fts, to_tsvector('large house'));
>
> to do an indexed search for "large & house & sales" and then to narrow
> the results to where "large house" actually appears as a phrase (eg.
> adjacent positions at the same weight).  I can't find any function to
> do that, though.  (Presumably, it would return true if all of the
> words in the second tsvector exist in the first, with the same
> positions relative to each other.)
>
> "tsvector <@ tsvector" seems logical, but isn't supported.
>
> This isn't as simple as using LIKE, since that'll ignore stemming,
> tokenization rules, etc.  If the language rules allow this to match
> "larger house" or "large-house", then a phrase restriction should,
> too.  It's also painful when the FTS column is an aggregate of several
> other columns (eg. title and body), since a LIKE match needs to know
> that and check all of them separately.
>
> Any hints?  This is pretty important to even simpler search systems.

--
Glenn Maynard

Re: FTS phrase searches

From
Oleg Bartunov
Date:
You might be interested in http://www.sai.msu.su/~megera/wiki/2009-08-12

Oleg
On Sun, 19 Dec 2010, Glenn Maynard wrote:

> I guess no response means it's not possible.  I ended up doing a
> manual substring match for quoted strings, but that's a poor hack.
> Maybe I'll take a poke at implementing something like
> tsvector_contains_phrase; it seems like a natural extension of what's
> in there now.

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: FTS phrase searches

From
Glenn Maynard
Date:
2010/12/19 Oleg Bartunov <oleg@sai.msu.su>:
> You might be interested in http://www.sai.msu.su/~megera/wiki/2009-08-12

Thanks, that looks pretty much like what I had in mind.  Hopefully
that'll get merged for 9.0+1; phrases are a major part of all text
searches.

--
Glenn Maynard

Re: FTS phrase searches

From
Oleg Bartunov
Date:
On Sun, 19 Dec 2010, Glenn Maynard wrote:

> 2010/12/19 Oleg Bartunov <oleg@sai.msu.su>:
>> You might be interested in http://www.sai.msu.su/~megera/wiki/2009-08-12
>
> Thanks, that looks pretty much like what I had in mind.  Hopefully
> that'll get merged for 9.0+1; phrases are a major part of all text
> searches.

Several companies are interested in phrase search, but we have not
actually received any support for it, so we have postponed it.


     Regards,
         Oleg