Обсуждение: Full text search prefix matching

Поиск
Список
Период
Сортировка

Full text search prefix matching

От
Heikki Rauhala
Дата:
Hi,

I’m trying to use PostgreSQL's full text search for searching names with prefix matching. I’ve got a materialized view
withthe  tsvector’s in an indexed column which I’m then searching with prefix matching, as in the sqlfiddle:
http://sqlfiddle.com/#!15/a2389/6and below. 

My problem is that when using the ‘finnish’ text search configuration, the names are split before the end, and they’re
notmatched when searching with prefix search that has exactly one character more than the lexeme, as also demonstrated
inthe above fiddle. When there are two characters after the lexeme, it does match. 

I’m working around the issue by using ‘simple’ configuration, which includes the full words in the lexemes.

Should text search prefixes work predicatably as documented in [1] even if the lexemes are shorter than the query? How
canI get it to work? 

Best regards,

Heikki Rauhala


[1] http://www.postgresql.org/docs/9.3/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

create table names (name varchar);
insert into names (name) values ('Sofia');

create materialized view name_fulltext as
  select
    name,
    to_tsvector('finnish', name) as
      searchable_index_col
  from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');

select 'found sof', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sof:*');

select 'notfound sofi', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sofi:*');

select 'found sofia', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sofia:*');

Re: Full text search prefix matching

От
Vincent Veyron
Дата:
On Tue, 16 Dec 2014 14:59:51 +0200
Heikki Rauhala <heikki.rauhala@reaktor.fi> wrote:

Hi Heikki,

There is a typo :

>
> create materialized view name_fulltext as

ERREUR:  erreur de syntaxe sur ou près de « materialized »
LIGNE 1 : create materialized view name_fulltext as

You might want to post the results too, next time; I posted them with a corrected script below:

create table names (name varchar);
insert into names (name) values ('Sofia');

create view name_fulltext as
  select
    name,
    to_tsvector('finnish', name) as
      searchable_index_col
  from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');

select 'found sof', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sof:*');

select 'notfound sofi', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sofi:*');

select 'found sofia', name
from name_fulltext
where searchable_index_col@@to_tsquery('finnish','sofia:*');


vv=> select * from name_fulltext;
 name  | searchable_index_col
-------+----------------------
 Sofia | 'sof':1
(1 ligne)

vv=> select to_tsquery('finnish','sof:*');
 to_tsquery
------------
 'sof':*
(1 ligne)

vv=>
vv=> select 'found sof', name
vv-> from name_fulltext
vv-> where searchable_index_col@@to_tsquery('finnish','sof:*');
 ?column?  | name
-----------+-------
 found sof | Sofia
(1 ligne)

vv=>
vv=> select 'notfound sofi', name
vv-> from name_fulltext
vv-> where searchable_index_col@@to_tsquery('finnish','sofi:*');
 ?column? | name
----------+------
(0 ligne)

vv=>
vv=> select 'found sofia', name
vv-> from name_fulltext
vv-> where searchable_index_col@@to_tsquery('finnish','sofia:*');
  ?column?   | name
-------------+-------
 found sofia | Sofia
(1 ligne)




--
                    Regards, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


Re: Full text search prefix matching

От
Tom Lane
Дата:
Heikki Rauhala <heikki.rauhala@reaktor.fi> writes:
> Should text search prefixes work predicatably as documented in [1] even if the lexemes are shorter than the query?
Howcan I get it to work? 

I believe what you're seeing can be explained by these observations:

regression=# select to_tsvector('finnish', 'sofia');
 to_tsvector
-------------
 'sof':1
(1 row)

regression=# select to_tsquery('finnish','sofia:*');
 to_tsquery
------------
 'sof':*
(1 row)

regression=# select to_tsquery('finnish','sofi:*');
 to_tsquery
------------
 'sofi':*
(1 row)

regression=# select to_tsquery('finnish','sof:*');
 to_tsquery
------------
 'sof':*
(1 row)

What this shows is that the finnish configuration includes a word-stemming
rule that strips off "ia".  It won't strip off just "i" though, so "sofi"
doesn't get reduced to the same root and therefore doesn't match "sofia".
The "*" addition does nothing for you here since it allows matching in
the other direction (query shorter than target).

I know nothing of Finnish so I can't say just how correct these particular
stemming rules are for that language; perhaps they need adjustment.  But
it seems to me that if you want blind non-language-aware prefix matching,
you probably don't want the full-text-search machinery at all.  Full text
search is meant to deal with words, both in the documents and the queries.
You might take a look at pg_trgm as an alternative.

            regards, tom lane


Re: Full text search prefix matching

От
Heikki Rauhala
Дата:
On 16.12.2014, at 17:09, Vincent Veyron <vv.lists@wanadoo.fr> wrote:

> On Tue, 16 Dec 2014 14:59:51 +0200
> Heikki Rauhala <heikki.rauhala@reaktor.fi> wrote:
>
> Hi Heikki,
>
> There is a typo :
>
>> create materialized view name_fulltext as
>
> ERREUR:  erreur de syntaxe sur ou près de « materialized »
> LIGNE 1 : create materialized view name_fulltext as
>

I forgot to mention the postgres version, which is 9.3. Materialized views were introduced in 9.3 and are important to
mein this case, because they can be indexed. 

Next time I’ll include both the version and the full output.

Best regards,

 - Heikki Rauhala