Обсуждение: Full text search prefix matching
Hi, I’m trying to use PostgreSQL's full text search for searching names with prefix matching. I’ve got a materialized view withthe tsvector’s in an indexed column which I’m then searching with prefix matching, as in the sqlfiddle: http://sqlfiddle.com/#!15/a2389/6and below. My problem is that when using the ‘finnish’ text search configuration, the names are split before the end, and they’re notmatched when searching with prefix search that has exactly one character more than the lexeme, as also demonstrated inthe above fiddle. When there are two characters after the lexeme, it does match. I’m working around the issue by using ‘simple’ configuration, which includes the full words in the lexemes. Should text search prefixes work predicatably as documented in [1] even if the lexemes are shorter than the query? How canI get it to work? Best regards, Heikki Rauhala [1] http://www.postgresql.org/docs/9.3/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES create table names (name varchar); insert into names (name) values ('Sofia'); create materialized view name_fulltext as select name, to_tsvector('finnish', name) as searchable_index_col from names; select * from name_fulltext; select to_tsquery('finnish','sof:*'); select 'found sof', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sof:*'); select 'notfound sofi', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sofi:*'); select 'found sofia', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sofia:*');
On Tue, 16 Dec 2014 14:59:51 +0200 Heikki Rauhala <heikki.rauhala@reaktor.fi> wrote: Hi Heikki, There is a typo : > > create materialized view name_fulltext as ERREUR: erreur de syntaxe sur ou près de « materialized » LIGNE 1 : create materialized view name_fulltext as You might want to post the results too, next time; I posted them with a corrected script below: create table names (name varchar); insert into names (name) values ('Sofia'); create view name_fulltext as select name, to_tsvector('finnish', name) as searchable_index_col from names; select * from name_fulltext; select to_tsquery('finnish','sof:*'); select 'found sof', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sof:*'); select 'notfound sofi', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sofi:*'); select 'found sofia', name from name_fulltext where searchable_index_col@@to_tsquery('finnish','sofia:*'); vv=> select * from name_fulltext; name | searchable_index_col -------+---------------------- Sofia | 'sof':1 (1 ligne) vv=> select to_tsquery('finnish','sof:*'); to_tsquery ------------ 'sof':* (1 ligne) vv=> vv=> select 'found sof', name vv-> from name_fulltext vv-> where searchable_index_col@@to_tsquery('finnish','sof:*'); ?column? | name -----------+------- found sof | Sofia (1 ligne) vv=> vv=> select 'notfound sofi', name vv-> from name_fulltext vv-> where searchable_index_col@@to_tsquery('finnish','sofi:*'); ?column? | name ----------+------ (0 ligne) vv=> vv=> select 'found sofia', name vv-> from name_fulltext vv-> where searchable_index_col@@to_tsquery('finnish','sofia:*'); ?column? | name -------------+------- found sofia | Sofia (1 ligne) -- Regards, Vincent Veyron https://libremen.com/ Legal case, contract and insurance claim management software
Heikki Rauhala <heikki.rauhala@reaktor.fi> writes: > Should text search prefixes work predicatably as documented in [1] even if the lexemes are shorter than the query? Howcan I get it to work? I believe what you're seeing can be explained by these observations: regression=# select to_tsvector('finnish', 'sofia'); to_tsvector ------------- 'sof':1 (1 row) regression=# select to_tsquery('finnish','sofia:*'); to_tsquery ------------ 'sof':* (1 row) regression=# select to_tsquery('finnish','sofi:*'); to_tsquery ------------ 'sofi':* (1 row) regression=# select to_tsquery('finnish','sof:*'); to_tsquery ------------ 'sof':* (1 row) What this shows is that the finnish configuration includes a word-stemming rule that strips off "ia". It won't strip off just "i" though, so "sofi" doesn't get reduced to the same root and therefore doesn't match "sofia". The "*" addition does nothing for you here since it allows matching in the other direction (query shorter than target). I know nothing of Finnish so I can't say just how correct these particular stemming rules are for that language; perhaps they need adjustment. But it seems to me that if you want blind non-language-aware prefix matching, you probably don't want the full-text-search machinery at all. Full text search is meant to deal with words, both in the documents and the queries. You might take a look at pg_trgm as an alternative. regards, tom lane
On 16.12.2014, at 17:09, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > On Tue, 16 Dec 2014 14:59:51 +0200 > Heikki Rauhala <heikki.rauhala@reaktor.fi> wrote: > > Hi Heikki, > > There is a typo : > >> create materialized view name_fulltext as > > ERREUR: erreur de syntaxe sur ou près de « materialized » > LIGNE 1 : create materialized view name_fulltext as > I forgot to mention the postgres version, which is 9.3. Materialized views were introduced in 9.3 and are important to mein this case, because they can be indexed. Next time I’ll include both the version and the full output. Best regards, - Heikki Rauhala