Re: Tsearch limitations

From: Teodor Sigaev
Subject: Re: Tsearch limitations
Msg-id: 3F3901DF.1020900@sigaev.ru
In reply to: Re: Tsearch limitations (Mike Benoit <mikeb@netnation.com>)
List: pgsql-general

Mike Benoit wrote:
> Oleg,
>
>     Is it possible to have Tsearch support soundex, or levenshtein
> (http://ca3.php.net/manual/en/function.levenshtein.php) when searching?
Sorry, no.


The function that calculates Levenshtein distance is defined as
int levenshtein ( string str1, string str2)

It takes two words, while a dictionary gets only one, so it can't be used as a dictionary. :(
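
To make the mismatch concrete, here is a minimal Python sketch (not tsearch code). levenshtein() needs both words at once, while a dictionary, represented here by the hypothetical toy_dictionary(), sees one word at a time and has nothing to compare it with.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def toy_dictionary(word: str) -> str:
    """Hypothetical stand-in for a dictionary: one word in, one lexeme out."""
    return word.lower()
```

The distance is a property of a pair of words, so it has no place in the one-word-in, one-lexeme-out interface.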

The index stores only a signature of each lexized word, so we can't compute the
distance between a query word and a signature.
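
A rough Python sketch of why a signature is not enough. The 64-bit width and the CRC32 hash are assumptions chosen for the illustration and are not tsearch's real signature layout. Each lexeme only sets one bit, so the original spelling cannot be read back from the signature and there is nothing to measure an edit distance against.

```python
import zlib

SIGNATURE_BITS = 64  # assumed width, for illustration only


def word_bit(lexeme: str) -> int:
    """Map a lexeme to one bit position (CRC32 is an assumed hash)."""
    return zlib.crc32(lexeme.encode("utf-8")) % SIGNATURE_BITS


def signature(lexemes) -> int:
    """OR together one bit per lexeme; the spelling of each word is lost."""
    sig = 0
    for lexeme in lexemes:
        sig |= 1 << word_bit(lexeme)
    return sig


def may_contain(sig: int, lexeme: str) -> bool:
    """True if the lexeme's bit is set; false positives are possible."""
    return bool(sig & (1 << word_bit(lexeme)))
```

An exact query lexeme can be hashed and tested with may_contain(), but a misspelled one hashes to an unrelated bit, which is why only exact lexeme matches can be checked this way.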

>
> I've never used Tsearch before, but I assume this might just be a matter
> of writing a different parser to add soundex'd versions of words to the
> index, then modify the query functions to search on both versions of the
> word?

To work with tsearch2, a dictionary must return the "canonical" form of input lexemes
(usually the infinitive). If you can write a function that corrects some mistakes in a
word, then you can use it in tsearch.
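
As a hedged sketch of such a correcting function in Python: KNOWN_LEXEMES and correcting_dictionary() are hypothetical names, and the matching uses the similarity ratio from the standard difflib module rather than Levenshtein distance. A real tsearch2 dictionary would be written against the tsearch2 dictionary interface, which is not shown here.

```python
import difflib

# Hypothetical vocabulary of canonical forms the dictionary knows about.
KNOWN_LEXEMES = ["address", "cathedral", "email", "search"]


def correcting_dictionary(word: str, cutoff: float = 0.8) -> str:
    """Return the canonical form for a word, fixing small misspellings.

    Unknown words with no close match are returned lowercased and unchanged.
    """
    word = word.lower()
    if word in KNOWN_LEXEMES:
        return word
    matches = difflib.get_close_matches(word, KNOWN_LEXEMES, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

With this vocabulary, "addres" is mapped to "address", while a fragment such as "thed" is too far from "cathedral" to be corrected. Both the document text and the query word would have to pass through the same function, so that they end up as the same lexeme.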



>
>
> On Mon, 2003-08-11 at 07:30, Oleg Bartunov wrote:
>
>>On Mon, 11 Aug 2003 psql-mail@freeuk.com wrote:
>>
>>
>>>Oleg,
>>>
>>>I understand (i think) how the parser breaks up the input into words
>>>and builds ts_vector's.
>>>
>>>And i understand how to do queries as described into the documentation.
>>>(I have read it!)
>>>
>>>SELECT * FROM vectors WHERE vector @@ to_tsquery('(leads|forks) & !crawl')
>>>
>>>But i haven't seen any mention of if i add the word:
>>>
>>>cathedral
>>>
>>>if there is any query which will match if I search for "thed".
>>
>>No, tsearch2 is a word-oriented search. It doesn't support substring
>>search.
>>
>>
>>>The documentation seems to say that this cannot be done - but i'd just
>>>like to check. Tsearch2 does everything i want except this.
>>>
>>>"remember that the search operator @@ finds only exact matches between
>>>query lexemes and vector lexemes - if they are not exactly the same
>>>string, they will not be considered a match"
>>>
>>>
>>>
>>>>Mat,
>>>>
>>>>there are several functions you may use to see (please, read documentation):
>>>>
>>>>apod=# select to_tsvector('Hi my email addres is psql-mail@freeuk.com');
>>>>                    to_tsvector
>>>>----------------------------------------------------
>>>> 'hi':1 'addr':4 'email':3 'psql-mail@freeuk.com':6
>>>>(1 row)
>>>>
>>>>or, even better
>>>>
>>>>apod=# select * from ts_debug('Hi my email addres is psql-mail@freeuk.com');
>>>>     ts_name     | tok_type | description |        token         | dict_name |        tsvector
>>>>-----------------+----------+-------------+----------------------+-----------+------------------------
>>>> default_russian | lword    | Latin word  | Hi                   | {en_stem} | 'hi'
>>>> default_russian | lword    | Latin word  | my                   | {en_stem} |
>>>> default_russian | lword    | Latin word  | email                | {en_stem} | 'email'
>>>> default_russian | lword    | Latin word  | addres               | {en_stem} | 'addr'
>>>> default_russian | lword    | Latin word  | is                   | {en_stem} |
>>>> default_russian | email    | Email       | psql-mail@freeuk.com | {simple}  | 'psql-mail@freeuk.com'
>>>>(6 rows)
>>>>
>>>>You may write your own parser or preprocess text before tsearch.
>>>>
>>>>    Oleg
>>>>On Mon, 11 Aug 2003, Mat wrote:
>>>>
>>>>
>>>>>Can Tsearch be used to return substring matches?
>>>>>
>>>>>i.e
>>>>>
>>>>>Text to search: Hi my email addres is psql-mail@freeuk.com
>>>>>
>>>>>Query "psql" would match the email address?
>>>>>
>>>>>Query "mail" would also match?
>>>>>
>>>>>Query "reeu" would also match?
>>>>>
>>>>>Or is tsearch not suitable for this type of query? Should I use FTI
>>>>>instead?
>>>>>
>>>>>Thanks.
>>>>>
>>>>>
>>>>
>>>>    Regards,
>>>>        Oleg
>>>>_____________________________________________________________
>>>>Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>>>Sternberg Astronomical Institute, Moscow University (Russia)
>>>>Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>phone: +007(095)939-16-83, +007(095)939-23-83
>>>>
>>>
>>>
>>    Regards,
>>        Oleg

--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru

