Discussion: TSearch: Need debug help

TSearch: Need debug help

From
Hannes Dorbath
Date:
SELECT ts_debug('durst');
(default_german,lword,"Latin word",durst,"{de_ispell,de}","'dur' 'sen'")

SELECT ts_debug('höchsten');
(default_german,word,Word,höchsten,"{de_ispell,de}","'sen' 'höch' 'höchst' 'höchsten'")

For some reason both produce the lexeme 'sen', which leads to strange
results: a search for `durst' will highlight `höchsten' with headline().

Server is PG 8.0.4,
German Snowball stemmer,
dictionary used is http://hannes.imos.net/german_iso.med
(from OpenOffice)

What causes some words to result in `sen', even though they don't contain
that lexeme?

Thanks!

--
Regards,
Hannes Dorbath

Re: TSearch: Need debug help

From
Oleg Bartunov
Date:
Hannes,

I don't know German, sorry, but is 'dursten' some form of 'durst'?
Probably we have a false hit from compound word support here. I'd suggest
using an exclusion dictionary (based on the synonym dictionary)
before ispell. It could be very simple:
durst : durst


Oleg

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
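
The exclusion-dictionary suggestion above relies on dictionaries being consulted in order, with the first one that recognizes a word deciding its lexemes. The following is a toy Python model of that idea, not tsearch2's implementation; the names `make_lookup`, `lexize`, `exclusions`, `ispell_like` and `stemmer_like` are hypothetical, and the `ispell_like` table simply mimics the false hit reported in the thread.

```python
# Toy model of a dictionary chain; this is NOT tsearch2's implementation.
from typing import Callable, Dict, List, Optional

Dictionary = Callable[[str], Optional[List[str]]]


def make_lookup(table: Dict[str, List[str]]) -> Dictionary:
    """Return a dictionary that recognizes only the words listed in `table`."""
    def lookup(word: str) -> Optional[List[str]]:
        return table.get(word)
    return lookup


def lexize(chain: List[Dictionary], word: str) -> List[str]:
    """Ask each dictionary in order; the first one that recognizes the word wins."""
    for dictionary in chain:
        lexemes = dictionary(word)
        if lexemes is not None:
            return lexemes
    return []


# Hypothetical stand-ins for the dictionaries discussed in the thread.
exclusions = make_lookup({"durst": ["durst"]})        # the "durst : durst" entry
ispell_like = make_lookup({"durst": ["dur", "sen"]})  # mimics the reported false hit


def stemmer_like(word: str) -> Optional[List[str]]:
    """Fallback that accepts every word unchanged."""
    return [word]


without_exclusions = lexize([ispell_like, stemmer_like], "durst")
with_exclusions = lexize([exclusions, ispell_like, stemmer_like], "durst")
```

In this sketch the exclusion entry short-circuits the chain, so the compound-word split never runs for `durst'. Words that are not listed still fall through to the later dictionaries.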

Re: TSearch: Need debug help

From
Hannes Dorbath
Date:
> but is 'dursten' some form of 'durst'?

Yes it is.

Hm, even when I remove `dursten' and `durst' altogether from the
dictionary, I still get `sen'.

How can I update a tsvector column to strip the `sen' lexeme?

Thanks!


--
Regards,
Hannes Dorbath

Re: TSearch: Need debug help

From
Hannes Dorbath
Date:
> hmm, I don't like this. Why not create synonym dictionary as written on
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Because I found some more words with the same problem, and I have no
idea how many there are in total :/

> you need to reindex when you change dictionaries.

I just tested with ts_debug() in a new session (so the dictionary was reloaded).
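
One rough way to estimate how many words are affected is to flag every lexeme that does not occur inside the word it came from, which is the symptom described in this thread. The Python sketch below is only a heuristic under that assumption: the function name `suspicious_lexemes` is hypothetical, the input is the ts_debug() output already quoted above, and legitimate irregular forms would be flagged as well.

```python
from typing import Dict, List


def suspicious_lexemes(debug_output: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each word to the lexemes that are not substrings of that word."""
    flagged: Dict[str, List[str]] = {}
    for word, lexemes in debug_output.items():
        odd = [lexeme for lexeme in lexemes if lexeme not in word]
        if odd:
            flagged[word] = odd
    return flagged


# Lexemes copied from the ts_debug() output quoted in the thread.
reported = {
    "durst": ["dur", "sen"],
    "höchsten": ["sen", "höch", "höchst", "höchsten"],
}

flagged = suspicious_lexemes(reported)
for word, odd in flagged.items():
    print(word, odd)
```

Running the same check over a larger word list would give a candidate set of entries for an exclusion dictionary. Each hit still needs a manual look.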

On 03.08.2006 13:22, Oleg Bartunov wrote:
> On Thu, 3 Aug 2006, Hannes Dorbath wrote:
>
>>> but does 'dursten' is a some form of 'durst' ?
>>
>> Yes it is.
>>
>> Hm, even when I remove `dursten' and `durst' all together from the
>> dict I still get `sen'.
>
> hmm, I don't like this. Why not create synonym dictionary as written on
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
>
>
>>
>> How can I update a tsvector column stripping the `sen' lexem?
>
> you need to reindex when you change dictionaries.

--
imos  Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen  // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net