Discussion: TSearch: Need debug help
SELECT ts_debug('durst');
(default_german,lword,"Latin word",durst,"{de_ispell,de}","'dur' 'sen'")

SELECT ts_debug('höchsten');
(default_german,word,Word,höchsten,"{de_ispell,de}","'sen' 'höch' 'höchst' 'höchsten'")

For some reason both produce the lexeme 'sen'. That leads to strange results: a search for `durst' will highlight `höchsten' with headline().

Server is PG 8.0.4,
german snowball stemmer,
dictionary used is http://hannes.imos.net/german_iso.med
(from OpenOffice)

What causes some words to result in `sen', though they don't contain that lexeme?

Thanks!

--
Regards,
Hannes Dorbath
Hannes,

I don't know German, sorry, but is 'dursten' some form of 'durst'?
Probably we have a false hit from compound word support here. I'd suggest using an exclusion dictionary (based on the synonym dictionary) before ispell. It could be very simple:
durst : durst

Oleg

On Thu, 3 Aug 2006, Hannes Dorbath wrote:
> For some reason both produce the lexem 'sen'. That leads to strange results.
> Search for `durst' will highlight `höchsten' with headline().

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
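Oleg's suggestion above relies on the dictionaries for a token type being consulted in order, with the first one that recognises the token ending the lookup. The following is a toy Python model of that chain, not tsearch2 code: `fake_ispell` simply hard-codes the `'dur' 'sen'` output reported for `durst`, and `make_exclusion_dict`, `fake_stemmer`, and `lexize` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional, Set

# A dictionary maps a token to a list of lexemes, or returns None when it
# does not recognise the token (the next dictionary in the chain is tried).
Dictionary = Callable[[str], Optional[List[str]]]


def make_exclusion_dict(words: Set[str]) -> Dictionary:
    """Hypothetical exclusion dictionary: listed words map to themselves."""
    def lookup(token: str) -> Optional[List[str]]:
        return [token] if token in words else None
    return lookup


def fake_ispell(token: str) -> Optional[List[str]]:
    """Stand-in for de_ispell; hard-codes the output reported in the thread."""
    observed = {"durst": ["dur", "sen"]}
    return observed.get(token)


def fake_stemmer(token: str) -> Optional[List[str]]:
    """Stand-in for the stemmer: accepts every token unchanged (assumption)."""
    return [token]


def lexize(token: str, chain: List[Dictionary]) -> List[str]:
    """Return the result of the first dictionary that recognises the token."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []


exclusions = make_exclusion_dict({"durst"})
print(lexize("durst", [fake_ispell, fake_stemmer]))              # ['dur', 'sen']
print(lexize("durst", [exclusions, fake_ispell, fake_stemmer]))  # ['durst']
```

With the exclusion dictionary first in the chain, `durst` never reaches the compound-splitting step, so the spurious `sen` is not produced for it. Words not on the exclusion list still fall through to the later dictionaries as before.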
On 03.08.2006 12:54, Oleg Bartunov wrote:
> but does 'dursten' is a some form of 'durst' ?

Yes, it is.

Hm, even when I remove `dursten' and `durst' altogether from the dict, I still get `sen'.

How can I update a tsvector column, stripping the `sen' lexeme?

Thanks!

--
Regards,
Hannes Dorbath
On 03.08.2006 13:22, Oleg Bartunov wrote:
>> Hm, even when I remove `dursten' and `durst' all together from the
>> dict I still get `sen'.
>
> hmm, I don't like this. Why not create synonym dictionary as written on
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Because I found some more words with the same problem, and I have no idea how many there are in total :/

>> How can I update a tsvector column stripping the `sen' lexem?
>
> you need to reindex when you change dictionaries.

I just tested with ts_debug() in a new session (the dict was reloaded).

--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net
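Since the number of affected words is unknown, one rough way to hunt for candidates is to apply the criterion from the original question: flag any word whose lexeme list contains a lexeme that does not occur in the word itself. The sketch below is only a heuristic under that assumption. A stemmer can legitimately emit lexemes that are not substrings of the input, so hits would need manual review. `suspicious_lexemes` is a hypothetical helper, and the sample data is copied from the ts_debug() output quoted in this thread.

```python
from typing import List, Tuple


def suspicious_lexemes(word: str, lexemes: List[str]) -> List[str]:
    """Return the lexemes that do not occur as a substring of the word."""
    lowered = word.lower()
    return [lex for lex in lexemes if lex.lower() not in lowered]


# (word, lexemes) pairs copied from the ts_debug() output in the thread.
observed: List[Tuple[str, List[str]]] = [
    ("durst", ["dur", "sen"]),
    ("höchsten", ["sen", "höch", "höchst", "höchsten"]),
]

for word, lexemes in observed:
    odd = suspicious_lexemes(word, lexemes)
    if odd:
        # Candidate for an exclusion list; review by hand before using it.
        print(f"{word}: {odd}")
```

For both words from the thread this flags only `sen`. Feeding it the lexemes for a larger word list would give a first estimate of how many entries an exclusion list might need.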