Re: tsearch2: enable non ascii stop words with C locale

From	Teodor Sigaev
Subject	Re: tsearch2: enable non ascii stop words with C locale
Date	February 13, 2007, 07:13:02
Msg-id	45D17308.1070305@sigaev.ru
In reply to	Re: tsearch2: enable non ascii stop words with C locale (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses	Re: tsearch2: enable non ascii stop words with C locale
List	pgsql-hackers

> I know. My guess is the parser does not read the stop word file at
> least with default configuration.

The parser should not read the stop word file: that is the dictionaries' job.

>
> So if a character is not ASCII, it returns 0 even if p_isalpha returns
> 1. Is this what you expect?
No, p_islatin should return true only for Latin characters, not for national ones.

>
> In our case, we added JAPANESE_STOP_WORD into english.stop then:
> select to_tsvector(JAPANESE_STOP_WORD)
> which returns words even they are in JAPANESE_STOP_WORD.
> And with the patches the problem was solved.

Please show your configuration for lexemes/dictionaries. I suspect that you have
the en_stem dictionary set for the lword lexeme type. A better way is to use the
'simple' dictionary (it supports stop words the same way en_stem does) and set it
for the nlword, word, part_hword, nlpart_hword, hword, and nlhword lexeme types.
Note: leave en_stem unchanged for any Latin word.
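
A rough sketch of that setup, assuming the stock contrib/tsearch2 catalog tables (pg_ts_cfgmap, pg_ts_dict) and a configuration named 'default'. The configuration name and the stop word file path are placeholders, not taken from this thread, so adjust them to your installation:

```sql
-- Show which dictionaries currently handle each lexeme type.
-- 'default' is an assumed configuration name; substitute your own.
SELECT tok_alias, dict_name
FROM pg_ts_cfgmap
WHERE ts_name = 'default'
ORDER BY tok_alias;

-- Point the 'simple' dictionary at a stop word file.
-- The path below is hypothetical.
UPDATE pg_ts_dict
SET dict_initoption = '/path/to/japanese.stop'
WHERE dict_name = 'simple';

-- Route the non-Latin and mixed lexeme types to 'simple'.
-- The Latin-only types (such as lword) are not touched, so they keep en_stem.
UPDATE pg_ts_cfgmap
SET dict_name = '{simple}'
WHERE ts_name = 'default'
  AND tok_alias IN ('nlword', 'word', 'part_hword',
                    'nlpart_hword', 'hword', 'nlhword');
```

Dictionaries are normally initialized once per session, so the change most likely needs a new connection before to_tsvector picks it up.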

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
