Re: BUG #18149: Incorrect lexeme for english token "proxy"

From: Laurenz Albe
Subject: Re: BUG #18149: Incorrect lexeme for english token "proxy"
Msg-id: 25d0bf34e0f7e0d9f3455e1b7adaf5a5aee810e1.camel@cybertec.at
In reply to: BUG #18149: Incorrect lexeme for english token "proxy"  (PG Bug reporting form <noreply@postgresql.org>)
Responses: Re: BUG #18149: Incorrect lexeme for english token "proxy"  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
On Thu, 2023-10-05 at 21:44 +0000, PG Bug reporting form wrote:
> The english dictionary is using the lexeme "proxi" for the token "proxy". As
> a result, the search term "proxy" is not yielding results for records that
> contain this word.
>
> # select * from ts_debug('english', 'proxy');
>    alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
> -----------+-----------------+-------+----------------+--------------+---------
>  asciiword | Word, all ASCII | proxy | {english_stem} | english_stem | {proxi}

I cannot reproduce that.  If I generate a text search query for "proxy",
I get this:

SELECT to_tsquery('english', 'proxy');

 to_tsquery
════════════
 'proxi'
(1 row)

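which will work just fine, because the text being searched is stemmed the
same way: "proxy" in a document is also stored as 'proxi'.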
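A quick check (not part of the original message) that the query and the searched text meet at the same lexeme. The string literals stand in for real column data; both the singular and the plural in the document should match a query for "proxy":

```sql
-- Document text and query both go through the english_stem dictionary,
-- so "proxy" and "proxies" are both reduced to the lexeme 'proxi'.
SELECT to_tsvector('english', 'proxy')   @@ to_tsquery('english', 'proxy') AS singular_match,
       to_tsvector('english', 'proxies') @@ to_tsquery('english', 'proxy') AS plural_match;
```

Both columns should come back true. The stored lexeme only looks odd; it is never shown to the user and is only compared against other lexemes produced by the same dictionary.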

> I think this lexeme was chosen to support the plural of proxy, which is
> proxies. However, there are other plurals where the root word is spelled
> differently and Postgres creates the correct lexeme, such as: [goose or mouse]

The Snowball dictionary has no real knowledge of words.  Stemming is
done by applying heuristics that work "well enough" in most cases.
In the case of "proxy", the rule seems simply to be to stem every word that
ends in "y" as if it ended in "i", in the hope of catching plurals that are
formed the same way as for "proxy":

select * from to_tsvector('english', 'mumply');

 to_tsvector
═════════════
 'mumpli':1
(1 row)
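The rule can be observed directly with ts_lexize on the english_stem dictionary. As an addition not in the original message: the published Snowball English (Porter2) algorithm states it as "replace a final y by i if it is preceded by a non-vowel that is not the first letter of the word", so a "y" after a vowel is left alone. The word "play" is just an illustrative choice:

```sql
-- english_stem is the Snowball-based dictionary behind the 'english' configuration.
SELECT ts_lexize('english_stem', 'proxy')   AS proxy,    -- final y after a consonant becomes i
       ts_lexize('english_stem', 'proxies') AS proxies,  -- "ies" plural lands on the same stem
       ts_lexize('english_stem', 'play')    AS play;     -- y after a vowel is kept
```

This shows why the stem is "proxi": it is the one form that both "proxy" and "proxies" can be reduced to by purely mechanical suffix rules.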

This works fine in most cases, but every heuristic can go wrong occasionally:

select * from to_tsvector('english', 'standby');

 to_tsvector
═════════════
 'standbi':1
(1 row)
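Even where the heuristic produces an odd-looking stem, searching is unaffected, because the query goes through the same rules. A minimal check (the plural "standbys" is an illustrative choice, not from the original message):

```sql
-- "standby" and "standbys" are both reduced to 'standbi', so a query for
-- "standby" still finds documents using either form.
SELECT ts_lexize('english_stem', 'standbys')                               AS plural_stem,
       to_tsvector('english', 'standbys') @@ to_tsquery('english', 'standby') AS plural_match;
```

The mis-stem is only a problem if you inspect the tsvector itself; for matching, consistency matters more than linguistic accuracy.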

Yours,
Laurenz Albe
