BUG #8750: 'simple' parser in to_tsvector() splits words on underscores

From: drx@a-blast.org
Subject: BUG #8750: 'simple' parser in to_tsvector() splits words on underscores
Date:
Msg-id: E1W0z20-0007Gz-9W@wrigleys.postgresql.org
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8750
Logged by:          Dragan Espenschied
Email address:      drx@a-blast.org
PostgreSQL version: 9.3.2
Operating system:   Ubuntu 12.04 x86_64
Description:

When text is converted to a tsvector with the 'simple' parser, words are split
on underscores. For example:


select to_tsvector('simple', 'light_bulb');
    to_tsvector
--------------------
 'bulb':2 'light':1
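
To see where the split happens, the token stream can be inspected with
ts_debug(). This is only a minimal sketch, assuming a stock installation in
which the 'simple' configuration uses the default parser. The second query is
a possible workaround rather than a fix: casting text directly to tsvector
bypasses the parser and splits on whitespace only, but it also skips
normalization such as lowercasing.

```sql
-- Show how the input is tokenized: one row per token,
-- with the token type alias assigned by the parser.
SELECT alias, token
FROM ts_debug('simple', 'light_bulb');

-- Possible workaround (assumption: the input is already a clean,
-- lowercase tag): a direct cast does not run the text search parser.
SELECT 'light_bulb'::tsvector;
```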


An underscore is typically used in place of a space in a term that should be
kept together, so it is an explicit signal that the term should not be
split.


At least, this is how I understand it.


I suggest that words not be split on underscores by default. This would make
typical tasks such as tagging easy to implement, without much need to modify
the parser.


Thanks for considering my suggestion!
Dragan
