Discussion: BUG #8750: 'simple' parser in to_tsvector() splits words on underscores

From: drx@a-blast.org
Date:
The following bug has been logged on the website:

Bug reference:      8750
Logged by:          Dragan Espenschied
Email address:      drx@a-blast.org
PostgreSQL version: 9.3.2
Operating system:   Ubuntu 12.04 x86_64
Description:

When converting text to a tsvector with the 'simple' text search configuration,
words are split on underscores. For example:


select to_tsvector('simple', 'light_bulb');
    to_tsvector
--------------------
 'bulb':2 'light':1
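To see where the split happens, ts_debug() can list the tokens the parser produces before any dictionary runs. This is a small diagnostic sketch. It assumes the 'simple' configuration uses the built-in default parser, which classifies '_' as a blank (separator) token rather than part of a word:

```sql
-- List each token the parser emits for the input, with its token type (alias)
-- and the lexemes the dictionaries produce for it.
-- The '_' is expected to appear as a separate token with alias 'blank'
-- and no lexemes, which is why 'light' and 'bulb' end up as two entries.
SELECT alias, token, lexemes
FROM ts_debug('simple', 'light_bulb');
```

Because the split is made by the parser and not by the 'simple' dictionary, changing dictionaries alone would not keep 'light_bulb' together.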


An underscore is typically used in place of a space inside a term that should
be kept together, so it explicitly signals that the term should not be split.


At least, this is how I understand it.


I suggest that words not be split on underscores by default. This would make
typical tasks such as tagging much easier to implement, without needing to
modify the parser.
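Until the default changes, one possible workaround for tagging is to bypass the parser altogether. Casting text directly to tsvector splits only on whitespace and applies no dictionaries, so 'light_bulb' stays a single lexeme. The table `items` and its `tags` column below are hypothetical. The sketch assumes tags are whitespace-separated and contain no quotes, backslashes or colons, which the tsvector input syntax would otherwise interpret. Since no dictionary runs, lowercasing has to be done explicitly with lower():

```sql
-- Hypothetical table, for illustration only.
CREATE TEMP TABLE items (id int, tags text);
INSERT INTO items VALUES (1, 'Light_Bulb lamp'), (2, 'light bulb');

-- Direct cast: lexemes are split on whitespace only, so 'light_bulb'
-- remains one lexeme (no parser, no stemming, no stop words).
SELECT id, lower(tags)::tsvector AS tag_vector FROM items;

-- Build the query the same way so both sides use identical lexemes;
-- this matches item 1 only.
SELECT id FROM items
WHERE lower(tags)::tsvector @@ 'light_bulb'::tsquery;
```

The trade-off is that nothing is normalized beyond lower(), so 'bulbs' and 'bulb' remain distinct tags.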


Thanks for considering my suggestion!
Dragan