full text search and hyphens in uuid

From	Martin Norbäck Olivers
Subject	full text search and hyphens in uuid
Date	October 27, 2023, 14:48:32
Msg-id	CALoTC6s=QAvj=yw2cY=8t_dyQsByXF_AT8k=z-YXOcgcj3sO=g@mail.gmail.com
Responses	Re: full text search and hyphens in uuid (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-sql

Hi!
I have a problem with full text search and UUIDs in text that I index using to_tsvector. Most of the time it works well, because the parts of a UUID are lexed as words, so I can just search for the parts of the UUID.

The problem is a UUID like this:

select to_tsvector('simple','0232710f-8545-59eb-abcd-47aa57184361')

which gives this result:

'-59':3 '-8545':2 '0232710f':1 '47aa57184361':7 'abcd':6 'eb':5 'eb-abcd-47aa57184361':4

So, I found dict_int and asked it to remove the minus signs

create extension dict_int;

ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 12, absval = true);

alter text search configuration simple alter mapping for int, uint with intdict;

and now I get this result instead:

'0232710f':1 '47aa57184361':7 '59':3 '8545':2 'abcd':6 'eb':5 'eb-abcd-47aa57184361':4

which is slightly better, but still not good enough because there is no token 59eb. It's being split into 59 and eb.
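
To see why 59eb is split, ts_debug can show the token type the default parser assigns to each piece of the string. This is only a diagnostic sketch using the same UUID as above; the exact rows depend on the PostgreSQL version and on the mappings configured for simple.

```sql
-- Show the token type (alias), the raw token, the dictionary that
-- handled it, and the resulting lexemes for each parsed piece.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('simple', '0232710f-8545-59eb-abcd-47aa57184361');
```

Judging by the tsvector output above, -59 is probably reported as an integer token and eb as part of a hyphenated word. A dictionary such as intdict only sees tokens after the parser has already split them, so it cannot rejoin 59 and eb.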

Is there any way to change this behaviour of the tsvector lexer? Do I have to write my own tsvector or is there a way to "turn off" integer handling in the lexer?
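
One possible workaround, offered here as an untested sketch rather than an answer from this thread, is to replace the hyphens with spaces before the text reaches to_tsvector, so that the parser never sees a leading minus sign. The assumption is that losing hyphen handling for every other hyphenated word in the text is acceptable; the same replacement would also have to be applied to search terms when building queries.

```sql
-- Assumption: it is acceptable to treat every hyphen as a word break.
-- replace() runs before parsing, so the parser sees five separate words.
SELECT to_tsvector('simple',
       replace('0232710f-8545-59eb-abcd-47aa57184361', '-', ' '));
```

With the hyphens gone, 59eb is expected to come through as a single token, but this should be checked with ts_debug on real data.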

Regards,
Martin

