Thread: BUG #3975: tsearch2 index should not bomb out of 1Mb limit

BUG #3975: tsearch2 index should not bomb out of 1Mb limit

From
"Edwin Groothuis"
Date:
The following bug has been logged online:

Bug reference:      3975
Logged by:          Edwin Groothuis
Email address:      postgresql@mavetju.org
PostgreSQL version: 8.3
Operating system:   FreeBSD 6.3
Description:        tsearch2 index should not bomb out of 1Mb limit
Details:

I have been experimenting with indexing the FreeBSD mailing lists into a
tsearch2-backed database.

Sometimes I see this warning:

NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.

That's okay, I can live with that. However, I see this one too:

ERROR:  string is too long for tsvector

Ouch. But... since very long words are already not indexed (is the length
configurable anywhere because I don't mind setting it to 50 characters), I
don't think that it should bomb out of this but print a similar warning like
"String only partly indexed".

I'm still trying to determine how big the message it failed on was...

Re: BUG #3975: tsearch2 index should not bomb out of 1Mb limit

From
Euler Taveira de Oliveira
Date:
Edwin Groothuis wrote:

> Ouch. But... since very long words are already not indexed (is the length
> configurable anywhere because I don't mind setting it to 50 characters), I
> don't think that it should bomb out of this but print a similar warning like
> "String only partly indexed".
>
This is not a bug. I would say it's a limitation. Look at
src/include/tsearch/ts_type.h. You could decrease len in WordEntry to 9
bits (words up to 511 characters) and increase pos to 22 bits (4 MB).
Don't forget to update MAXSTRLEN and MAXSTRPOS accordingly.
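
The arithmetic behind that suggestion can be sketched in standalone C. This
is not code from the PostgreSQL tree: the "stock" layout (haspos:1, len:11,
pos:20) is an assumption inferred from the 2047-character notice and the
1 MB limit mentioned in this thread, and the names WordEntryStock,
WordEntryPatched, max_for_bits and print_limits are hypothetical. The point
is only that the three fields share one 32-bit word, so bits taken from len
can be given to pos.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed stock layout (not copied from ts_type.h): 1 + 11 + 20 = 32 bits. */
typedef struct
{
    unsigned int haspos:1,
                 len:11,    /* word length, at most 2047 */
                 pos:20;    /* offset into the string data, just under 1 MB */
} WordEntryStock;

/* Layout suggested in the message above: 1 + 9 + 22 = 32 bits. */
typedef struct
{
    unsigned int haspos:1,
                 len:9,     /* word length, at most 511 */
                 pos:22;    /* offset into the string data, just under 4 MB */
} WordEntryPatched;

/* Both layouts must still add up to exactly 32 bits. */
_Static_assert(1 + 11 + 20 == 32, "stock layout must use 32 bits");
_Static_assert(1 + 9 + 22 == 32, "patched layout must use 32 bits");

/* Largest value an unsigned bit field of the given width can hold. */
unsigned long max_for_bits(unsigned int bits)
{
    assert(bits >= 1 && bits <= 31);
    return (1UL << bits) - 1;
}

/* Print the limits that MAXSTRLEN and MAXSTRPOS would have to match. */
void print_limits(void)
{
    printf("stock:   MAXSTRLEN=%lu MAXSTRPOS=%lu\n",
           max_for_bits(11), max_for_bits(20));
    printf("patched: MAXSTRLEN=%lu MAXSTRPOS=%lu\n",
           max_for_bits(9), max_for_bits(22));
}
```

Calling print_limits() from a main() shows the trade-off directly: the
maximum word length drops from 2047 to 511 while the maximum offset grows
from 1048575 to 4194303. Changing the real struct alters the stored format
of tsvector values, so this is a sketch of the idea rather than a drop-in
patch.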

> I'm still trying to determine how big the message it failed on was...
>
Maybe we should change the "string is too long for tsvector" to "string
is too long (%ld bytes, max %ld bytes) for tsvector".
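
A minimal sketch of what the proposed message would look like once the
sizes are filled in. It uses plain snprintf as a stand-in for the server's
own error reporting, and the function name format_tsvector_too_long and the
sample sizes are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the proposed error text into buf.
 * Returns what snprintf returns: the length the full message needs,
 * excluding the terminating NUL.
 */
int format_tsvector_too_long(char *buf, size_t buflen,
                             long actual_bytes, long max_bytes)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "string is too long (%ld bytes, max %ld bytes) for tsvector",
                    actual_bytes, max_bytes);
}
```

With an actual size of 1500000 and a maximum of 1048575, the buffer would
hold "string is too long (1500000 bytes, max 1048575 bytes) for tsvector",
which answers the "how big was the message" question without further
digging.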


--
   Euler Taveira de Oliveira
   http://www.timbira.com/