Re: Full text index not being used

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: Full text index not being used
Дата
Msg-id Pine.LNX.4.64.0902031509450.4158@sn.sai.msu.ru
обсуждение исходный текст
Ответ на Re: Full text index not being used  (Alex Neth <alex@liivid.com>)
Ответы Re: Full text index not being used  (Alex Neth <alex@liivid.com>)
Список pgsql-general
Alex, can you somehow identify document, which has problem with
long word errors ?
Also, if you have space on disk I'd recommend to try

select *, to_tsvector('english',full_listing) as flv from source_listings;

I don't remember if you said us information about
your setup (pg version, OS, memory, what did you change in postgresql.conf..)


Oleg
On Tue, 3 Feb 2009, Alex Neth wrote:

> Based on suggestions from this list, I am trying to create a tsvector column
> and index that, since it is perhaps the recheck and rebuilding of all the
> vectors that is slowing things down.  I don't understand why a recheck is
> necessary on a gin index.....
>
> My update statement has been running for 36 hours now and has not finished.
> The statement is:  update source_listings set flv = to_tsvector('english',
> full_listing);  I know that it is still working because it occasionally
> prints out one of those long word errors.
>
> I have only 1.6M rows and each entry in that column is a standard size web
> page with just the text, maybe 3-5K.
>
> For sure I don't have meaningful long words.  Perhaps that is because it is
> not handling the HTML well and I should be parsing down the web page first.
> Hopefully that doesn't mean I need to rebuild this column over the course of
> 3 days - I didn't expect it to take this long so I thought I'd just try it
> out.
>
>
>
>
>
> On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:
>
>> Alex,
>>
>> what text you're indexing ? I don't believe you have meaningful
>> very long words ( > 2047 characters).
>>
>> Do you really need multicolumn index ?
>>
>> I'd recommend to separate problem - create column fts for
>> tsvector('english',full_listing), create index on it and try full-text
>> query. The way you're doing imply calling to_tsvector every time you
>> search, which can be very costly.
>>
>>
>> Olegk
>>
>> On Sun, 1 Feb 2009, Alex wrote:
>>
>>> So this seems to be because the result size is too big.  I still don't
>>> know why it is looping through every record and printing a warning,
>>> but adding a LIMIT makes the queries complete in a reasonable time
>>> (although not all that fast).
>>>
>>> However I need to sort and also have many other facets that may or may
>>> not be included in the query.  Adding a sort makes it load every
>>> record again and take forever.
>>>
>>> I tried to create an index including all of the fields I query on to
>>> see if that would work, but I get an error the the index row is too
>>> large:
>>>
>>> => create index master_index on source_listings(geo_lat, geo_lon,
>>> price, bedrooms, region, city, listing_type, to_tsvector('english',
>>> full_listing), post_time);
>>> NOTICE:  word is too long to be indexed
>>> DETAIL:  Words longer than 2047 characters are ignored.
>>> NOTICE:  word is too long to be indexed
>>> DETAIL:  Words longer than 2047 characters are ignored.
>>> NOTICE:  word is too long to be indexed
>>> DETAIL:  Words longer than 2047 characters are ignored.
>>> NOTICE:  word is too long to be indexed
>>> DETAIL:  Words longer than 2047 characters are ignored.
>>> ERROR:  index row requires 13356 bytes, maximum size is 8191
>>>
>>> Any ideas about how to resolve this?
>>>
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

В списке pgsql-general по дате отправления:

Предыдущее
От: Phoenix Kiula
Дата:
Сообщение: Re: Smartest way to resize a column?
Следующее
От: Phoenix Kiula
Дата:
Сообщение: Fastest way to drop an index?