Re: Full text index not being used
От | Oleg Bartunov |
---|---|
Тема | Re: Full text index not being used |
Дата | |
Msg-id | Pine.LNX.4.64.0902031509450.4158@sn.sai.msu.ru обсуждение исходный текст |
Ответ на | Re: Full text index not being used (Alex Neth <alex@liivid.com>) |
Ответы |
Re: Full text index not being used
(Alex Neth <alex@liivid.com>)
|
Список | pgsql-general |
Alex, can you somehow identify document, which has problem with long word errors ? Also, if you have space on disk I'd recommend to try select *, to_tsvector('english',full_listing) as flv from source_listings; I don't remember if you said us information about your setup (pg version, OS, memory, what did you change in postgresql.conf..) Oleg On Tue, 3 Feb 2009, Alex Neth wrote: > Based on suggestions from this list, I am trying to create a tsvector column > and index that, since it is perhaps the recheck and rebuilding of all the > vectors that is slowing things down. I don't understand why a recheck is > necessary on a gin index..... > > My update statement has been running for 36 hours now and has not finished. > The statement is: update source_listings set flv = to_tsvector('english', > full_listing); I know that it is still working because it occasionally > prints out one of those long word errors. > > I have only 1.6M rows and each entry in that column is a standard size web > page with just the text, maybe 3-5K. > > For sure I don't have meaningful long words. Perhaps that is because it is > not handling the HTML well and I should be parsing down the web page first. > Hopefully that doesn't mean I need to rebuild this column over the course of > 3 days - I didn't expect it to take this long so I thought I'd just try it > out. > > > > > > On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote: > >> Alex, >> >> what text you're indexing ? I don't believe you have meaningful >> very long words ( > 2047 characters). >> >> Do you really need multicolumn index ? >> >> I'd recommend to separate problem - create column fts for >> tsvector('english',full_listing), create index on it and try full-text >> query. The way you're doing imply calling to_tsvector every time you >> search, which can be very costly. >> >> >> Olegk >> >> On Sun, 1 Feb 2009, Alex wrote: >> >>> So this seems to be because the result size is too big. I still don't >>> know why it is looping through every record and printing a warning, >>> but adding a LIMIT makes the queries complete in a reasonable time >>> (although not all that fast). >>> >>> However I need to sort and also have many other facets that may or may >>> not be included in the query. Adding a sort makes it load every >>> record again and take forever. >>> >>> I tried to create an index including all of the fields I query on to >>> see if that would work, but I get an error the the index row is too >>> large: >>> >>> => create index master_index on source_listings(geo_lat, geo_lon, >>> price, bedrooms, region, city, listing_type, to_tsvector('english', >>> full_listing), post_time); >>> NOTICE: word is too long to be indexed >>> DETAIL: Words longer than 2047 characters are ignored. >>> NOTICE: word is too long to be indexed >>> DETAIL: Words longer than 2047 characters are ignored. >>> NOTICE: word is too long to be indexed >>> DETAIL: Words longer than 2047 characters are ignored. >>> NOTICE: word is too long to be indexed >>> DETAIL: Words longer than 2047 characters are ignored. >>> ERROR: index row requires 13356 bytes, maximum size is 8191 >>> >>> Any ideas about how to resolve this? >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
В списке pgsql-general по дате отправления: