Thread: Slow GIN indexes after bulk insert


Slow GIN indexes after bulk insert

From:
Chris Spencer
Date:

I have 5 GIN indexes on a PG 9.3 table containing about 50 million records. Each index covers a "group" of related fields like billing address, shipping address, contact names, etc.

When first created, the indexes work miracles in speeding up the full text search of these fields. However, I'm running into some serious maintenance headaches.

After I insert a few thousand new records, the indexes seem to have no effect. A query that might normally take 1 second now takes 5 minutes.

If I drop and recreate the indexes, performance returns to normal, but this causes a couple of hours of downtime, even if I recreate the indexes concurrently.

Why are these GIN indexes becoming unusable after a large number of inserts? Is this a known limitation, or is there anything I can do to fix it?

Chris

Re: Slow GIN indexes after bulk insert

From:
Tom Lane
Date:
Chris Spencer <chrisspen@gmail.com> writes:
> I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
> Each index covers a "group" of related fields like billing address,
> shipping address, contact names, etc.

> When first created, the indexes work miracles in speeding up the full text
> search of these fields. However, I'm running into some serious maintenance
> headaches.

> After I insert a few thousand new records, the indexes seem to have no
> effect. A query that might normally take 1 second now takes 5 minutes.

Probably the problem is that most of the new index entries are still
sitting in the index's pending list and haven't been merged to the main
search tree.  Try VACUUM'ing the table after a bulk insert.  (I think
there are other ways of triggering a GIN pending-list cleanup, but
I don't recall which ones were available in 9.3.)

Worthy of note here is that the max pending list size is governed by
work_mem, so a large work_mem can make this effect worse.  (We got
smart and made it an independent setting, but not till 9.5.)
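[Editor's note: a minimal sketch of the cleanup Tom describes; the table name is illustrative, not from the thread.]

```sql
-- Force the GIN pending lists on this table's indexes to be merged
-- into the main index structure (on 9.3, VACUUM or autovacuum does this).
VACUUM my_table;

-- From 9.5 onward the pending-list cap is its own setting,
-- independent of work_mem:
-- SET gin_pending_list_limit = '4MB';
```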

            regards, tom lane


Re: Slow GIN indexes after bulk insert

From:
Chris Spencer
Date:
Thanks for the tips.

What constitutes a "large" work_mem? My server has 61GB of memory, and my work_mem is currently set to use essentially all of it. What percentage of total memory should I lower it to so that it won't impact GIN updates?

On Mon, Mar 21, 2016 at 2:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Chris Spencer <chrisspen@gmail.com> writes:
> > I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
> > Each index covers a "group" of related fields like billing address,
> > shipping address, contact names, etc.
>
> > When first created, the indexes work miracles in speeding up the full text
> > search of these fields. However, I'm running into some serious maintenance
> > headaches.
>
> > After I insert a few thousand new records, the indexes seem to have no
> > effect. A query that might normally take 1 second now takes 5 minutes.
>
> Probably the problem is that most of the new index entries are still
> sitting in the index's pending list and haven't been merged to the main
> search tree.  Try VACUUM'ing the table after a bulk insert.  (I think
> there are other ways of triggering a GIN pending-list cleanup, but
> I don't recall which ones were available in 9.3.)
>
> Worthy of note here is that the max pending list size is governed by
> work_mem, so a large work_mem can make this effect worse.  (We got
> smart and made it an independent setting, but not till 9.5.)
>
>                         regards, tom lane

Re: Slow GIN indexes after bulk insert

From:
Tom Lane
Date:
Chris Spencer <chrisspen@gmail.com> writes:
> What constitutes a "large" work_mem? My server has 61GB of memory and my
> work_mem is currently set to include all of that.

Ouch.  That's a mistake independently of GIN.  The primary usage of
work_mem is to define how much memory an individual sorting or hashing
query step is allowed to use.  A complex query might have several sort or
hash steps, and then you need to worry about concurrent queries in
different sessions; not to mention that this is not the only demand on
your server's RAM.  I'd be hesitant to set work_mem much above 1GB, maybe
even quite a bit less than that depending on what your workload is like.

Cutting work_mem to ~100MB might alone be enough to fix your GIN issue;
if not you could experiment with forced flushes of the GIN pending lists
via VACUUM (or ANALYZE might do it too, and be more directly useful).
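[Editor's note: a sketch of the experiment Tom suggests; the 100MB figure is his, the table name is illustrative.]

```sql
-- Try a much smaller per-step memory budget in this session first;
-- persist it in postgresql.conf once a workable value is found.
SET work_mem = '100MB';

-- Force a flush of the GIN pending lists; the ANALYZE part also
-- refreshes planner statistics for the newly inserted rows.
VACUUM ANALYZE my_table;
```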

            regards, tom lane


Re: Slow GIN indexes after bulk insert

From:
Jeff Janes
Date:
On Mon, Mar 21, 2016 at 10:55 AM, Chris Spencer <chrisspen@gmail.com> wrote:
> I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
> Each index covers a "group" of related fields like billing address, shipping
> address, contact names, etc.
>
> When first created, the indexes work miracles in speeding up the full text
> search of these fields. However, I'm running into some serious maintenance
> headaches.
>
> After I insert a few thousand new records, the indexes seem to have no
> effect. A query that might normally take 1 second now takes 5 minutes.

I'd try turning fastupdate off on the indexes.  It will probably slow
down your bulk inserts (perhaps by a lot, but you won't know until
you try).  But if you can handle that, it will make your slow select
problems go away like magic.
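[Editor's note: a minimal sketch of that change, assuming hypothetical index and table names.]

```sql
-- Disable the pending list on one GIN index; new entries then go
-- straight into the main index structure at insert time.
ALTER INDEX billing_address_gin_idx SET (fastupdate = off);

-- The ALTER does not flush entries already queued, so clean up once:
VACUUM my_table;
```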

Cheers,

Jeff