Обсуждение: massive update on gin index

Поиск
Список
Период
Сортировка

massive update on gin index

От
Marcos Pegoraro
Дата:
In a table with people's info I have 3 phone numbers, mobile, work and home. But then some have 2 mobiles, some have 2 work numbers, so decided to test it as an array of json. I know I could have another table for that, but I was just testing.

So my original table had
Mobile, Work, Home and all of them are btree indexed.

Then added a jsonb field and updated it with those 3 phone numbers on it
[{"phone": 2236279878, "type": 1}, {"phone": 22998432631, "type": 2}]
[{"phone": 22996783278, "type": 2}]
create index idxPhones on People using gin(Phones)

If I select using old or new fields, both uses index and Execution Time is similar
explain analyze select * from People where Phones @> '[{"phone": 2236279878}]';
explain analyze select * from People where Mobile = 2236279878 or Work = 2236279878 or Home = 2236279878;
 
But then I repeated 2 or 3 times that update which stores those 3 phones on json and then my gin index became slow, very very slow, why ?

select using btree on 3 phone numbers - Execution Time: 0.164 ms
select using gin on json on first update - Execution Time: 0.220 ms
select using gin on json next to 2 or 3 updates - Execution Time: 11.220 ms

And that execution time will come back to 0.220 ms only if I recreate the index.

Then I found gin_pending_list_limit and fast_update which I think are used to update GIN indexes, but didn´t find any examples of both.

What am I missing ? That gin index needs to have some more options or attributes on it ?
I know in a day by day use I'll never do that massive update twice but just to understand when will this index be updated ?

Thanks
Marcos

Re: massive update on gin index

От
Guyren Howe
Дата:
You might consider defining a phone type that includes your “type” information, and just having an array of those, if you really want to do something like this.

But a related table instead would be the obvious answer.

On Sep 14, 2022, at 12:33 , Marcos Pegoraro <marcos@f10.com.br> wrote:

In a table with people's info I have 3 phone numbers, mobile, work and home. But then some have 2 mobiles, some have 2 work numbers, so decided to test it as an array of json. I know I could have another table for that, but I was just testing.

So my original table had
Mobile, Work, Home and all of them are btree indexed.

Then added a jsonb field and updated it with those 3 phone numbers on it
[{"phone": 2236279878, "type": 1}, {"phone": 22998432631, "type": 2}]
[{"phone": 22996783278, "type": 2}]
create index idxPhones on People using gin(Phones)

If I select using old or new fields, both uses index and Execution Time is similar
explain analyze select * from People where Phones @> '[{"phone": 2236279878}]';
explain analyze select * from People where Mobile = 2236279878 or Work = 2236279878 or Home = 2236279878;
 
But then I repeated 2 or 3 times that update which stores those 3 phones on json and then my gin index became slow, very very slow, why ?

select using btree on 3 phone numbers - Execution Time: 0.164 ms
select using gin on json on first update - Execution Time: 0.220 ms
select using gin on json next to 2 or 3 updates - Execution Time: 11.220 ms

And that execution time will come back to 0.220 ms only if I recreate the index.

Then I found gin_pending_list_limit and fast_update which I think are used to update GIN indexes, but didn´t find any examples of both.

What am I missing ? That gin index needs to have some more options or attributes on it ?
I know in a day by day use I'll never do that massive update twice but just to understand when will this index be updated ?

Thanks
Marcos

Re: massive update on gin index

От
Rob Sargent
Дата:
On 9/14/22 13:38, Guyren Howe wrote:
> You might consider defining a phone type that includes your “type” 
> information, and just having an array of those, if you really want to 
> do something like this.
>
> But a related table instead would be the obvious answer.

Did you try a simple array of phone numbers?  If you really care about 
mobile,work,home prepend the number with one of HMW. Easily stripped off 
as necessary.  I've had decent performance with arrays in the past.





Re: massive update on gin index

От
Marcos Pegoraro
Дата:
Did you try a simple array of phone numbers?  If you really care about
mobile,work,home prepend the number with one of HMW. Easily stripped off
as necessary.  I've had decent performance with arrays in the past.

I know I have other options, and possibly better, but I was trying to understand what happens with gin indexes, just that.

Re: massive update on gin index

От
Tom Lane
Дата:
Marcos Pegoraro <marcos@f10.com.br> writes:
> I know I have other options, and possibly better, but I was trying to
> understand what happens with gin indexes, just that.

GIN does have a "pending list" of insertions not yet pushed into the main
index structure, and search performance will suffer if that gets too
bloated.  I don't recall much about how to control that, but I think
vacuuming the table will serve to empty the pending list.  Also see

https://www.postgresql.org/docs/current/gin-implementation.html#GIN-FAST-UPDATE

            regards, tom lane



Re: massive update on gin index

От
Marcos Pegoraro
Дата:
Em qua., 14 de set. de 2022 às 16:55, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
GIN does have a "pending list" of insertions not yet pushed into the main
index structure, and search performance will suffer if that gets too
bloated.  I don't recall much about how to control that, but I think
vacuuming the table will serve to empty the pending list.  Also see

https://www.postgresql.org/docs/current/gin-implementation.html#GIN-FAST-UPDATE

                        regards, tom lane

Correct, if I want use that index immediately with same performance I have to call

select pg_catalog.gin_clean_pending_list('idxphones');

Or wait next autovacuum.

thanks
Marcos