Обсуждение: Negative imact of maintenance_work_mem to GIN size
Hi!
I found that sometimes larger maintenance_work_mem leads to larger GIN index. That is quite strange. ISTM that it's related to posting lists compression but I can't figure out how exactly it is.
See example on delicious bookmarks dataset.
set maintenance_work_mem = '1GB';
create index js_idx1 on js using gin(v jsonb_path_idx);
set maintenance_work_mem = '16MB';
create index js_idx2 on js using gin(v jsonb_path_ops);
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+---------+-------+--------+-------+--------+-------------
public | js_idx1 | index | smagen | js | 432 MB |
public | js_idx2 | index | smagen | js | 309 MB |
(2 rows)
------
With best regards,
Alexander Korotkov.
With best regards,
Alexander Korotkov.
On Tue, May 20, 2014 at 4:50 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
------
With best regards,
Alexander Korotkov.
I found that sometimes larger maintenance_work_mem leads to larger GIN index. That is quite strange. ISTM that it's related to posting lists compression but I can't figure out how exactly it is.
It appears to be not related to posting lists compression. I did reproduce it on 9.2.
create table test as (select array[(random()*10000000)::int]::int[] as v from generate_series(1,10000000) g);
set maintenance_work_mem = '2GB';
create index test_idx1 on test using gin(v);
set maintenance_work_mem = '16MB';
create index test_idx2 on test using gin(v);
Schema | Name | Type | Owner | Table | Size | Description
--------+----------------------------+-------+----------+-------------+---------+-------------
public | test_idx1 | index | smagen | test | 392 MB |
public | test_idx2 | index | smagen | test | 268 MB |
(2 rows)
The reason of it is that we filling entry tree with inserting without any special optimization. Large maintenance_work_mem gives us ascending order of insertion. Thus insertion is performed always into rightmost page leaving rest of pages half-empty. Small maintenance_work_mem gives us more random order of insertion. Such insertions makes pages about 75% filled in average. Posting trees has special optimization for this case while entry tree doesn't.
------
With best regards,
Alexander Korotkov.