Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:
Msg-id: CAFBsxsHrvTPUK=C1=xweJjGujja4Xjfgva3C8jnW3Shz6RBnFg@mail.gmail.com
In reply to: Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
I ran a couple of "in situ" tests on server hardware using UUID columns, since they are common in the real world and have bad correlation to heap order, so they are a challenge for index vacuum.
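(For reference, one way to confirm the poor correlation after loading the data, using the stock pg_stats view; this wasn't part of the test script itself, and the exact value will vary per run:

analyze test;
select attname, correlation from pg_stats
where tablename = 'test' and attname = 'x';
-- correlation is typically near 0 for random UUIDs, vs. +/-1 for perfectly ordered data
)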

=== test 1, delete everything from a small table, with very small maintenance_work_mem:

alter system set shared_buffers ='4GB';
alter system set max_wal_size ='10GB';
alter system set checkpoint_timeout ='30 min';
alter system set autovacuum =off;

-- unrealistically low
alter system set maintenance_work_mem = '32MB';

create table if not exists test (x uuid);
truncate table test;
insert into test (x) select gen_random_uuid() from generate_series(1,50*1000*1000);
create index on test (x);

delete from test;
vacuum (verbose, truncate off) test;
--

master:
INFO:  finished vacuuming "john.naylor.public.test": index scans: 9
system usage: CPU: user: 70.04 s, system: 19.85 s, elapsed: 802.06 s

v29 patch:
INFO:  finished vacuuming "john.naylor.public.test": index scans: 1
system usage: CPU: user: 9.80 s, system: 2.62 s, elapsed: 36.68 s

This is a bit artificial, but it's easy to construct cases where the array leads to multiple index scans but the new tid store can fit everything without breaking a sweat. I didn't save the progress reporting, but v29 was using about 11MB for tid storage.
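As a rough sanity check, the 9 index scans on master follow from the old array's 6-byte-per-TID budget (sizeof(ItemPointerData)) at this maintenance_work_mem setting:

-- ~5.6 million TIDs fit in 32MB, so 50 million dead tuples need 9 passes
select 32 * 1024 * 1024 / 6 as tids_per_pass,          -- 5592405
       ceil(50 * 1000 * 1000 / 5592405.0) as passes;   -- 9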


=== test 2: try to stress tid lookup with production maintenance_work_mem:
1. use unlogged table to reduce noise
2. vacuum freeze first to reduce heap scan time
3. delete some records at the beginning and end of heap to defeat binary search's pre-check

alter system set shared_buffers ='4GB';
alter system set max_wal_size ='10GB';
alter system set checkpoint_timeout ='30 min';
alter system set autovacuum =off;

alter system set maintenance_work_mem = '1GB';

create unlogged table if not exists test (x uuid);
truncate table test;
insert into test (x) select gen_random_uuid() from generate_series(1,1000*1000*1000);
vacuum (freeze) test;

select pg_size_pretty(pg_table_size('test'));
 pg_size_pretty
----------------
 41 GB

create index on test (x);

select pg_size_pretty(pg_total_relation_size('test'));
 pg_size_pretty
----------------
 71 GB

select max(ctid) from test;
     max      
--------------
 (5405405,75)

delete from test where ctid <  '(100000,0)'::tid;
delete from test where ctid > '(5300000,0)'::tid;

vacuum (verbose, truncate off) test;

both:
INFO:  vacuuming "john.naylor.public.test"
INFO:  finished vacuuming "john.naylor.public.test": index scans: 1
index scan needed: 205406 pages from table (3.80% of total) had 38000000 dead item identifiers removed
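As a cross-check, that page count lines up with the two ctid-range deletes above: pages 0-99999 at the start of the heap plus pages 5300000-5405405 at the end.

select 100000 + (5405406 - 5300000) as pages_with_dead_items;  -- 205406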

--
master:
system usage: CPU: user: 134.32 s, system: 19.24 s, elapsed: 286.14 s

v29 patch:
system usage: CPU: user:  97.71 s, system: 45.78 s, elapsed: 573.94 s

The entire vacuum took 25% less wall clock time. Reminder that this is without WAL logging, and also unscientific because it's only one run.

--
I took 10 seconds of perf data while index vacuuming was going on (showing calls > 2%):

master:
  40.59%  postgres  postgres            [.] vac_cmp_itemptr
  24.97%  postgres  libc-2.17.so        [.] bsearch
   6.67%  postgres  postgres            [.] btvacuumpage
   4.61%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   3.48%  postgres  postgres            [.] PageIndexMultiDelete
   2.67%  postgres  postgres            [.] vac_tid_reaped
   2.03%  postgres  postgres            [.] compactify_tuples
   2.01%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back

v29 patch:

  29.22%  postgres  postgres            [.] TidStoreIsMember
   9.30%  postgres  postgres            [.] btvacuumpage
   7.76%  postgres  postgres            [.] PageIndexMultiDelete
   6.31%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   5.60%  postgres  postgres            [.] compactify_tuples
   4.26%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
   4.12%  postgres  postgres            [.] hash_search_with_hash_value

--
master:
psql -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuples, num_dead_tuples from pg_stat_progress_vacuum"
       phase       | heap_blks_total | heap_blks_scanned | max_dead_tuples | num_dead_tuples
-------------------+-----------------+-------------------+-----------------+-----------------
 vacuuming indexes |         5405406 |           5405406 |       178956969 |        38000000

v29 patch:
psql  -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuple_bytes, dead_tuple_bytes from pg_stat_progress_vacuum"
       phase       | heap_blks_total | heap_blks_scanned | max_dead_tuple_bytes | dead_tuple_bytes
-------------------+-----------------+-------------------+----------------------+------------------
 vacuuming indexes |         5405406 |           5405406 |           1073670144 |          8678064

Here, the old array pessimistically needs 1GB allocated (as for any table > ~5GB), but only fills 228MB for tid lookup. The patch reports 8.7MB. Tables that only fit, say, 30-50 tuples per page will have less extreme differences in memory use. Same for the case where only a couple dead items occur per page, with many uninteresting pages in between. Even so, the allocation will be much more accurately sized in the patch, especially in non-parallel vacuum.
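For reference, the arithmetic behind those figures (ItemPointerData is 6 bytes, and max_dead_tuples/num_dead_tuples come from the master output above):

select 178956969 * 6 as array_bytes_allocated,  -- ~1GB, reserved up front on master
       38000000 * 6  as array_bytes_filled;     -- 228 million bytes actually used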

There are other cases that could be tested (I mentioned some above), but this is enough to show the improvements possible.

I still need to do some cosmetic follow-up to v29 as well as a status report, and I will try to get back to that soon.

--
John Naylor
EDB: http://www.enterprisedb.com
