Degraded performance during table rewrite

From Mohamed Wael Khobalatte
Subject Degraded performance during table rewrite
Date
Msg-id CABZeWdxmAafKEr7tugufWYUgAS3AnVRAwCRA8GOnRh-aTzeyCA@mail.gmail.com
Responses Re: Degraded performance during table rewrite  (Mohamed Wael Khobalatte <mkhobalatte@grubhub.com>)
Re: Degraded performance during table rewrite  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Degraded performance during table rewrite  (Ron <ronljohnsonjr@gmail.com>)
List pgsql-general
Hi all, 

I am attempting a bunch of table rewrites to shrink our tables, since neither pg_repack nor VACUUM FULL is an option for us. The database is growing fast and has accumulated significant bloat in both heaps and indexes, so index rebuilds alone won't cut it. We found that table inheritance can be used to achieve this rather nicely. We are running PG v9.6.18.

We are setting up the inheritance as follows: 

BEGIN;
ALTER TABLE #{table} RENAME TO #{table}_old;
CREATE TABLE #{table} (LIKE #{table}_old INCLUDING ALL);
ALTER TABLE #{table}_old INHERIT #{table};
ALTER SEQUENCE #{table}_id_seq OWNED BY #{table}.id;
COMMIT; 

Then the migration itself runs as follows (each batch in its own transaction, looping through records and sleeping briefly between batches):

WITH del AS (
  DELETE FROM #{old_table}
  WHERE id IN (
    SELECT id
    FROM #{old_table}
    WHERE id > #{max_deleted_id} -- This is the max deleted from the previous batch, we grab it programmatically.
    ORDER BY id ASC
    LIMIT #{batch_size}
  )
  RETURNING *
)
INSERT INTO #{table}
SELECT * FROM del
RETURNING id

For instance, the batch_size can be 10_000, and the code sleeps for 200ms between batches (this is done in a Ruby script).

"max_deleted_id" is passed to each run from the previous one. This improves the inner SELECT query.

This works very well. However, I noticed two surprising things:

1. The performance of the delete and insert degrades steadily as the script runs. For instance, in one run the average went from 150ms per batch to 700ms per batch.

2. EXPLAIN itself takes a while to run on a sample batch. On one table, the EXPLAIN alone took four seconds.

To try to reproduce this locally, I used the following dummy table:

create table towns (id serial primary key, code text, article text, name text, department text);

insert into towns (code, article, name, department)
select
    left(md5(i::text), 10),
    md5(random()::text),
    md5(random()::text),
    left(md5(random()::text), 4)
from generate_series(1, 100000000) s(i);

This starts at around 150ms per batch and climbs to 700ms per batch. A vacuum of the old table brings it back down to 150ms, but I don't understand why, because we structure the query to jump over all previously dead rows. There is an old thread in which Tom Lane mentions that the planner might itself be walking that primary index. Is this applicable here? And is there anything we can do besides more aggressive and continued vacuuming of the old table (or a change in autovacuum settings)? Ideally, we want to run this overnight without much supervision.
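
For reference, the kind of mitigation we have in mind is something like this. It is an untested sketch; the parameter values are guesses, not tuned numbers, and the connection details are placeholders.

require "pg"

conn      = PG.connect(dbname: "mydb")  # placeholder connection details
old_table = "towns_old"

# Option 1: make autovacuum much more aggressive on the shrinking table
# only, so dead rows in the old heap get cleaned up between batches.
# These values are illustrative guesses, not tuned recommendations.
conn.exec(<<~SQL)
  ALTER TABLE #{old_table} SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
  );
SQL

# Option 2: have the migration script itself run a manual vacuum every
# N batches. VACUUM cannot run inside a transaction block, so it has
# to be issued outside the batch transaction.
conn.exec("VACUUM #{old_table}")

Neither feels great for an unsupervised overnight run, hence the question.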
