On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> It seems we have pretty annoying problem with logical decoding when
> performing VACUUM FULL / CLUSTER on a table with toast-ed data.
>
> The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
> records, the TOAST data is logged as regular INSERT records. XLOG/FPI is
> ignored in logical decoding, and so reorderbuffer never gets those
> records. But we do decode the TOAST data, and reorderbuffer stashes them
> in toast_hash hash table. Which gets reset only when handling a row from
> the main heap, and that never arrives. So we end up stashing all the
> TOAST data in memory :-(
>
> So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
> likely to break any logical replication connection. And it does not
> matter if you replicate this particular table.
>
> Luckily enough, this can leverage some of the pieces introduced by
> e9edc1ba which was meant to deal with rewrites of system tables, and in
> raw_heap_insert it added this:
>
> /*
> * The new relfilenode's relcache entrye doesn't have the necessary
> * information to determine whether a relation should emit data for
> * logical decoding. Force it to off if necessary.
> */
> if (!RelationIsLogicallyLogged(state->rs_old_rel))
> options |= HEAP_INSERT_NO_LOGICAL;
>
> As raw_heap_insert is used only for heap rewrites, we can simply remove
> the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
> data logged from here.
>
This fix seems fine to me.
> This does fix the issue, because we still decode the TOAST changes but
> there are no data and so
>
> if (change->data.tp.newtuple != NULL)
> {
> dlist_delete(&change->node);
> ReorderBufferToastAppendChunk(rb, txn, relation,
> change);
> }
>
> ends up not stashing the change in the hash table.
With the below change you proposed can we remove the above condition
because toast-insertion changes being processed by the reorder buffer
always have a new tuple? If a toast-insertion record doesn't have a
new tuple it has already ignored when decoding.
> It's imperfect,
> because we still decode the changes (and stash them to disk), but ISTM
> that can be fixed by tweaking DecodeInsert a bit to just ignore those
> changes entirely.
>
> Attached is a PoC patch with these two fixes.
>
I think this change is also fine.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center