BUG #15959: 'DROP EXTENSION pglogical' while an unused logical replication slot exists causes slot corruption

From: PG Bug reporting form
Subject: BUG #15959: 'DROP EXTENSION pglogical' while an unused logical replication slot exists causes slot corruption
Msg-id 15959-9540507f02e93026@postgresql.org
Replies: Re: BUG #15959: 'DROP EXTENSION pglogical' while an unused logical replication slot exists causes slot corruption (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15959
Logged by:          Matt W
Email address:      wise@wiredgeek.net
PostgreSQL version: 10.6
Operating system:   Linux
Description:

This is a problem we ran into on multiple production databases before we
discovered the sequence of steps required to trigger it. The pattern
presented itself while we were migrating a set of existing "source"
databases into newer "replica" databases.

The bug appears when a logical replication slot is used on a "replica
master" database that uses pglogical to replicate from a "source master"
database. The data flow looks like this:

```
[Source_DB] ----PGLogical----> [Replica_DB] ----LogicalReplicationSlot---> pg_recvlogical
```

Short Description:
  If a logical replication slot exists but is not being actively consumed,
and you issue 'DROP EXTENSION pglogical', the database is left in a bad
state. When the consumer for that slot later connects and starts
replicating, it receives the following error:

  pg_recvlogical: unexpected termination of replication stream: ERROR: could not find pg_class entry for 16387
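Because the error identifies the missing relation only by OID, it can help to surface that OID in alerts. A minimal sketch, assuming the consumer's stderr is captured as text: the helper `missing_relation_oid` below is hypothetical and only pattern-matches the message quoted above; it does not query PostgreSQL.

```python
import re
from typing import Optional

# Matches the error text pg_recvlogical printed in this report, e.g.
# "ERROR: could not find pg_class entry for 16387"
_PG_CLASS_ERROR = re.compile(r"could not find pg_class entry for (\d+)")


def missing_relation_oid(stderr_text: str) -> Optional[int]:
    """Return the OID named in a 'could not find pg_class entry' error, or None.

    Hypothetical helper for illustration: it only matches text and
    never connects to a database.
    """
    match = _PG_CLASS_ERROR.search(stderr_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    line = ("pg_recvlogical: unexpected termination of replication stream: "
            "ERROR: could not find pg_class entry for 16387")
    print(missing_relation_oid(line))  # 16387
```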

Detailed Setup:
  To reproduce the issue, check out the code at
https://github.com/diranged/postgres-logical-replication-pgclass-bug and
follow the instructions.

Business Impact:
  As soon as the logical replication slot breaks, there are two critical
impacts. First, if you rely on a fully intact stream of data replicating
out of your database into some other data path (for example, with
https://github.com/Nextdoor/pg-bifrost), you start losing data from the
moment the slot breaks. There is no way that we know of to "skip" the
broken record and move forward.

  Second, as soon as the replication slot breaks, Postgres starts
accumulating WAL on disk, because the slot still holds back the WAL it has
not consumed. If this goes unnoticed, the database can run out of disk
space and cause major problems. This is particularly painful in Amazon RDS,
where you cannot move WAL onto a separate volume.
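One way to catch this second impact early is to watch how much WAL each slot is holding back. The sketch below is hypothetical: it assumes you have already fetched `restart_lsn` from `pg_replication_slots` and the server's `pg_current_wal_lsn()` (the PostgreSQL 10 names) as text, and that alerting on a byte threshold is an acceptable policy. `lsn_to_int`, `retained_wal_bytes`, and `slot_needs_alert` are illustrative names, not part of any library.

```python
from typing import Optional

# Query one might run on PostgreSQL 10 to fetch the inputs (for reference
# only; this sketch does not execute it):
#   SELECT slot_name, active, restart_lsn, pg_current_wal_lsn()
#   FROM pg_replication_slots;


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' (high/low hex halves) to an int."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current WAL position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slot_needs_alert(current_lsn: str, restart_lsn: Optional[str],
                     threshold_bytes: int) -> bool:
    """Flag a slot whose retained WAL exceeds a chosen threshold (hypothetical policy)."""
    if restart_lsn is None:
        # restart_lsn can be NULL, e.g. for a physical slot that has not reserved WAL.
        return False
    return retained_wal_bytes(current_lsn, restart_lsn) > threshold_bytes


if __name__ == "__main__":
    # 0x1_00000000 - 0xFF000000 = 16 MiB of retained WAL.
    print(retained_wal_bytes("1/0", "0/FF000000"))  # 16777216
```

On the server, `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)` computes the same difference directly; doing it client-side here only keeps the sketch runnable without a database.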

Versions Affected:
  I've tested this on Postgres 10.6 through 10.10.

