Re: BUG #18433: Logical replication timeout

From Shlok Kyal
Subject Re: BUG #18433: Logical replication timeout
Date
Msg-id CANhcyEVo5hjhtqe8qyWRwpWgfgK4C1j0++zVZMkVtdTMtcdLCg@mail.gmail.com
In response to BUG #18433: Logical replication timeout  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #18433: Logical replication timeout  (Костянтин Томах <tomahkvt@gmail.com>)
List pgsql-bugs
Hi,

On Mon, 15 Apr 2024 at 13:09, PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18433
> Logged by:          Kostiantyn
> Email address:      tomahkvt@gmail.com
> PostgreSQL version: 13.14
> Operating system:   AWS RDS
> Description:
>
> On Postgresql 10 we used the following approach for the big tables:
> 1) Download schema from the source database instance
> 2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
> schema
> 3)Upload the schema to the destination DB.
> 4) Configure Logical replication between source and destination DB.
> 5) When last table logical replication table synchronization worker for
> subscription "db_name_public_subscription", table "table_name" has
> finished
> 6) we created all the necessary PK, FK, and Indexes.
> This approach allowed to us upload data more quickly. This approach was
> working  great on PostgreSQL 10.
>
> We tried the same approach for Postgresql13, but we got an error.
> 1) Download schema from the source database instance
> 2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
> schema
> 3)Upload the schema to the destination DB.
> 4) configurated identity replication full at source DB for tables bigger
> than 100Gb
> 5) Configured Logical replication between source and destination DB.
> 6) During catchup on this big table  process we got the following
> messages:
> Source DB
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:LOG: terminating
> walsender process due to replication timeout
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:CONTEXT: slot
> "db_name_public_subscription", output plugin "pgoutput", in the begin
> callback, associated LSN 13705/2E913050
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:STATEMENT:
> START_REPLICATION SLOT "db_name_public_subscription" LOGICAL 13702/C2C8FB30
> (proto_version '1', publication_names '"db_name_public_publication"')
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:LOG: terminating
> walsender process due to replication timeout
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:CONTEXT: slot
> "db_name_public_subscription_18989108_sync_17127", output plugin "pgoutput",
> in the begin callback, associated LSN 13703/27942B70
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:STATEMENT:
> START_REPLICATION SLOT "db_name_public_subscription_18989108_sync_17127"
> LOGICAL 13703/17622B58 (proto_version '1', publication_names
> '"db_name_public_publication"')
>
> One important point. If there is no request to source DB logical replication
> works fine for big tables.
> I saw the messages in PostgreSQL bugs like
> https://www.postgresql.org/message-id/flat/718213.1601410160%40sss.pgh.pa.us#7e61dd07661901b505bcbd74ce5f5f28
> But I also did some tests and increased wal_keep_size
> and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but
> without success. The setup works in PG 13 only with a small amount of data.

I looked into the issue, and I think these log messages appear because
of a delay in the apply worker process on the subscriber.
I could reproduce them in my development environment by introducing
delays in the apply worker.
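
To check whether the apply worker is falling behind, something like the
following could be run while the catchup is in progress. This is only a
sketch: it assumes the standard statistics views pg_stat_replication
(on the source) and pg_stat_subscription (on the destination), and the
column alias replay_lag_bytes is made up for illustration.

```sql
-- On the source DB (publisher): one row per walsender.
-- A growing gap between sent_lsn and replay_lsn suggests the
-- subscriber is not applying changes as fast as they are sent.
SELECT application_name,
       state,
       sent_lsn,
       replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the destination DB (subscriber): one row per apply or
-- table synchronization worker. relid is NULL for the main apply worker.
SELECT subname,
       relid,
       received_lsn,
       last_msg_receipt_time,
       latest_end_time
FROM pg_stat_subscription;
```

If last_msg_receipt_time stops advancing for longer than
wal_sender_timeout, that would be consistent with the timeouts in the
logs above.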

I think this issue can be resolved by increasing 'wal_sender_timeout'.
Can you try setting 'wal_sender_timeout' to a larger value than the
one you are using now?
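
For reference, a sketch of how the setting could be inspected and
changed on the source DB. The value '2h' is only an illustrative
assumption, not a recommendation. On AWS RDS, ALTER SYSTEM is normally
not available, so the parameter would have to be changed through the DB
parameter group instead.

```sql
-- Show the value currently in effect on the source DB (publisher).
SHOW wal_sender_timeout;

-- Self-managed server only: raise the timeout and reload the
-- configuration. '2h' is an example value; 0 disables the timeout.
ALTER SYSTEM SET wal_sender_timeout = '2h';
SELECT pg_reload_conf();

-- Confirm the new value from a fresh session.
SHOW wal_sender_timeout;
```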

Also, I noticed that with PostgreSQL 13 you are configuring the table
in the source DB as REPLICA IDENTITY FULL, but you did not do the same
with PostgreSQL 10. Is there a specific reason for this?
I point it out because REPLICA IDENTITY FULL behaves differently: it
treats the entire row as the key.
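
A minimal sketch of how the replica identity could be checked and
switched on the source DB. The name table_name is the placeholder from
the report, and resetting to DEFAULT assumes the table has a primary
key on the source.

```sql
-- relreplident: 'd' = default (primary key), 'n' = nothing,
--               'f' = full (whole row), 'i' = a specific index
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'table_name';

-- What step 4 of the PostgreSQL 13 procedure does:
ALTER TABLE table_name REPLICA IDENTITY FULL;

-- To go back to using the primary key as the replica identity:
ALTER TABLE table_name REPLICA IDENTITY DEFAULT;
```

With FULL, the old values of all columns are logged for each UPDATE and
DELETE, and a subscriber table with no usable index has to search for
the matching row, which can slow the apply worker down on big tables.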

Thanks and Regards,
Shlok Kyal


