Обсуждение: BUG #18027: Logical replication taking forever

Поиск
Список
Период
Сортировка

BUG #18027: Logical replication taking forever

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18027
Logged by:          Andres Martin del Campo Campos
Email address:      andres@invisible.email
PostgreSQL version: 13.2
Operating system:   PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b
Description:

I'm trying to use logical replication to replicate a table of 69GB it's been
a week and it hasn't synced but the table says (in the database where I'm
replicating this table) is now 400GB and I'm running out of space. What's
wrong, first I thought it was because of all the WAL logs but I'm now
assuming it's something else. Anyone has experienced something like this?
You help is much appreciated.


Re: BUG #18027: Logical replication taking forever

От
Amit Kapila
Дата:
On Tue, Jul 18, 2023 at 11:08 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18027
> Logged by:          Andres Martin del Campo Campos
> Email address:      andres@invisible.email
> PostgreSQL version: 13.2
> Operating system:   PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b
> Description:
>
> I'm trying to use logical replication to replicate a table of 69GB it's been
> a week and it hasn't synced but the table says (in the database where I'm
> replicating this table) is now 400GB and I'm running out of space. What's
> wrong, first I thought it was because of all the WAL logs but I'm now
> assuming it's something else.
>

This sounds a bit unusual. Have you verified the size of tables on
both publisher and subscriber? Can you once show the values of
pg_subscription_rel for this table? Are there any operations happening
on this table either on publisher or subscriber?

--
With Regards,
Amit Kapila.



Re: BUG #18027: Logical replication taking forever

От
Andres Martin del Campo Campos
Дата:
Thank you so much for replying, I'm really struggling here.

I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it isn't showing.

I ran this query and it shows that the table is still copying: 

SELECT s.subname AS subscription_name,

       c.relnamespace::regnamespace::text as table_schema,

       c.relname as table_name,

       rel.srsublsn,

       case rel.srsubstate 

         when 'i' then 'initialized'

         when 'd' then 'copying'

         when 's' then 'synchronized'

         when 'r' then 'ready'

       end as state

FROM pg_catalog.pg_subscription s

  JOIN pg_catalog.pg_subscription_rel rel ON rel.srsubid = s.oid

  JOIN pg_catalog.pg_class c on c.oid = rel.srrelid;




image.png


If I check the pg_stat_activity on the publication I also see the backend_type = 'walsender'


image.png



In the publication the table size is 70G as you can see in the screenshot below image.png


but in the subscription, it's already 124G and still copying 😭  any suggestions?
image.png

Thanks again for your support, Amit 

On Tue, Jul 18, 2023 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jul 18, 2023 at 11:08 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18027
> Logged by:          Andres Martin del Campo Campos
> Email address:      andres@invisible.email
> PostgreSQL version: 13.2
> Operating system:   PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b
> Description:
>
> I'm trying to use logical replication to replicate a table of 69GB it's been
> a week and it hasn't synced but the table says (in the database where I'm
> replicating this table) is now 400GB and I'm running out of space. What's
> wrong, first I thought it was because of all the WAL logs but I'm now
> assuming it's something else.
>

This sounds a bit unusual. Have you verified the size of tables on
both publisher and subscriber? Can you once show the values of
pg_subscription_rel for this table? Are there any operations happening
on this table either on publisher or subscriber?

--
With Regards,
Amit Kapila.
Вложения

Re: BUG #18027: Logical replication taking forever

От
vignesh C
Дата:
On Wed, 19 Jul 2023 at 14:06, Andres Martin del Campo Campos
<andres@invisible.email> wrote:
>
> Thank you so much for replying, I'm really struggling here.
>
> I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it
isn'tshowing.
 
>

Can you run this query few times on the publisher and see if
bytes_processed and tuples_processed are getting increased(this might
give us some hint if it stuck or it is progressing slowly):
SELECT * FROM pg_stat_progress_copy;

If it is not progressing, is there anything unusual in the log file,
some warning or error messages? if so could you post those log
contents too.
Also if it is stuck, how many tuples are copied and how much is the
total tuple count?

Regards,
Vignesh



Re: BUG #18027: Logical replication taking forever

От
Andres Martin del Campo Campos
Дата:
Seems like I don't have that table


 image.png


There are no errors in the logs but I only see dead tuples and no live tuples

On Wed, Jul 19, 2023 at 5:07 AM vignesh C <vignesh21@gmail.com> wrote:
On Wed, 19 Jul 2023 at 14:06, Andres Martin del Campo Campos
<andres@invisible.email> wrote:
>
> Thank you so much for replying, I'm really struggling here.
>
> I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it isn't showing.
>

Can you run this query few times on the publisher and see if
bytes_processed and tuples_processed are getting increased(this might
give us some hint if it stuck or it is progressing slowly):
SELECT * FROM pg_stat_progress_copy;

If it is not progressing, is there anything unusual in the log file,
some warning or error messages? if so could you post those log
contents too.
Also if it is stuck, how many tuples are copied and how much is the
total tuple count?

Regards,
Vignesh
Вложения

Re: BUG #18027: Logical replication taking forever

От
vignesh C
Дата:
On Thu, 20 Jul 2023 at 22:46, Andres Martin del Campo Campos
<andres@invisible.email> wrote:
>
> Seems like I don't have that table
>
>
>
>
>
> There are no errors in the logs but I only see dead tuples and no live tuples

Sorry my bad, this is available only from PG14, it is not available in PG13.

Regards,
Vignesh



Re: BUG #18027: Logical replication taking forever

От
Amit Kapila
Дата:
On Thu, Jul 20, 2023 at 10:46 PM Andres Martin del Campo Campos <andres@invisible.email> wrote:
Seems like I don't have that table


 image.png


There are no errors in the logs but I only see dead tuples and no live tuples


oh, can you show us the dead and live tuple count on both publisher and subscriber? Ideally, COPY command should only copy the recent data based on the snapshot. It shouldn't copy the old/dead rows. One possibility I could think of is that due to some reason, if there is a failure during the initial sync process, it will ROLLBACK the whole copy and restart it again. So, that way one can see the table is growing with dead tuples and the copy is never finished especially if such an error occurs repeatedly. If that happens, you must see some error in the subscriber-side logs. Can you ensure in some way that such a phenomenon is not happening in your case?

--
With Regards,
Amit Kapila.
Вложения

Re: BUG #18027: Logical replication taking forever

От
Andres Martin del Campo Campos
Дата:
Thank you Amit! 
Here's the publisher:
 image.png

Here's the subscriber:
image.png

I don't see any errors in the logs but if you have time I would love to schedule a quick meeting with you and compensate you for your time of course.

Let me know if you are available

On Fri, Jul 21, 2023 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jul 20, 2023 at 10:46 PM Andres Martin del Campo Campos <andres@invisible.email> wrote:
Seems like I don't have that table


 image.png


There are no errors in the logs but I only see dead tuples and no live tuples


oh, can you show us the dead and live tuple count on both publisher and subscriber? Ideally, COPY command should only copy the recent data based on the snapshot. It shouldn't copy the old/dead rows. One possibility I could think of is that due to some reason, if there is a failure during the initial sync process, it will ROLLBACK the whole copy and restart it again. So, that way one can see the table is growing with dead tuples and the copy is never finished especially if such an error occurs repeatedly. If that happens, you must see some error in the subscriber-side logs. Can you ensure in some way that such a phenomenon is not happening in your case?

--
With Regards,
Amit Kapila.
Вложения