Обсуждение: WAL Replication query

Поиск
Список
Период
Сортировка

WAL Replication query

От
KK CHN
Дата:
List,

I am  trying  WAL replication,the setup is working  with a primary and Standby server.

Query is  :   When the primary crashes  due an unforeseen reason,  what may happen to the uncommitted transactions at primary ?  
Those  won't replicate to the standby right ?

How much data loss occurs in this case ?  
What is the mechanism to overcome this uncommitted transaction loss due to primary server crash which is not   replicated to the standby  server ?

Thank you,
 Krishane

Re: WAL Replication query

От
Laurenz Albe
Дата:
On Tue, 2022-11-01 at 11:25 +0530, KK CHN wrote:
> I am  trying  WAL replication,the setup is working  with a primary and Standby server.
> 
> Query is  :   When the primary crashes  due an unforeseen reason,
> what may happen to the uncommitted transactions at primary ?

Their effects remain invisible.
  
> Those  won't replicate to the standby right ?

They may be replicated, but their effects will remain invisible there too.

> How much data loss occurs in this case ?

Data loss?  I don't understand.  There is no data loss with replication.
If an incomplete transaction is rolled back, that is no data loss.

> What is the mechanism to overcome this uncommitted transaction loss due to primary
> server crash which is not   replicated to the standby  server ?

I don't understand.  Are you perhaps talking about failover?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



RE: [External] Re: WAL Replication query

От
Sacheen Birhade
Дата:
If replication is in asynchronous then there will be data loss, right Laurenz?

Thanks,
Sacheen Birhade.

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Tuesday, November 1, 2022 12:11 PM
To: KK CHN <kkchn.in@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: [External] Re: WAL Replication query

On Tue, 2022-11-01 at 11:25 +0530, KK CHN wrote:
> I am  trying  WAL replication,the setup is working  with a primary and Standby server.
>
> Query is  :   When the primary crashes  due an unforeseen reason, what
> may happen to the uncommitted transactions at primary ?

Their effects remain invisible.
  
> Those  won't replicate to the standby right ?

They may be replicated, but their effects will remain invisible there too.

> How much data loss occurs in this case ?

Data loss?  I don't understand.  There is no data loss with replication.
If an incomplete transaction is rolled back, that is no data loss.

> What is the mechanism to overcome this uncommitted transaction loss
> due to primary server crash which is not   replicated to the standby  server ?

I don't understand.  Are you perhaps talking about failover?

Yours,
Laurenz Albe
--
Cybertec |
https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.cybertec-postgresql.com%2F&data=05%7C01%7Csacheen.birhade%40veritas.com%7Ce6d7b7ce704e44eaa52208dabbd41e93%7Cfc8e13c0422c4c55b3eaca318e6cac32%7C0%7C0%7C638028816981291760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=e5mP%2BJjxdGUSMFXWUowX0CUZJYDOu%2FRneDQ0jSSK3nA%3D&reserved=0





Re: [External] Re: WAL Replication query

От
Laurenz Albe
Дата:
On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:
> If replication is in asynchronous then there will be data loss, right Laurenz?

Why?  The data will perhaps show up on the standby a little later, but why is
that data loss?  Remember that the question was about replication, and there
was no mention of failover.

Yours,
Laurenz Albe



RE: [External] Re: WAL Replication query

От
Sacheen Birhade
Дата:
Yes, Laurenz. You are right.

-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at> 
Sent: Tuesday, November 1, 2022 12:23 PM
To: Sacheen Birhade <sacheen.birhade@veritas.com>; KK CHN <kkchn.in@gmail.com>; pgsql-admin@lists.postgresql.org
Subject: Re: [External] Re: WAL Replication query

On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:
> If replication is in asynchronous then there will be data loss, right Laurenz?

Why?  The data will perhaps show up on the standby a little later, but why is that data loss?  Remember that the
questionwas about replication, and there was no mention of failover.
 

Yours,
Laurenz Albe

Re: [External] Re: WAL Replication query

От
Ron
Дата:
On 11/1/22 01:53, Laurenz Albe wrote:
On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:
If replication is in asynchronous then there will be data loss, right Laurenz?
Why?  The data will perhaps show up on the standby a little later, but why is
that data loss?  Remember that the question was about replication, and there
was no mention of failover.

No, the question was about a crash during replication: OP (not Sacheen, unless that person is using two email addresses) explicitly asked "When the primary crashes  due an unforeseen reason (what happens)?"

If the two database systems are really busy, and especially if the network connection isn't fast enough, async replication means there might be some transactions committed on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?

--
Angular momentum makes the world go 'round.

Re: [External] Re: WAL Replication query

От
Laurenz Albe
Дата:
On Tue, 2022-11-01 at 02:43 -0500, Ron wrote:
> On 11/1/22 01:53, Laurenz Albe wrote:
> > On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:
> > > If replication is in asynchronous then there will be data loss, right Laurenz?
> >
> > Why?  The data will perhaps show up on the standby a little later, but why is
> > that data loss?  Remember that the question was about replication, and there
> > was no mention of failover.
> 
>  No, the question was about a crash during replication: OP (not Sacheen, unless that person
> is using two email addresses) explicitly asked "When the primary crashes  due an unforeseen
> reason (what happens)?"
>  
>  If the two database systems are really busy, and especially if the network connection
> isn't fast enough, async replication means there might be some transactions committed
> on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?

Right.  And how does that constitute data loss?  If you start the primary again, the transaction
will be replicated just fine.  Now if you call it *data delay*, I would agree.

Yours,
Laurenz Albe



Re: [External] Re: WAL Replication query

От
Ron
Дата:
On 11/1/22 08:34, Laurenz Albe wrote:
On Tue, 2022-11-01 at 02:43 -0500, Ron wrote:
On 11/1/22 01:53, Laurenz Albe wrote:
On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:
If replication is in asynchronous then there will be data loss, right Laurenz?
Why?  The data will perhaps show up on the standby a little later, but why is
that data loss?  Remember that the question was about replication, and there
was no mention of failover.
 No, the question was about a crash during replication: OP (not Sacheen, unless that person
is using two email addresses) explicitly asked "When the primary crashes  due an unforeseen
reason (what happens)?"
 
 If the two database systems are really busy, and especially if the network connection
isn't fast enough, async replication means there might be some transactions committed
on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?
Right.  And how does that constitute data loss?  If you start the primary again, the transaction
will be replicated just fine.  Now if you call it *data delay*, I would agree.

Primary has crashed, according to OP; it's not coming back any time soon.  And thus you promote Secondary to be New Primary, and go about  your work.  When the Old Primary comes back up (hours or days later), you do a pg_basebackup to make it the New Secondary.

--
Angular momentum makes the world go 'round.

Re: [External] WAL Replication query

От
Scott Ribe
Дата:
> On Nov 1, 2022, at 7:34 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Right.  And how does that constitute data loss?  If you start the primary again, the transaction
> will be replicated just fine.  Now if you call it *data delay*, I would agree.

It comes down to this: there is no magic. Replication is async with delay between primary & secondary, or replication
issynchronous with the delay visible to clients. Transaction commit completes, or it doesn't. And so on. 


Re: [External] Re: WAL Replication query

От
Laurenz Albe
Дата:
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote:
> Primary has crashed, according to OP; it's not coming back any time soon.  And thus you promote
> Secondary to be New Primary, and go about  your work.  When the Old Primary comes back up
> (hours or days later), you do a pg_basebackup to make it the New Secondary.

There was no word about failover or "not coming back any time soon" (that's why I asked for
clarification in my answer).  In fact, PostgreSQL will often undergo crash recovery and come
back up automatically.  See the parameter "restart_after_crash", which defaults to "on".

Yours,
Laurenz Albe



Re: [External] Re: WAL Replication query

От
Ron
Дата:
On 11/1/22 08:56, Laurenz Albe wrote:
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote:
Primary has crashed, according to OP; it's not coming back any time soon.  And thus you promote
Secondary to be New Primary, and go about  your work.  When the Old Primary comes back up
(hours or days later), you do a pg_basebackup to make it the New Secondary.
There was no word about failover 

On of the purposes of replication is so that you can fail over to it when Primary crashes, right?

or "not coming back any time soon" (that's why I asked for
clarification in my answer).

It does of course depend on what crashes, and how long it takes to restart.

Just the other day, Veeam somehow (for at least the second time) broke all connections to a VM running under ESX.  It took six hours to get fixed.  That VM was running Postgresql.  If the Pg cluster was replicated, I'd have promoted the Secondary to Primary for my customer.

--
Angular momentum makes the world go 'round.

Re: [External] Re: WAL Replication query

От
KK CHN
Дата:


On Tue, Nov 1, 2022 at 7:26 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote:
> Primary has crashed, according to OP; it's not coming back any time soon.  And thus you promote
> Secondary to be New Primary, and go about  your work.  When the Old Primary comes back up
> (hours or days later), you do a pg_basebackup to make it the New Secondary.

There was no word about failover or "not coming back any time soon" (that's why I asked for
clarification in my answer).  In fact, PostgreSQL will often undergo crash recovery and come
back up automatically.  See the parameter "restart_after_crash", which defaults to "on".

Sorry,  It was my mistake not to mention the failover to the Standby required. 

  I  mean when the Primary Server crashes due to a H/W failure or whatever other reason,  Then   in my case  I am forced to  promote the Standby server as  Primary until the   old  primary  server H/W vendor replaces the hardware component( it may vary from hours to days as pre SLA  terms with H/W vendor).  

In this scenario  any transactions happened in  crashed Primary server which were not transferred due to the delay in transfer to the Standby server.  But   I am in need to promote the Standby to Primary and  it will run for another one day or two, until my  Old crashed server failed component is replaced with a new H/W part.   

So there will be an inconsistency in the data  between the current  Primary  ( promoted standby)  and my recovered /repaired old server ( which was primary before the crash).  

   How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this  use case scenario?   



 
Yours,
Laurenz Albe


Re: [External] Re: WAL Replication query

От
Ron
Дата:
On 11/1/22 23:00, KK CHN wrote:
[snip]
So there will be an inconsistency in the data  between the current  Primary  ( promoted standby)  and my recovered /repaired old server ( which was primary before the crash).  

   How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this  use case scenario?  

Before promoting Secondary to New Primary, I'd record it's current pg_stat_replication.replay_lsn value somewhere and then compare it to the relevant LSN on Old Primary when it comes back up.

If they're the same, then there's no data inconsistency.

If there is, then you've got to manually check tables, since every application and database structure is different

--
Angular momentum makes the world go 'round.

Re: [External] Re: WAL Replication query

От
Laurenz Albe
Дата:
On Wed, 2022-11-02 at 09:30 +0530, KK CHN wrote:
>   I  mean when the Primary Server crashes due to a H/W failure or whatever other reason,
> Then   in my case  I am forced to  promote the Standby server as  Primary until the
> old  primary  server H/W vendor replaces the hardware component( it may vary from hours to days as pre SLA  terms
withH/W vendor).  
 
> 
> In this scenario  any transactions happened in  crashed Primary server which were not transferred due to the
> delay in transfer to the Standby server.  But   I am in need to promote the Standby to Primary and
> it will run for another one day or two, until my  Old crashed server failed component is replaced with a new H/W
part.  
 
> 
> So there will be an inconsistency in the data  between the current  Primary  ( promoted standby)
> and my recovered /repaired old server ( which was primary before the crash).  
> 
>    How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this  use
casescenario?
 

Thanks for the clarification.

Yes, in that case you can lose committed data.  The normal way to avoid that is to use
synchronous replication.  That is easy to set up, but has some implications:

- it will slow down data modifications considerably
- you should have a low network latency between the machines
- the whole system becomes unavailable if the standby goes down, so have at
  least two synchronous standbys if you want good availability

Yours,
Laurenz Albe



Re: [External] Re: WAL Replication query

От
KK CHN
Дата:

Yes, in that case you can lose committed data.  The normal way to avoid that is to use
synchronous replication.  That is easy to set up, but has some implications:

- it will slow down data modifications considerably
- you should have a low network latency between the machines
- the whole system becomes unavailable if the standby goes down, so have at
  least two synchronous standbys if you want good availability


Thanking  all for the prompt replies and guidance for clarifying the doubts and queries. 

Krishane