Обсуждение: The way to know whether the standby has caught up with the master
Hi, For reliable high-availability, when the master crashes, the clusterware must know whether it can promote the standby safely without any data loss, before actually promoting it. IOW, it must know whether the standby has already caught up with the primary. Otherwise, failover might cause data loss. We can know that from pg_stat_replication on the master. But the problem is that pg_stat_replication is not available since the master is not running at that moment. So that info should be available also on the standby. To achieve that, I'm thinking to change walsender so that, when the standby has caught up with the master, it sends back the message indicating that to the standby. And I'm thinking to add new function (or view like pg_stat_replication) available on the standby, which shows that info. Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 25.05.2011 07:42, Fujii Masao wrote: > For reliable high-availability, when the master crashes, the clusterware must > know whether it can promote the standby safely without any data loss, > before actually promoting it. IOW, it must know whether the standby has > already caught up with the primary. Otherwise, failover might cause data loss. > We can know that from pg_stat_replication on the master. But the problem > is that pg_stat_replication is not available since the master is not running at > that moment. So that info should be available also on the standby. > > To achieve that, I'm thinking to change walsender so that, when the standby > has caught up with the master, it sends back the message indicating that to > the standby. And I'm thinking to add new function (or view like > pg_stat_replication) > available on the standby, which shows that info. By the time the standby has received that message, it might not be caught-up anymore because new WAL might've been generated in the master already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 25.05.2011 07:42, Fujii Masao wrote: >> >> For reliable high-availability, when the master crashes, the clusterware >> must >> know whether it can promote the standby safely without any data loss, >> before actually promoting it. IOW, it must know whether the standby has >> already caught up with the primary. Otherwise, failover might cause data >> loss. >> We can know that from pg_stat_replication on the master. But the problem >> is that pg_stat_replication is not available since the master is not >> running at >> that moment. So that info should be available also on the standby. >> >> To achieve that, I'm thinking to change walsender so that, when the >> standby >> has caught up with the master, it sends back the message indicating that >> to >> the standby. And I'm thinking to add new function (or view like >> pg_stat_replication) >> available on the standby, which shows that info. > > By the time the standby has received that message, it might not be caught-up > anymore because new WAL might've been generated in the master already. Right. But, thanks to sync rep, until such a new WAL has been replicated to the standby, the commit of transaction is not visible to the client. So, even if there are some WAL not replicated to the standby, the clusterware can promote the standby safely without any data loss (to the client point of view), I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas >>> >>> To achieve that, I'm thinking to change walsender so that, when the >>> standby >>> has caught up with the master, it sends back the message indicating that >>> to >>> the standby. And I'm thinking to add new function (or view like >>> pg_stat_replication) >>> available on the standby, which shows that info. >> >> By the time the standby has received that message, it might not be caught-up >> anymore because new WAL might've been generated in the master already. > > Right. But, thanks to sync rep, until such a new WAL has been replicated to > the standby, the commit of transaction is not visible to the client. So, even if > there are some WAL not replicated to the standby, the clusterware can promote > the standby safely without any data loss (to the client point of view), I think. > then, you also need to transmit to the standby if it is the current sync standby. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> To achieve that, I'm thinking to change walsender so that, when the >> standby >> has caught up with the master, it sends back the message indicating that >> to >> the standby. And I'm thinking to add new function (or view like >> pg_stat_replication) >> available on the standby, which shows that info. > > By the time the standby has received that message, it might not be caught-up > anymore because new WAL might've been generated in the master already. AFAICS, this is an exact case of the Byzantine General's problem. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 25.05.2011 07:42, Fujii Masao wrote: >> To achieve that, I'm thinking to change walsender so that, when the standby >> has caught up with the master, it sends back the message indicating that to >> the standby. And I'm thinking to add new function (or view like >> pg_stat_replication) >> available on the standby, which shows that info. > By the time the standby has received that message, it might not be > caught-up anymore because new WAL might've been generated in the master > already. Even assuming that you believe this is a useful capability, there is no need to change walsender. It *already* sends the current-end-of-WAL in every message, which indicates precisely whether the message contains all of available WAL data. regards, tom lane
On Wed, May 25, 2011 at 12:34:59PM +0100, Simon Riggs wrote: > On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > >> To achieve that, I'm thinking to change walsender so that, when the > >> standby > >> has caught up with the master, it sends back the message indicating that > >> to > >> the standby. And I'm thinking to add new function (or view like > >> pg_stat_replication) > >> available on the standby, which shows that info. > > > > By the time the standby has received that message, it might not be caught-up > > anymore because new WAL might've been generated in the master already. > > AFAICS, this is an exact case of the Byzantine General's problem. Have they updated it to acknowledge that the city is now called Istanbul? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas >>> By the time the standby has received that message, it might not be caught-up >>> anymore because new WAL might've been generated in the master already. >> >> Right. But, thanks to sync rep, until such a new WAL has been replicated to >> the standby, the commit of transaction is not visible to the client. So, even if >> there are some WAL not replicated to the standby, the clusterware can promote >> the standby safely without any data loss (to the client point of view), I think. > > then, you also need to transmit to the standby if it is the current > sync standby. Yes. After further thought, we can promote the standby safely only when the corresponding walsender meets the following conditions: 1. sync_state is "sync" 2. the standby's flush_location is bigger than or equal to the smallest wait locationin the sync rep queue. Which guarantees that all the committed transactions (i.e., their "success" indicationshave been returned to the client) have been replicated to the standby. Once the above conditions get satisfied, the failover is safe until sync_state is flipped to "async". By using this logic, walsender needs to check whether failover is safe, and send the message according to the result. One problem is that, when sync_state is flipped to "async", walsender might perform replication asynchronously before the standby receives the message indicating failover is unsafe. In this case, if the master crashes, the clusterware would wrongly think that failover is safe and promote the standby despite which causes data loss. To solve this problem, walsender would need to send that message *synchronously*, i.e., wait for the ACK of the message to arrive from the standby before actually changing sync_state to "async". Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, May 25, 2011 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> On 25.05.2011 07:42, Fujii Masao wrote: >>> To achieve that, I'm thinking to change walsender so that, when the standby >>> has caught up with the master, it sends back the message indicating that to >>> the standby. And I'm thinking to add new function (or view like >>> pg_stat_replication) >>> available on the standby, which shows that info. > >> By the time the standby has received that message, it might not be >> caught-up anymore because new WAL might've been generated in the master >> already. > > Even assuming that you believe this is a useful capability, there is no > need to change walsender. It *already* sends the current-end-of-WAL in > every message, which indicates precisely whether the message contains > all of available WAL data. That's not enough to calculate whether failover is safe or not. Even if the standby's flush location is equal to the master's current end location, new WAL might have already been generated, and the "success" indication of the corresponding transaction might have been returned to the client (this is possible only when async mode). So in addition to the master's current end location, the standby must know its sync mode, which walsender would need to send. Another problem is that, when we can safely promote the standby, the standby's flush location isn't always equal to the master's current end location. Imagine the case where there are some unsent WAL in the master and corresponding transactions are waiting for replication. In this case, obviously those locations are not the same. But in sync replication, we can guarantee that all the committed (from the client's view) transactions have been replicated to the standby, so failover is safe. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center