Thread: WAL replication question

WAL replication question

From
Keith Ouellette
Date:

I am relatively new to PostgreSQL and have a question on WAL replication. I have two servers, with one designated as Master. Both Master and Slave are configured similarly, except that the recovery.conf file exists only on the Slave.

 

For failover, I am using the trigger file capability to "promote" the Slave in the event that the Master fails. That seems to work well. The question I have is when the former Master comes back on line, is there a way to make it a slave (adding the recovery.conf file) without restarting the PostgreSQL process? I currently do this with a restart, which works well, but we are using Pacemaker and the PostgreSQL resource goes away and takes the node down until it comes up. We would rather the node not go down if possible.
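
For readers following along, here is a minimal sketch of the kind of recovery.conf the setup above relies on, generated from Python. The parameter names (standby_mode, primary_conninfo, trigger_file) are the PostgreSQL 9.x recovery.conf settings; the host address, replication user and trigger path are placeholders assumed for illustration and do not come from the poster's configuration.

```python
def render_recovery_conf(primary_host, trigger_file, port=5432, user="replicator"):
    """Return recovery.conf text for a streaming-replication standby (PostgreSQL 9.x).

    The user name and port defaults are illustrative assumptions.
    """
    conninfo = "host={0} port={1} user={2}".format(primary_host, port, user)
    lines = [
        # Keep the server in recovery, continuously applying WAL from the master.
        "standby_mode = 'on'",
        # Where the standby connects to stream WAL.
        "primary_conninfo = '{0}'".format(conninfo),
        # Creating this file tells the standby to end recovery (promotion).
        "trigger_file = '{0}'".format(trigger_file),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical address and path, for demonstration only.
    print(render_recovery_conf("10.0.0.10", "/tmp/postgresql.trigger"), end="")
```

The server reads recovery.conf only at startup, which is why adding the file to a running former master has no effect until PostgreSQL is restarted.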

 

Is there a way to do this? Thank you in advance.

 

Keith

Re: WAL replication question

From
"Kevin Grittner"
Date:
Keith Ouellette wrote:

> I am using the trigger file capability to "promote" the Slave in
> the event that the Master fails.

> when the former Master comes back on line, is there a way to make
> it a slave (adding the recovery.conf file) without restarting the
> PostgreSQL process? I currently do this with a restart, which
> works well, but we are using Pacemaker and the PostgreSQL
> resource goes away and takes the node down until it comes up. We
> would rather the node not go down if possible.

That's confusing -- if the master node fails, how can it still be
up?

Once you fail over, you should take care that the old master node
is good and truly down until it can be started as a slave.  You
will need to copy from the new master to the node which is to be
the new slave before you do that. And yes, add the recovery.conf.
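
One way to picture the sequence described above is as an ordered list of commands. The sketch below only builds that list and executes nothing. It assumes pg_basebackup is used for the copy (rsync with pg_start_backup/pg_stop_backup would be another option), and the host, user and directory values are hypothetical.

```python
def resync_plan(new_master, data_dir, repl_user="replicator"):
    """Return the ordered commands for rebuilding a failed master as a standby.

    Nothing is executed here; the caller reviews and runs the steps.
    """
    data_dir = data_dir.rstrip("/")
    return [
        # 1. Make sure the old master is really down before touching its data.
        ["pg_ctl", "-D", data_dir, "stop", "-m", "fast"],
        # 2. Move the stale data directory aside; pg_basebackup needs an empty target.
        ["mv", data_dir, data_dir + ".old"],
        # 3. Copy a fresh base backup from the new master.
        ["pg_basebackup", "-h", new_master, "-U", repl_user, "-D", data_dir, "-P"],
        # 4. Start as a standby; recovery.conf must be in data_dir before this step.
        ["pg_ctl", "-D", data_dir, "start"],
    ]


if __name__ == "__main__":
    for command in resync_plan("10.0.0.11", "/var/lib/pgsql/data"):
        print(" ".join(command))
```

Returning the plan instead of running it keeps a human in the loop, which fits the advice that the old master must stay down until it is ready to start as a slave.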

-Kevin


Re: WAL replication question

From
Keith Ouellette
Date:
Kevin,

  I understand the confusion. What I am being asked to do is the following:

1. Use pacemaker to determine who should be the Master node so it can assign a virtual IP to it.

2. Use the lsb:postgresql RA as a master resource to monitor the postgresql process on each node

3. Fail over to the slave node when the master fails. Pacemaker moves over the virtual IP. I detect that in a process I run (a script that checks the location of the virtual IP) and promote the slave to master using the trigger file.

4. When the failed server comes back up, I detect with the same process that it should no longer be master, sync it to the new master, create the recovery.conf file and restart the postgres process.
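
The detection logic in steps 3 and 4 could look roughly like the sketch below. It is not the poster's script: it assumes the virtual IP is detected by parsing the output of `ip -o addr show`, and the function names and the action labels ("promote", "needs-resync", "none") are invented for illustration.

```python
def holds_virtual_ip(ip_addr_output, vip):
    """True if `vip` appears as an address in `ip -o addr show` output."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Lines look like: "2: eth0    inet 10.0.0.100/24 brd ... scope global eth0"
        for i, field in enumerate(fields[:-1]):
            if field in ("inet", "inet6") and fields[i + 1].split("/")[0] == vip:
                return True
    return False


def decide_action(has_vip, is_standby):
    """Map the node's current state to the action the watcher should take."""
    if has_vip and is_standby:
        # Pacemaker moved the virtual IP here, so this standby must become master.
        return "promote"
    if not has_vip and not is_standby:
        # A node running as master without the virtual IP is a stale former master.
        return "needs-resync"
    return "none"


def promote(trigger_file):
    """Create the file named by trigger_file in recovery.conf to end recovery."""
    open(trigger_file, "a").close()
```

Whether the node is currently a standby can be read from the server with `SELECT pg_is_in_recovery();`, which is left out here to keep the sketch free of database dependencies.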

The issue is that when we restart PostgreSQL, Pacemaker takes that node "down" in alarm. The team lead does not want that. I am just wondering whether it is possible to bring up the recovered server without restarting the postgresql process?


Thanks,
Keith


________________________________________
From: Kevin Grittner [kgrittn@mail.com]
Sent: Monday, January 21, 2013 4:07 PM
To: Keith Ouellette; pgsql-novice@postgresql.org
Subject: Re: [NOVICE] WAL replication question


Re: WAL replication question

From
Michael Wood
Date:
Hi

On 22 January 2013 15:22, Keith Ouellette <Keith.Ouellette@airgas.com> wrote:
> Kevin,
>
>   I understand the confusion. What I am being asked to do is the following:
>
> 1. Use pacemaker to determine who should be the Master node so it can assign a virtual IP to it.
>
> 2. Use the lsb:postgresql RA as a master resource to monitor the postgresql process on each node
>
> 3. Failover to the slave node when the master fails. Pacemaker moves over the virtual IP. I detect that in a process I run (script that checks the location of the virtual IP) and promote the slave to master using the trigger file.
>
> 4. When the failed server comes back up, I detect that it should no longer be master using the process and sync it to the new master, create the recovery.conf file and restart the postgres process.

How about at step 4:

When the failed server comes back up, do not automatically start
postgres, or in some other way prevent Pacemaker from thinking the
node is ready.  Then, when you've done whatever you need to to get it
ready as a slave, start it up and Pacemaker can see that it is now
ready.

> The issue is when we restart the postgresql, Pacemaker takes that node "down" in alarm. The team lead does not want that. I am just wondering if it was possible to bring up the recovered server without restarting the postgresql process?

I think the main thing is that until you've finished reconfiguring
postgres it is not "recovered", so you need to make sure Pacemaker
doesn't think it is recovered before it's actually ready.
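
A simple guard along these lines is to refuse to start postgres on the returning node unless it is already configured as a standby. The sketch below is only an illustration of that idea: the function name is hypothetical, and checking for standby_mode = 'on' in recovery.conf is an assumed readiness test, not something Pacemaker does itself.

```python
import os


def ready_to_start_as_standby(data_dir):
    """True only if data_dir holds a recovery.conf that enables standby mode."""
    path = os.path.join(data_dir, "recovery.conf")
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        # Drop comments and surrounding whitespace from each line.
        settings = [line.split("#", 1)[0].strip() for line in handle]
    # Ignore spacing around "=" when looking for the setting.
    return any(s.replace(" ", "").startswith("standby_mode='on'") for s in settings)
```

A startup wrapper could call this before launching postgres and leave the node out of the cluster when it returns False.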

Unless I've missed something :)  (I have no experience with Pacemaker
or postgres replication.)

--
Michael Wood <esiotrot@gmail.com>


Re: WAL replication question

From
Keith Ouellette
Date:
Michael,

   Good points. Instead of automating the recovery, I am now forcing someone to look at the failed unit by automatically putting that node in standby. At this point, it makes things more stable and predictable.
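
As a rough illustration of putting a node in standby from a script, the sketch below builds the relevant commands without running them. It assumes the cluster is managed with the crm shell (crmsh), which the thread does not confirm; the node name is a placeholder.

```python
def standby_command(node):
    """Command that tells Pacemaker to stop running resources on `node` (crmsh assumed)."""
    return ["crm", "node", "standby", node]


def online_command(node):
    """Command an operator runs after inspecting and resyncing the node (crmsh assumed)."""
    return ["crm", "node", "online", node]


if __name__ == "__main__":
    # "pg-node1" is a hypothetical node name.
    print(" ".join(standby_command("pg-node1")))
    print(" ".join(online_command("pg-node1")))
```

Keeping the return to "online" as a manual step matches the goal of having someone look at the failed unit first.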

Thanks,
Keith


________________________________________
From: Michael Wood [esiotrot@gmail.com]
Sent: Tuesday, January 22, 2013 8:49 AM
To: Keith Ouellette
Cc: Kevin Grittner; pgsql-novice@postgresql.org
Subject: Re: [NOVICE] WAL replication question
