Discussion: help with startup slave after pg_rewind

help with startup slave after pg_rewind

From: Dylan Luong
Date: Wed, Sep 19, 2018 at 10:29:44PM +0000

Hi

After promoting the slave to master, I ran pg_rewind on the slave (the old master) against the new master. But when I try to start the slave, I get the following errors:

2018-09-20 07:53:51 ACST [20265]: [2-1] db=[unknown],user=replicant app=[unknown],host=10.69.20.22(51271) FATAL:  the database system is starting up
2018-09-20 07:53:51 ACST [20264]: [3-1] db=,user= app=,host= LOG:  restored log file "0000000C.history" from archive
2018-09-20 07:53:51 ACST [20264]: [4-1] db=,user= app=,host= LOG:  restored log file "0000000C0000085B00000000" from archive
2018-09-20 07:53:51 ACST [20264]: [5-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
2018-09-20 07:53:51 ACST [20268]: [1-1] db=,user= app=,host= LOG:  started streaming WAL from primary at 85B/0 on timeline 12
2018-09-20 07:53:51 ACST [20264]: [6-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
2018-09-20 07:53:51 ACST [20268]: [2-1] db=,user= app=,host= FATAL:  terminating walreceiver process due to administrator command
2018-09-20 07:53:51 ACST [20264]: [7-1] db=,user= app=,host= LOG:  restored log file "0000000C0000085B00000000" from archive
2018-09-20 07:53:51 ACST [20264]: [8-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
2018-09-20 07:53:51 ACST [20264]: [9-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
2018-09-20 07:53:51 ACST [20264]: [10-1] db=,user= app=,host= LOG:  restored log file "0000000C0000085B00000000" from archive
2018-09-20 07:53:51 ACST [20264]: [11-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
2018-09-20 07:53:51 ACST [20264]: [12-1] db=,user= app=,host= LOG:  contrecord is requested by 85B/28
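The log lines fit together once WAL positions are mapped to file names: location 85B/28 on timeline 12 (hex C) lives in segment 0000000C0000085B00000000, the file that is restored from the archive again and again. A minimal Python sketch of that mapping, assuming the default 16 MB WAL segment size and the usual 24-hex-digit file naming; the helper names `parse_lsn`, `wal_file_name` and `offset_in_segment` are hypothetical.

```python
# Sketch: map a WAL location printed as "XXX/YYY" to the WAL segment file
# that contains it. Assumption: default 16 MB segments (wal_segment_size).

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed segment size in bytes
# Number of segments per 4 GB "xlogid" (the high 32 bits of a position).
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text: str) -> int:
    """Turn a position such as '85B/28' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int) -> str:
    """Return the 24-hex-digit segment file name holding lsn."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def offset_in_segment(lsn: int) -> int:
    """Byte offset of lsn inside its segment file."""
    return lsn % WAL_SEGMENT_SIZE


if __name__ == "__main__":
    pos = parse_lsn("85B/28")
    print(wal_file_name(12, pos), hex(offset_in_segment(pos)))
```

Under these assumptions the failing position 85B/28 is only 0x28 (40) bytes into that segment, so it sits on the segment's very first page.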

I tried to run pg_rewind again, but now it says I cannot, because the old master is already on the same timeline as the new master.

Regards

Dylan

Re: help with startup slave after pg_rewind

From: Michael Paquier

On Wed, Sep 19, 2018 at 10:29:44PM +0000, Dylan Luong wrote:
> After promoting slave to master, I completed a pg_rewind of the slave
> (old master) to the new master. But when I try to start the slave I am
> getting the following error.
>
> I tried to run pg_rewind again, but now it says I cannot do it as it's
> already same timeline.

What did pg_rewind tell you after the first run?  If you remove the set
of WAL segments on the rewound instance and let it replay only segments
from the archive, are you able to get past the error?
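The test suggested here (move the rewound instance's local WAL segments out of the way so that recovery has to fetch them from the archive through restore_command) could look like the following Python sketch. It assumes the server is stopped, that the WAL directory is pg_wal (pg_xlog before PostgreSQL 10), and that `holding_dir` is a hypothetical location of your choosing; the function name is also hypothetical. Files are moved rather than deleted so the step can be reverted.

```python
import re
import shutil
from pathlib import Path

# A WAL segment file name is exactly 24 hex characters, for example
# 0000000C0000085B00000000. Timeline history files (*.history), backup
# files (*.backup) and partial segments (*.partial) do not match, so they
# stay in place.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def move_aside_wal_segments(wal_dir: Path, holding_dir: Path) -> list[str]:
    """Move WAL segment files from wal_dir into holding_dir.

    Intended to run only while the server is stopped. Returns the names
    of the files that were moved, in sorted order.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(wal_dir.iterdir()):
        # Skip subdirectories such as archive_status and non-segment files.
        if path.is_file() and SEGMENT_NAME.fullmatch(path.name):
            shutil.move(str(path), str(holding_dir / path.name))
            moved.append(path.name)
    return moved
```

Keeping the history files in place matters here: recovery still needs the timeline history (0000000C.history in this thread) to follow the switch to timeline 12.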

There is an inconsistency in the WAL records you are trying to replay.
In this case, a contrecord refers to a WAL record split across multiple
pages.  The WAL reader was asked to start a record at 85B/28, but the
page there says it begins with the continuation of an earlier record,
so the two do not agree.  That's not normal.  My bet is that something
is wrong in a failover flow that you believe is correct.  It is hard to
get that right.
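One way to picture the check behind "contrecord is requested by 85B/28" is the simplified Python model below. The reader is asked to begin a record at 85B/28, which is 40 bytes into the first 8 kB page of segment 0000000C0000085B00000000, right after that page's long header. If the header's flags say the page starts with the tail of a record from the previous page, no new record can begin there and the read fails. The page size, header sizes (24 and 40 bytes, as on a typical 64-bit build) and flag values are assumptions, and `check_record_start` is a hypothetical function, not PostgreSQL's code.

```python
# Simplified model of the record-start sanity check on a WAL page.
# Assumptions: 8 kB WAL pages, 24-byte short and 40-byte long page headers
# (typical 64-bit build) and the flag values below.

XLOG_BLCKSZ = 8192
SHORT_PAGE_HEADER = 24
LONG_PAGE_HEADER = 40             # first page of each segment
XLP_FIRST_IS_CONTRECORD = 0x0001  # page starts with the tail of a record
XLP_LONG_HEADER = 0x0002          # page carries the long header


def check_record_start(lsn: int, xlp_info: int) -> str:
    """Return 'ok' or an error string for starting a new record at lsn.

    xlp_info is the flags field of the header of the page containing lsn.
    """
    header = LONG_PAGE_HEADER if xlp_info & XLP_LONG_HEADER else SHORT_PAGE_HEADER
    offset = lsn % XLOG_BLCKSZ
    if offset == 0:
        offset = header  # a request at the page start skips the header
    elif offset < header:
        return "invalid record offset"
    if xlp_info & XLP_FIRST_IS_CONTRECORD and offset == header:
        # The bytes right after the header continue a record from the
        # previous page, so a new record cannot start here.
        return "contrecord is requested"
    return "ok"
```

In this model the same position is fine when the page does not carry the continuation flag, which is why the error points at mismatched WAL (for example, a page written on a different history) rather than at the position itself.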
--
Michael

Вложения