Discussion: [GENERAL] streaming replication - crash on standby
The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.
pg_xlogdump: FATAL: error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
I believe this means the standby server received a WAL file out of order? But why did it crash? Is crashing normal behavior in a case like this?
Thanks,
Seong
Hi,

On 2017-08-09 22:03:43 +0000, Seong Son (US) wrote:
> The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.
>
> pg_xlogdump: FATAL: error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
>
> I believe this means the standby server received WAL file out of order? But why did it crash? Is crashing normal behavior in case like this?

This likely just means that that's the end of the WAL.

- Andres
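The numbers in the pg_xlogdump message can be decoded by hand, which helps to see why it is consistent with reaching the end of the WAL. The sketch below is an illustration only: it assumes the default 16 MB WAL segment size and 8 kB WAL page size, and the helper names (parse_lsn, segment_file_name) are made up for this example, not PostgreSQL functions. The timeline is passed as 0 only to match the file name printed in the message.

```python
# Assumed defaults; a server built with other sizes would need other values.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # 16 MB per WAL segment file
WAL_PAGE_SIZE = 8192                  # 8 kB per WAL page
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256


def parse_lsn(text):
    """Turn an LSN written as 'HI/LO' (both hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_file_name(lsn, timeline):
    """Build the 24-hex-digit segment file name that contains this LSN."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


# Values copied from the error message in this thread.
record_lsn = parse_lsn("DF/4CB95FD0")
page_addr = parse_lsn("DB/62B96000")

# Where the last good record starts inside its segment.
record_offset = record_lsn % WAL_SEGMENT_SIZE
# Start of the WAL page that follows that record.
next_page_offset = (record_offset // WAL_PAGE_SIZE + 1) * WAL_PAGE_SIZE

print(segment_file_name(record_lsn, 0), record_offset, next_page_offset)
print(segment_file_name(page_addr, 0), page_addr % WAL_SEGMENT_SIZE)
```

Under these assumptions the record at DF/4CB95FD0 sits in segment 00000000000000DF0000004C, and the page right after it starts at offset 12148736, the offset named in the message. The address found in that page header, DB/62B96000, decodes to the same offset in an older segment (DB/62). That is what one would expect if the file is a recycled segment whose later pages still hold old contents, i.e. the valid WAL ends at that point.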
I see. Thank you.

But the Postgresql process had crashed at that time so the streaming replication was no longer working. Why would it crash and is that normal?

Thanks,
Seong

This email and any files transmitted with it are intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This message contains information that is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

-----Original Message-----
From: Andres Freund [mailto:andres@anarazel.de]
Sent: Wednesday, August 09, 2017 6:27 PM
To: Seong Son (US) <Seong.Son@datapath.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] streaming replication - crash on standby

Hi,

On 2017-08-09 22:03:43 +0000, Seong Son (US) wrote:
> The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.
>
> pg_xlogdump: FATAL: error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
>
> I believe this means the standby server received WAL file out of order? But why did it crash? Is crashing normal behavior in case like this?

This likely just means that that's the end of the WAL.

- Andres
Hi,

Please quote properly on postgres mailing lists.

On 2017-08-09 22:31:23 +0000, Seong Son (US) wrote:
> I see. Thank you.
>
> But the Postgresql process had crashed at that time so the streaming replication was no longer working. Why would it crash and is that normal?

You've given us absolutely zero information to be able to diagnose the problem. If you want somebody to help you you'll have to describe exactly what happened, and what the problem you're facing is.

- Andres

> This email and any files transmitted with it are intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This message contains information that is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

This footer makes no sense on a public list.
>-----Original Message-----
>From: Andres Freund [mailto:andres@anarazel.de]
>Sent: Wednesday, August 09, 2017 6:34 PM
>To: Seong Son (US) <Seong.Son@datapath.com>
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] streaming replication - crash on standby
>
>Hi,
>
>Please quote properly on postgres mailing lists.
>
>On 2017-08-09 22:31:23 +0000, Seong Son (US) wrote:
>> I see. Thank you.
>>
>> But the Postgresql process had crashed at that time so the streaming replication was no longer working. Why would it crash and is that normal?
>
>You've given us absolutely zero information to be able to diagnose the problem. If you want somebody to help you you'll have to describe exactly what happened, and what the problem you're facing is.
>
>- Andres

Sorry for the lack of info. I've gathered some more. Hopefully it will be enough to help isolate the cause of the crash of the standby server.

The servers are on Windows Server 2012 R2, running PostgreSQL 9.6. The primary and standby servers are in two different cities, connected over VPN.

Here are the last few lines from pg_log at the time of the standby server's crash:

2017-08-08 21:17:56 UTC FATAL: invalid memory alloc request size 1656315904
2017-08-08 21:17:56 UTC LOG: startup process (PID 2972) exited with exit code 1
2017-08-08 21:17:56 UTC LOG: terminating any other active server processes
2017-08-08 21:17:56 UTC WARNING: terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC WARNING: terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC LOG: database system is shut down

And this is the last entry from pg_xlogdump:

-08 21:17:36.864852 Coordinated Universal Time
pg_xlogdump: FATAL: error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736

One thing I noticed is that the network is not the most stable. When I ran a wireshark capture on port 5432, I saw numerous errors and warnings like

"New fragment overlaps old data (retransmission?)"
"This frame is a (suspected) out-of-order segment"
"This frame is a (suspected) retransmission"

So the questions are: why did the standby server crash? Could the network instability be the cause of the crash?

Thank you in advance for any info.

Seong
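One detail that falls out of simple arithmetic on the two messages above: the size in the "invalid memory alloc request" line, written in hex, is the same number as the low half of the page address reported by pg_xlogdump. The sketch below only checks that equality. It is an observation about the posted values, not a diagnosis, and nothing in this thread confirms what it implies about the cause of the crash.

```python
# Values copied verbatim from the log and the pg_xlogdump output in this thread.
alloc_request_size = 1656315904        # from "invalid memory alloc request size"
page_addr_text = "DB/62B96000"         # from "unexpected pageaddr"

# The part of the page address after the slash is its low 32 bits, in hex.
page_addr_low = int(page_addr_text.split("/")[1], 16)

print(hex(alloc_request_size))               # 0x62b96000
print(alloc_request_size == page_addr_low)   # True
```

If the match is not a coincidence, it may mean the startup process read bytes from the stale page as if they were part of a record, but that is a guess and would need to be verified against the 9.6 source or by someone familiar with the WAL reader.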