Обсуждение: Strange message from pg_receivexlog
Hi all, My WAL archiving script based on pg_receivexlog reported the following error several days ago (just ignore everything before 'pg_receivexlog', it's a message my script generates). Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured during WAL archiving: pg_receivexlog: could not send feedback packet: server closed the connection unexpectedly At the same time postgres reported this error in log: 2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local] [vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating walsender process due to replication timeout Both pg_receivexlog and postgres run at the same machive, pg_receivexlog connects to postgres locally. /var/log/messages has absolutely nothing about it. I also have a hot standby on another machine connecting to the same master, but there is nothing strange in its logs either. Any thoughts what it was? -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
On Tue, Aug 20, 2013 at 8:17 AM, Sergey Konoplev <gray.ru@gmail.com> wrote: > Hi all, > > My WAL archiving script based on pg_receivexlog reported the following > error several days ago (just ignore everything before > 'pg_receivexlog', it's a message my script generates). > > Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured > during WAL archiving: pg_receivexlog: could not send feedback packet: > server closed the connection unexpectedly > > At the same time postgres reported this error in log: > > 2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local] > [vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating > walsender process due to replication timeout > > Both pg_receivexlog and postgres run at the same machive, > pg_receivexlog connects to postgres locally. /var/log/messages has > absolutely nothing about it. I also have a hot standby on another > machine connecting to the same master, but there is nothing strange in > its logs either. > > Any thoughts what it was? It looks like something outside postgres or pg_receivexlog did terminate the connection. pg_receievexlog noticed it was closed, whereas postgres waited for the timeout (but probably would've noticed if it had actually had some other data to send maybe?). Do you have some iptables connection tracking or something like that which could be interfering? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Aug 20, 2013 at 2:10 AM, Magnus Hagander <magnus@hagander.net> wrote: > It looks like something outside postgres or pg_receivexlog did > terminate the connection. pg_receievexlog noticed it was closed, > whereas postgres waited for the timeout (but probably would've noticed > if it had actually had some other data to send maybe?). Do you have > some iptables connection tracking or something like that which could > be interfering? AFAIU, just standard things: -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -A INPUT -p icmp -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT -A INPUT -j REJECT --reject-with icmp-host-prohibited -A FORWARD -j REJECT --reject-with icmp-host-prohibited Nothing looks suspicious for me. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
On Tue, Aug 20, 2013 at 3:17 PM, Sergey Konoplev <gray.ru@gmail.com> wrote: > Hi all, > > My WAL archiving script based on pg_receivexlog reported the following > error several days ago (just ignore everything before > 'pg_receivexlog', it's a message my script generates). > > Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured > during WAL archiving: pg_receivexlog: could not send feedback packet: > server closed the connection unexpectedly > > At the same time postgres reported this error in log: > > 2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local] > [vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating > walsender process due to replication timeout > > Both pg_receivexlog and postgres run at the same machive, > pg_receivexlog connects to postgres locally. /var/log/messages has > absolutely nothing about it. I also have a hot standby on another > machine connecting to the same master, but there is nothing strange in > its logs either. > > Any thoughts what it was? Is the value of replication_timeout sufficiently-larger than the status-interval of pg_receivexlog? Regards, -- Fujii Masao
On Wed, Aug 21, 2013 at 5:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured >> during WAL archiving: pg_receivexlog: could not send feedback packet: >> server closed the connection unexpectedly >> >> At the same time postgres reported this error in log: >> >> 2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local] >> [vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating >> walsender process due to replication timeout > > Is the value of replication_timeout sufficiently-larger than the status-interval > of pg_receivexlog? The replication_timeout is 60s. The archive_wal.sh (script-wrapper around pg_receivexlog) reports its status straight away as it falls with an error. Below is the explanation of how it works. This is the core of archive_wal.sh: ( flock -xn 543 || exit 0 result=$($PGRECEIVEXLOG -n -D $WAL_ARCHIVE_DIR 2>&1) || \ die "Problem occured during WAL archiving: $result." ) 543>$WAL_LOCK_FILE And it is set to run by cron once a minute reporting me by email on occasions: MAILTO=gray.ru@gmail.com * * * * * /bin/bash /var/lib/pgsql/tmsdb/archive_wal.sh >>/var/log/tmsdb/archive_wal.log -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com