It happened again. The weird thing is that when I query pg_stat_replication I see only one slave (the one that is still synced) and I don't see the second one. Moreover, I don't see anything about the disconnection in the repmgr log on either the primary or the slave...
"have it fail over to using the archived WALs instead of full database restore" How do I configure this ?
With Postgres replication, it’s configured in the recovery.conf file using the “restore_command” setting. It amounts to a script that connects to your backups and pulls the requested WAL file.
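To make that concrete, here is a minimal sketch of such a script in Python. It assumes the archived WALs sit as plain files in a directory the replica can read; the ARCHIVE_DIR path and the function names are made up for illustration, so adjust them to however your backup tool actually stores the files. The one hard requirement from Postgres is the exit status: zero only when the file was really restored, non-zero when it is not in the archive.

```python
import os
import shutil

# Hypothetical archive location; point this at wherever your backups keep WAL files.
ARCHIVE_DIR = "/mnt/backups/wal_archive"


def restore_wal(archive_dir, wal_name, dest_path):
    """Copy one archived WAL file to dest_path; return a shell-style exit code."""
    # Refuse names with path components so only files directly inside archive_dir are read.
    if os.path.basename(wal_name) != wal_name:
        return 1
    src = os.path.join(archive_dir, wal_name)
    if not os.path.isfile(src):
        # Postgres routinely asks for files that are not in the archive;
        # a non-zero status tells it the file is unavailable.
        return 1
    # Copy to a temporary name first, then rename, so Postgres never sees a partial file.
    tmp_path = dest_path + ".tmp"
    shutil.copyfile(src, tmp_path)
    os.replace(tmp_path, dest_path)
    return 0


def main(argv):
    """argv holds the two values Postgres substitutes for %f and %p."""
    if len(argv) != 2:
        return 2
    return restore_wal(ARCHIVE_DIR, argv[0], argv[1])
```

To use it, save it under a name of your choosing (say restore_wal.py), add `import sys` and a final line `sys.exit(main(sys.argv[1:]))`, and set restore_command = 'python3 /path/to/restore_wal.py %f %p' in recovery.conf. Postgres replaces %f with the WAL file name and %p with the path to copy it to.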
When you say no firewall, that is a bit confusing, and I’m left assuming that the nodes are on the same subnet? I normally only use replication slots with either a backup solution or a replica that is going over a WAN. I am a bit perplexed why replication would fall that far behind on a local network (send lag, not replay lag). What is the interconnect, gigabit or 10G, and what volume of WAL is being generated? You might have a network-related issue here.
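If it helps to put numbers on those two questions, here is a rough sketch in Python. It assumes you have already read the primary's current WAL position and the replica's sent position as text (pg_current_wal_lsn() and the sent_lsn column of pg_stat_replication on Postgres 10 and later; older releases call them pg_current_xlog_location() and sent_location). The function names are mine, not anything from Postgres or repmgr.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    # An LSN is a 64-bit position printed as two hex numbers: high 32 bits / low 32 bits.
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def send_lag_bytes(current_lsn, sent_lsn):
    """Bytes of WAL the primary has generated but not yet sent to the replica."""
    return lsn_to_int(current_lsn) - lsn_to_int(sent_lsn)


def link_utilisation(wal_bytes_per_sec, link_bits_per_sec):
    """Fraction of the link's raw bandwidth needed just to ship WAL."""
    return wal_bytes_per_sec * 8 / link_bits_per_sec
```

For example, send_lag_bytes("0/3000060", "0/3000000") gives 96 bytes, and link_utilisation(12_500_000, 1_000_000_000) says that 12.5 MB/s of WAL needs about a tenth of a gigabit link (those figures are illustrative, not measurements from this cluster). If the send lag keeps growing while the utilisation figure is far below 1, the bottleneck is probably not raw bandwidth.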