Fwd: postgresql lost connection to repmgr arbitrarily

From: Zhaoxun Yan
Subject: Fwd: postgresql lost connection to repmgr arbitrarily
Date:
Msg-id: CADEX6_WZmRx-7HmkCgaN_RHriRakYtPMo7USc=FjNimeRWED3w@mail.gmail.com
In reply to: Re: postgresql lost connection to repmgr arbitrarily  (Zhaoxun Yan <yan.zhaoxun@gmail.com>)
List: pgsql-admin

I found the postgresql process with this line:
postgres: rep repmgr 172.17.1.2(60490) idle

It represents the TCP connection from the local address 172.17.1.2:60490 and is labeled "idle".
I checked the local path to 172.17.1.2, which is the address of eth0.
It behaves as a loopback connection, just like localhost:
[root@yzx2 ~]# tracepath 172.17.1.2
 1:  yzx2                                                  0.057ms reached
     Resume: pmtu 65535 hops 1 back 1

Thus no router is involved in the repmgr-PostgreSQL connection; otherwise the path MTU would be <= 1500.
Does the "idle" label mean something?
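
For reference, the PostgreSQL monitoring documentation describes the backend process title as `postgres: user database host activity`, where an activity of `idle` means the session is waiting for a client command. A minimal Python sketch that splits the line above into those fields; the function name and the regular expression are illustrative assumptions, not part of PostgreSQL or repmgr:

```python
import re

# Pattern assumed from the documented title layout:
#   postgres: <user> <database> <host>(<port>) <activity>
TITLE_RE = re.compile(
    r"^postgres:\s+(?P<user>\S+)\s+(?P<database>\S+)\s+"
    r"(?P<host>[^\s(]+)\((?P<port>\d+)\)\s+(?P<activity>.+)$"
)

def parse_backend_title(title):
    """Return the fields of a client backend title, or None if it does not match."""
    m = TITLE_RE.match(title.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["port"] = int(fields["port"])  # client-side TCP port
    return fields

info = parse_backend_title("postgres: rep repmgr 172.17.1.2(60490) idle")
print(info)
```

Read this way, `rep` is the user, `repmgr` is the database, and 172.17.1.2(60490) is the client address and port.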

Forwarded Conversation
Subject: postgresql lost connection to repmgr arbitrarily
------------------------

From: Zhaoxun Yan <yan.zhaoxun@gmail.com>
Date: Tue, Oct 17, 2023 at 2:55 PM
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>


Hi!
It happens from time to time. At first I thought it was a router problem, so I changed the host in repmgr's configuration from its intranet address to 127.0.0.1, but the problem persists. Here is what happened according to repmgrd:

2023-10-17 00:21:50+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 00:21:50.471347+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-10-17 10:23:05+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 10:23:05.473997+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-10-17 13:22:23+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-10-17 13:22:23.552278+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened


So I went to the PostgreSQL side, and its log reports:
2023-10-17 00:21:50.415 CST [2210264] LOG:  could not receive data from client: Connection reset by peer
2023-10-17 10:23:05.420 CST [2249257] LOG:  could not receive data from client: Connection reset by peer
2023-10-17 13:22:23.486 CST [2260546] LOG:  could not receive data from client: Connection reset by peer
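
To line the two logs up, a small Python sketch can compute how long after each reset repmgrd reported its reconnect, and how far apart the resets are. The timestamps are copied from the entries above; treating the CST and +08 stamps as the same clock and time zone is an assumption, and the variable names are illustrative only:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# "Connection reset by peer" entries from the PostgreSQL log.
pg_resets = [
    "2023-10-17 00:21:50.415",
    "2023-10-17 10:23:05.420",
    "2023-10-17 13:22:23.486",
]
# repmgrd_local_reconnect events reported by repmgrd.
repmgrd_reconnects = [
    "2023-10-17 00:21:50.471347",
    "2023-10-17 10:23:05.473997",
    "2023-10-17 13:22:23.552278",
]

resets = [datetime.strptime(t, FMT) for t in pg_resets]
reconnects = [datetime.strptime(t, FMT) for t in repmgrd_reconnects]

# Seconds between PostgreSQL logging the reset and repmgrd reconnecting.
reconnect_delays = [(rc - rs).total_seconds() for rs, rc in zip(resets, reconnects)]

# Seconds between consecutive resets, to check for a regular period.
reset_gaps = [(b - a).total_seconds() for a, b in zip(resets, resets[1:])]

print(reconnect_delays)
print(reset_gaps)
```

If the gaps between resets turned out to be regular, that would point toward a timer of some kind; irregular gaps would point elsewhere.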


I have set up the keepalive feature in postgresql.conf to prevent a router from cutting off the TCP connection:

tcp_keepalives_idle = 20
tcp_keepalives_interval = 10
tcp_keepalives_count = 3
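
As a rough reading of these three settings: after `tcp_keepalives_idle` seconds without traffic the server's TCP stack starts sending keepalive probes, retries every `tcp_keepalives_interval` seconds, and treats the connection as dead after `tcp_keepalives_count` unanswered probes. The Python sketch below only does that arithmetic; the function name is made up for illustration, and real timing depends on the operating system's TCP stack:

```python
def dead_peer_detection_seconds(idle, interval, count):
    """Approximate time for the server to declare a silent client dead.

    idle:     seconds of inactivity before the first keepalive probe
    interval: seconds between unanswered probes
    count:    number of unanswered probes tolerated
    """
    return idle + interval * count

# Values from the postgresql.conf excerpt above.
timeout = dead_peer_detection_seconds(idle=20, interval=10, count=3)
print(timeout)
```

Keepalives of this kind help detect a peer that has silently gone away. A "Connection reset by peer" entry typically means a TCP reset arrived on the connection, which these settings do not prevent.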


So do you have any idea what went wrong? BTW, the PostgreSQL version is 15.4 and the repmgr version is 5.4dev.



----------
From: Scott Ribe <scott_ribe@elevated-dev.com>
Date: Tue, Oct 17, 2023 at 9:39 PM
To: Zhaoxun Yan <yan.zhaoxun@gmail.com>
Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>


I have no idea if this is related to your problem, but...

I once had a connection timeout where a big institution was using Cisco routers, which charged ongoing license fees, tiered by how many connections they would support. And they configured them to recognize keepalive packets, and drop connections which only had keepalive packets for some length of time!

----------
From: Zhaoxun Yan <yan.zhaoxun@gmail.com>
Date: Wed, Oct 18, 2023 at 10:25 AM
To: Scott Ribe <scott_ribe@elevated-dev.com>
Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>


Hi Scott,
  To avoid the problem you mentioned, I have already changed the host address to 127.0.0.1, meaning 'localhost', so the connection stays on that machine and does not go through a router.

