Обсуждение: connection timeout with psycopg2

Поиск
Список
Период
Сортировка

connection timeout with psycopg2

От
Vicente Juan Tomas Monserrat
Дата:

Hi there,

I have been testing out the following architecture for PostgreSQL HA.

             +---------+      +-----+   VIP   +----+      |     +---------+    |      |                    |
+------v-------+     +------v-------+
|  pgBouncer   |     |  pgBouncer   |
|      +       |     |      +       |
| keepalived   |     |  keepalived  |
+------+-------+     +------+-------+      |                    |      |                    |      |                    |
+------v-------+     +------v-------+
|              |     |              |
|   HAProxy    |     |   HAProxy    |
|              |     |              |
+------+-------+     +------+-------+      |                    |      +--------------------+      |                    |      |                    | +----v----+          +----v----+ |         |          |         | |         |          |         | |  PG01   |          |  PG02   | |         |          |         | |(patroni)|          |(patroni)| |         |          |         | +---------+          +---------+

I'm using this python script for checking the failover events in pgBouncer, HAProxy and Patroni (PostgreSQL HA solution).

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

import psycopg2

ISOLEVEL = psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT

import time
from datetime import datetime

user = 'postgres'
password = 'secretpassword'

host = 'localhost'
port = '6432'
database = 'test'
LIMIT_RETRIES = 10

class DB():   def __init__(self, user, password, host, port, database, reconnect):       self.user = user       self.password = password       self.host = host       self.port = port       self.database = database       self._connection = None       self._cursor = None       self.reconnect = reconnect       self.init()
   def connect(self,retry_counter=0):       if not self._connection:           try:               self._connection = psycopg2.connect(user = self.user, password = self.password, host = self.host, port = self.port, database = self.database, connect_timeout = 3)               retry_counter = 0               self._connection.autocommit = True               return self._connection           except psycopg2.OperationalError as error:               if not self.reconnect or retry_counter >= LIMIT_RETRIES:                   raise error               else:                   retry_counter += 1                   print("got error {}. reconnecting {}".format(str(error).strip(), retry_counter))                   time.sleep(5)                   self.connect(retry_counter)           except (Exception, psycopg2.Error) as error:               raise error
   def cursor(self):       if not self._cursor or self._cursor.closed:           if not self._connection:               self.connect()           self._cursor = self._connection.cursor()           return self._cursor
   def execute(self, query, retry_counter=0):       try:           self._cursor.execute(query)           retry_counter = 0       except (psycopg2.DatabaseError, psycopg2.OperationalError) as error:           if retry_counter >= LIMIT_RETRIES:               raise error           else:               retry_counter += 1               print("got error {}. retrying {}".format(str(error).strip(), retry_counter))               time.sleep(1)               self.reset()               self.execute(query, retry_counter)       except (Exception, psycopg2.Error) as error:           raise error
   def reset(self):       self.close()       self.connect()       self.cursor()
   def close(self):       if self._connection:           if self._cursor:               self._cursor.close()           self._connection.close()           print("PostgreSQL connection is closed")       self._connection = None       self._cursor = None
   def init(self):       self.connect()       self.cursor()

db = DB(user=user, password=password, host=host, port=port, database=database, reconnect=True)
db.execute("create table if not exists t1 (id integer);")
i = 0
while True:   db.execute("insert into t1(id) values(1);")   if i % 100 == 0:       print("%s: %d" % (datetime.now(), i))   i = i+1

When running this python script against the pgBouncer VIP it keeps inserting data into the database. Then I stop one of the HAProxy service (where the VIP lives) the connection it hangs and never goes on. The VIP is on the other node but the client/app it doesn't notice and it keeps waiting for 5 minutes and finally continues. I've been looking for some default value of 5min with no luck.

$ python insert.py 
2019-10-15 10:01:51.817585: 0
2019-10-15 10:01:51.901091: 100
2019-10-15 10:01:52.031583: 200
2019-10-15 10:01:52.126565: 300
2019-10-15 10:01:52.216502: 400
2019-10-15 10:01:52.307157: 500
2019-10-15 10:01:52.400867: 600
2019-10-15 10:01:52.497239: 700
2019-10-15 10:01:52.655689: 800
2019-10-15 10:01:52.777883: 900
got error server conn crashed?             <<<<<<<<<<<<<<<<< HAProxy stopped manually to force the VIP to move to the other node
server closed the connection unexpectedly       This probably means the server terminated abnormally       before or while processing the request.. retrying 1
PostgreSQL connection is closed
^C^C^C^C^C                                <<<<<<<<<<<<<<<<<< The connection gets stuck (kill PID)

I've tried exactly the same code logic in Java (using PostgreSQL JDBC) and dotnet core (using Npgsql) works fine with specifying this parameters socketTimeout (Java) and 'Command Timeout' (dotnet) respectively.

$ dotnet run
connection initialized
2019-10-15T08:27:28.843 0
2019-10-15T08:27:30.205 100
2019-10-15T08:27:31.566 200
got error: Exception while reading from stream. Retrying 1
connection closed
connection initialized
connection reset
2019-10-15T08:27:42.076 300             <<<<<<<<<<<< the app notices something wrong with the existing connection and does a reset
2019-10-15T08:27:43.461 400
2019-10-15T08:27:44.843 500
2019-10-15T08:27:46.244 600
2019-10-15T08:27:47.637 700
2019-10-15T08:27:49.031 800
^C

In python and using psycopg2 (v2.8.3) I've not been able to inform in any way the application to reset the connection, retry and keep going on.

As the psycopg2 it's a wrapper for libpq, I've seen that in libpq (for postgresql 12) there is a new option called tcp_user_timeout. Note this options doesn't exist in previous versions of libpq.
So I've tried to install psycopg2 (instead of psycopg2-binary) with the libpq (v12) installed, so the psycopg2 is build against libpq (v12). Even so, it's not working as expected.

Environment:
Ubuntu 16.04 LTS
PostgreSQL 9.6.15
Patroni 1.6.0
pgBouncer 1.11.0
keepalived 2.0.18
HAProxy 1.6.3

I've also tried to catch different types of exceptions with no luck.

I would appreciate any guidance on this matter. I can give you more info if needed.

Thank!

Re: connection timeout with psycopg2

От
Adrian Klaver
Дата:
On 10/16/19 2:29 AM, Vicente Juan Tomas Monserrat wrote:
> Hi there,
> 
> I have been testing out the following architecture for PostgreSQL HA.
> 
> |+---------+ +-----+ VIP +----+ | +---------+ | | | +------v-------+ 
> +------v-------+ | pgBouncer | | pgBouncer | | + | | + | | keepalived | 
> | keepalived | +------+-------+ +------+-------+ | | | | | | 
> +------v-------+ +------v-------+ | | | | | HAProxy | | HAProxy | | | | 
> | +------+-------+ +------+-------+ | | +--------------------+ | | | | 
> +----v----+ +----v----+ | | | | | | | | | PG01 | | PG02 | | | | | 
> |(patroni)| |(patroni)| | | | | +---------+ +---------+ |
> 
> I'm using this python script for checking the failover events in 
> pgBouncer, HAProxy and Patroni (PostgreSQL HA solution).
> 
> |#! /usr/bin/env python # -*- coding: utf-8 -*- # vim:fenc=utf-8 import 
> psycopg2 ISOLEVEL = psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT 
> import time from datetime import datetime user = 'postgres' password = 
> 'secretpassword' host = 'localhost' port = '6432' database = 'test' 
> LIMIT_RETRIES = 10 class DB(): def __init__(self, user, password, host, 
> port, database, reconnect): self.user = user self.password = password 
> self.host = host self.port = port self.database = database 
> self._connection = None self._cursor = None self.reconnect = reconnect 
> self.init() def connect(self,retry_counter=0): if not self._connection: 
> try: self._connection = psycopg2.connect(user = self.user, password = 
> self.password, host = self.host, port = self.port, database = 
> self.database, connect_timeout = 3) retry_counter = 0 
> self._connection.autocommit = True return self._connection except 
> psycopg2.OperationalError as error: if not self.reconnect or 
> retry_counter >= LIMIT_RETRIES: raise error else: retry_counter += 1 
> print("got error {}. reconnecting {}".format(str(error).strip(), 
> retry_counter)) time.sleep(5) self.connect(retry_counter) except 
> (Exception, psycopg2.Error) as error: raise error def cursor(self): if 
> not self._cursor or self._cursor.closed: if not self._connection: 
> self.connect() self._cursor = self._connection.cursor() return 
> self._cursor def execute(self, query, retry_counter=0): try: 
> self._cursor.execute(query) retry_counter = 0 except 
> (psycopg2.DatabaseError, psycopg2.OperationalError) as error: if 
> retry_counter >= LIMIT_RETRIES: raise error else: retry_counter += 1 
> print("got error {}. retrying {}".format(str(error).strip(), 
> retry_counter)) time.sleep(1) self.reset() self.execute(query, 
> retry_counter) except (Exception, psycopg2.Error) as error: raise error 
> def reset(self): self.close() self.connect() self.cursor() def 
> close(self): if self._connection: if self._cursor: self._cursor.close() 
> self._connection.close() print("PostgreSQL connection is closed") 
> self._connection = None self._cursor = None def init(self): 
> self.connect() self.cursor() db = DB(user=user, password=password, 
> host=host, port=port, database=database, reconnect=True) 
> db.execute("create table if not exists t1 (id integer);") i = 0 while 
> True: db.execute("insert into t1(id) values(1);") if i % 100 == 0: 
> print("%s: %d" % (datetime.now(), i)) i = i+1 |
> 
> When running this python script against the pgBouncer VIP it keeps 
> inserting data into the database. Then I stop one of the HAProxy 
> service (where the VIP lives) the connection it hangs and never goes on. 
> The VIP is on the other node but the client/app it doesn't notice and it 
> keeps waiting for 5 minutes and finally continues. I've been looking for 
> some default value of 5min with no luck.

Observations and comments:
1) I would point out there is a Psycopg list:

https://www.postgresql.org/list/psycopg/

2) I am not sure where the 5 minutes comes in. I see LIMIT_RETRIES = 10 
and a sleep of 5 sec between retries.

3) Where did this:

"got error server conn crashed?  "

come from? I don't see that in the code.




> 
> |$ python insert.py 2019-10-15 10:01:51.817585: 0 2019-10-15 
> 10:01:51.901091: 100 2019-10-15 10:01:52.031583: 200 2019-10-15 
> 10:01:52.126565: 300 2019-10-15 10:01:52.216502: 400 2019-10-15 
> 10:01:52.307157: 500 2019-10-15 10:01:52.400867: 600 2019-10-15 
> 10:01:52.497239: 700 2019-10-15 10:01:52.655689: 800 2019-10-15 
> 10:01:52.777883: 900 got error server conn crashed? <<<<<<<<<<<<<<<<< 
> HAProxy stopped manually to force the VIP to move to the other node 
> server closed the connection unexpectedly This probably means the server 
> terminated abnormally before or while processing the request.. retrying 
> 1 PostgreSQL connection is closed ^C^C^C^C^C <<<<<<<<<<<<<<<<<< The 
> connection gets stuck (kill PID) |
> 
> I've tried exactly the same code logic in Java (using PostgreSQL JDBC) 
> and dotnet core (using Npgsql) works fine with specifying this 
> parameters socketTimeout (Java) and 'Command Timeout' (dotnet) respectively.
> 
> |$ dotnet run connection initialized 2019-10-15T08:27:28.843 0 
> 2019-10-15T08:27:30.205 100 2019-10-15T08:27:31.566 200 got error: 
> Exception while reading from stream. Retrying 1 connection closed 
> connection initialized connection reset 2019-10-15T08:27:42.076 300 
> <<<<<<<<<<<< the app notices something wrong with the existing 
> connection and does a reset 2019-10-15T08:27:43.461 400 
> 2019-10-15T08:27:44.843 500 2019-10-15T08:27:46.244 600 
> 2019-10-15T08:27:47.637 700 2019-10-15T08:27:49.031 800 ^C |
> 
> In python and using psycopg2 (v2.8.3) I've not been able to inform in 
> any way the application to reset the connection, retry and keep going on.
> 
> As the psycopg2 it's a wrapper for libpq, I've seen that in libpq (for 
> postgresql 12) there is a new option called |tcp_user_timeout|. Note 
> this options doesn't exist in previous versions of libpq.
> So I've tried to install psycopg2 (instead of psycopg2-binary) with the 
> libpq (v12) installed, so the psycopg2 is build against libpq (v12). 
> Even so, it's not working as expected.
> 
> Environment:
> Ubuntu 16.04 LTS
> PostgreSQL 9.6.15
> Patroni 1.6.0
> pgBouncer 1.11.0
> keepalived 2.0.18
> HAProxy 1.6.3
> 
> I've also tried to catch different types of exceptions with no luck.
> 
> I would appreciate any guidance on this matter. I can give you more info 
> if needed.
> 
> Thank!
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com