I have searched fairly thoroughly and been unable to find a way to force
prompt client application session breaks when PostgreSQL
client-to-server transport fails.
I run a 7x24 PostgreSQL 9.1 "write-only" libpq client application
(solely INSERTs/COPYs running on Debian 7 "wheezy" OS) that communicates
with its PostgreSQL 9.0 DB server (Debian 6 "squeeze") via
less-than-perfect intercontinental TCP Internet/VPN transport. The
application has been running very reliably for over 4 years except for
communication breaks.
Unfortunately, in this environment, connectivity lapses of a minute or
two to an hour or two are common. To minimize the risk of data loss
when session recovery is attempted only AFTER the client queues data, I
want to promptly detect and attempt recovery of lost sessions even when
no transactions are pending. To this end, I have tried:
#define PGSQL_KEEPALIVE_QSECS "60"

char pstring[6];
snprintf(pstring, sizeof(pstring), "%i", cnt0->conf.pgsql_port);
PQconnectdbParams(
    (const char *[]) {"dbname", "host", "user", "password", "port",
        "sslmode", "application_name", "connect_timeout", "keepalives",
        "keepalives_idle", "keepalives_interval", "keepalives_count", NULL},
    (const char *[]) {cnt0->conf.pgsql_db, cnt0->conf.pgsql_host,
        cnt0->conf.pgsql_user, cnt0->conf.pgsql_password, pstring,
        "disable", "motion", PGSQL_KEEPALIVE_QSECS, "1",
        PGSQL_KEEPALIVE_QSECS, PGSQL_KEEPALIVE_QSECS, "3", NULL},
    0))
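One way to check whether these settings actually reach the socket is to read them back with getsockopt(). The following is a minimal sketch, assuming the Linux option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT). The names get_keepalive_opts and struct keepalive_opts are hypothetical helpers for illustration, not part of libpq.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper type (not part of libpq): the keepalive settings
 * currently in force on a TCP socket. */
struct keepalive_opts {
    int enabled;   /* SO_KEEPALIVE: nonzero if keepalive is on        */
    int idle;      /* TCP_KEEPIDLE: idle seconds before first probe   */
    int interval;  /* TCP_KEEPINTVL: seconds between probes           */
    int count;     /* TCP_KEEPCNT: unanswered probes before giving up */
};

/* Read the keepalive settings of socket fd (Linux-specific option names).
 * Returns 0 on success, -1 if any getsockopt() call fails. */
int get_keepalive_opts(int fd, struct keepalive_opts *ka)
{
    socklen_t len;

    len = sizeof(ka->enabled);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &ka->enabled, &len) < 0)
        return -1;
    len = sizeof(ka->idle);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &ka->idle, &len) < 0)
        return -1;
    len = sizeof(ka->interval);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &ka->interval, &len) < 0)
        return -1;
    len = sizeof(ka->count);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &ka->count, &len) < 0)
        return -1;
    return 0;
}
```

In the application, fd would be the descriptor returned by PQsocket(conn) after a successful connect. With the parameters above, the values read back should be 60, 60 and 3 if libpq applied them.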
As a baseline comparison, I establish a psql session with an all-default
environment, break the VPN link, and then attempt a simple query (select
count(*) from ...). The query and psql session fail after about 17
minutes' wait. When testing the application -- even specifying the
above connection parameters -- I get approximately the same 17 minute
timeout before a broken session is signalled at the application
(PQconsumeInput(conn); if (PQstatus(conn)!=CONNECTION_OK) ...)
when testing over the intentionally broken link. This is a far cry from
the maximum of 5 minutes I expected.
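The 5-minute expectation comes from simple arithmetic on the keepalive parameters above: for an otherwise idle connection, the dead-peer detection time should be roughly the idle time plus one probe interval per unanswered probe. A minimal sketch of that calculation (keepalive_detect_secs is a hypothetical helper name, and the formula is an approximation, not a guarantee of kernel behavior):

```c
/* Hypothetical helper: approximate worst-case seconds for TCP keepalive
 * to declare an idle connection dead. The first probe is sent after
 * idle seconds; each of the count probes then waits interval seconds. */
long keepalive_detect_secs(long idle, long interval, long count)
{
    return idle + interval * count;
}
```

With keepalives_idle=60, keepalives_interval=60 and keepalives_count=3, this gives 60 + 60 * 3 = 240 seconds, i.e. 4 minutes, which is why I took 5 minutes as an upper bound.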
Based on postings elsewhere, I have also tried changing the relevant
Linux kernel defaults of:
/proc/sys/net/ipv4/tcp_keepalive_time=7200
/proc/sys/net/ipv4/tcp_keepalive_probes=9
/proc/sys/net/ipv4/tcp_keepalive_intvl=75
to:
/proc/sys/net/ipv4/tcp_keepalive_time=60
/proc/sys/net/ipv4/tcp_keepalive_probes=3
/proc/sys/net/ipv4/tcp_keepalive_intvl=15
... with no detectable effect; still a ca. 17 minute timeout. (Failure
of initial connection establishment IS indicated rapidly; ca. 20 sec.,
with or without any of the above measures, even connect_timeout=60.)
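To rule out the possibility that the kernel settings did not take effect, the values can be read back from /proc at run time. A minimal sketch, assuming each file contains a single integer (read_long_file is a hypothetical helper name):

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_long_file(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling it on /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_intvl should return 60, 3 and 15 respectively after the change described above.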
Any ideas how to achieve the keepalives as specified in
PQconnectdbParams when running on these platforms?
Thanks,
Bill Clay