Thread: max_connections limit violation not showing in pg_stat_activity

max_connections limit violation not showing in pg_stat_activity

From: Charles Clavadetscher
Date:
Hello

We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).

The database is hosted by an internal service provider and we have
superuser access to it via a PG client, e.g. psql, but not to the OS.
For that reason we only have access to the log files indirectly, using
some of the built-in system functions like pg_ls_dir, etc.
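
For illustration, this is roughly what that indirect access looks like
(a sketch; it assumes log_directory is the default relative 'pg_log',
the filename is hypothetical, and both functions require superuser):

    -- list the server log files
    SELECT f FROM pg_ls_dir('pg_log') AS f WHERE f LIKE '%.log' ORDER BY f;

    -- read the first 100 kB of one of them
    SELECT pg_read_file('pg_log/postgresql-Tue.log', 0, 102400);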

Among other things, the database is the backend for a web application that
expects a load of some hundred users at a time (these are participants
in online surveys that we use for computing economic indicators, and they
access the system every month). The total number of people expected is
above 5000, but we don't expect very high concurrent access to the
database. As mentioned, a few hundred at the beginning of the surveys.

To be sure that we won't have problems at peak times, we created a
load test using Gatling that ramps up to 1000 users in 5 minutes in
bunches of 10. At the beginning we had problems with the web server
response, which we were able to correct. Now we face a problem with the
max_connections limit of PostgreSQL. Currently it is set to the default
of 100. We are going to look into it and either increase that limit or
consider connection pooling.

What bothers me, however, is that running a query on pg_stat_activity with
a watch of 1 second never shows more than 37 concurrent
active connections:

SELECT count(*) FROM pg_stat_activity; \watch 1
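
The same sampling can also be broken down by connection state
(pg_stat_activity.state exists since 9.2), though it of course suffers
from the same once-per-second limitation:

    SELECT state, count(*) FROM pg_stat_activity GROUP BY state; \watch 1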

Because of that it took us quite some time to figure out that the
bottleneck had become the database. We discovered it only after looking into
the log files (as mentioned above this is not very straightforward, in
particular because the logs tend to become quite large).

I assume that the peaks of requests violating the limit happen between
two calls of the query. Is there a better way to keep track of this kind
of problem? It felt a bit odd not to be able to discover the issue sooner.

And what would be a reasonable strategy to deal with the problem at
hand? Increasing max_connections has repercussions on the configuration
of work_mem (if I remember correctly) and on the amount of
physical memory that must be available on the system.

On Thursday we are going to have a meeting with our DB hosting provider
to discuss which improvements need to be made to meet the requirements of
our applications (the web application mentioned is not the only one
using the database, but it is the only one where we expect such peaks).

So I'd be very grateful for advice on this subject.

Thank you.
Regards
Charles

--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Treasurer
Motorenstrasse 18
CH-8005 Zürich

http://www.swisspug.org

+-----------------------+
|   ____  ______  ___   |
|  /    )/      \/   \  |
| (     / __    _\    ) |
|  \    (/ o)  ( o)   ) |
|   \_  (_  )   \ ) _/  |
|     \  /\_/    \)/    |
|      \/ <//|  |\\>    |
|           _|  |       |
|           \|_/        |
|                       |
| PostgreSQL 1996-2016  |
|  20 Years of Success  |
|                       |
+-----------------------+


Re: max_connections limit violation not showing in pg_stat_activity

From: Kevin Grittner
Date:
On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher
<clavadetscher@swisspug.org> wrote:

> We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).

Is it possible to upgrade?  You are missing over a year's worth of
fixes for serious bugs and security vulnerabilities.

https://www.postgresql.org/support/versioning/

> Among other things, the database is the backend for a web application that
> expects a load of some hundred users at a time (these are participants
> in online surveys that we use for computing economic indicators, and they
> access the system every month). The total number of people expected is
> above 5000, but we don't expect very high concurrent access to the
> database. As mentioned, a few hundred at the beginning of the surveys.
>
> To be sure that we won't have problems at peak times, we created a
> load test using Gatling that ramps up to 1000 users in 5 minutes in
> bunches of 10. At the beginning we had problems with the web server
> response, which we were able to correct. Now we face a problem with the
> max_connections limit of PostgreSQL. Currently it is set to the default
> of 100. We are going to look into it and either increase that limit or
> consider connection pooling.

On a web site with about 3000 active users, I found (through
adjusting the connection pool size on the production database and
monitoring performance) that we got best performance with a pool of
about 40 connections.  This was on a machine with 16 cores (never
count HT "threads" as cores), 512GB RAM, and a RAID with 40 drives
of spinning rust.

http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html
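
For a starting point before measuring, the rule of thumb often cited on
the PostgreSQL wiki is:

    pool_size = (core_count * 2) + effective_spindle_count

but treat that only as an initial guess; a measured result like the 40
above always wins.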

> What bothers me, however, is that running a query on pg_stat_activity with
> a watch of 1 second never shows more than 37 concurrent
> active connections:
>
> SELECT count(*) FROM pg_stat_activity; \watch 1

At the times when the server is overloaded by more connections
than its resources can efficiently service -- well, that's precisely
the time that a sleeping "monitoring" process is least likely to be
given a time slice to run.  If you can manage to get pgbadger to
run in your environment, and you turn on logging of connections and
disconnections, you will be able to get far more accurate
information.
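
The relevant postgresql.conf settings would be along these lines (one
possible set, assuming plain stderr logging; pgbadger needs a
log_line_prefix it can parse, and the threshold value is just an example):

    log_connections = on
    log_disconnections = on
    log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
    log_min_duration_statement = 250    # ms; optional, for query statistics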

> Increasing max_connections has repercussions on the configuration
> of work_mem (if I remember correctly)

Each connection can make one work_mem-sized allocation per plan node
that requires a sort, hash, CTE, etc.
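
As a rough worst-case sketch (assuming the default work_mem of 4MB and
queries averaging, say, three such nodes):

    100 connections * 3 nodes * 4MB = ~1.2GB

potentially in use at once, on top of shared_buffers, which is one more
reason to cap connections with a pool rather than raise max_connections.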

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: max_connections limit violation not showing in pg_stat_activity

From: "Charles Clavadetscher"
Date:
Hello Kevin

Thank you very much for your input. I appreciate it very much.

> -----Original Message-----
> From: Kevin Grittner [mailto:kgrittn@gmail.com]
> Sent: Tuesday, 22 November 2016 22:37
> To: Charles Clavadetscher <clavadetscher@swisspug.org>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity
>
> On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
>
> > We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).
>
> Is it possible to upgrade?  You are missing over a year's worth of fixes for serious bugs and security
> vulnerabilities.

Yes. A move to 9.6 (and RedHat 7.x) is actually planned.

> https://www.postgresql.org/support/versioning/
>
> > Among other things, the database is the backend for a web application
> > that expects a load of some hundred users at a time (these are
> > participants in online surveys that we use for computing economic
> > indicators, and they access the system every month). The total number
> > of people expected is above 5000, but we don't expect very high
> > concurrent access to the database. As mentioned, a few hundred at the beginning of the surveys.
> >
> > To be sure that we won't have problems at peak times, we created
> > a load test using Gatling that ramps up to 1000 users in 5 minutes in
> > bunches of 10. At the beginning we had problems with the web server
> > response, which we were able to correct. Now we face a problem with the
> > max_connections limit of PostgreSQL. Currently it is set to the
> > default of 100. We are going to look into it and either increase that
> > limit or consider connection pooling.
>
> On a web site with about 3000 active users, I found (through adjusting the connection pool size on the production
> database and monitoring performance) that we got best performance with a pool of about 40 connections.  This was on
> a machine with 16 cores (never count HT "threads" as cores), 512GB RAM, and a RAID with 40 drives of spinning rust.

OK. I will have to check with our hosting people how many cores we have or can have on the new environment.
I have seen that there are pgBouncer and pgPool. Would you recommend one of those? AFAICS both are installed on the
client side, so we should be able to use them.

> http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html
>
> > What bothers me, however, is that running a query on pg_stat_activity
> > with a watch of 1 second never shows more than 37
> > concurrent active connections:
> >
> > SELECT count(*) FROM pg_stat_activity; \watch 1
>
> At the times when the server is overloaded by more connections than its resources can efficiently service --
> well, that's precisely the time that a sleeping "monitoring" process is least likely to be given a time slice to run.
> If you can manage to get pgbadger to run in your environment, and you turn on logging of connections and
> disconnections, you will be able to get far more accurate information.

Yes, it sounds reasonable. I assumed that this kind of measurement had a higher priority or reserved slots, since
those occasions are when it is most needed.

And thank you for the hint about pgbadger. I will take a look into it, but an installation on the server completely
depends on our hosting service people. I am not sure this is feasible, but I can imagine an alternative scenario, using
a local installation for tuning and then passing the parameters to the hosters for implementation. That might help.

Regards
Charles

>
> > Increasing max_connections has repercussions on the configuration of
> > work_mem (if I remember correctly)
>
> Each connection can make one work_mem-sized allocation per plan node that requires a sort, hash, CTE, etc.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: max_connections limit violation not showing in pg_stat_activity

From: Kevin Grittner
Date:
On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher
<clavadetscher@swisspug.org> wrote:
> From: Kevin Grittner [mailto:kgrittn@gmail.com]

>> Is it possible to upgrade?  You are missing over a year's worth
>> of fixes for serious bugs and security vulnerabilities.
>
> Yes. A move to 9.6 (and RedHat 7.x) is actually planned.
>
>> https://www.postgresql.org/support/versioning/

An upgrade from 9.3.x to 9.6.x is a major release upgrade, which
gets you new features and usually gets you improvements in
performance and scalability.  The 9.3 major release will be
supported for almost 2 more years, so I was less concerned about
that than about being on 9.3.10 when the latest bug-fix release of 9.3 is
9.3.15.  To avoid hitting bugs that others have already hit and
reported, with fixes published, it is wise to try to upgrade to the
latest minor release fairly quickly.  If the minor release fixes a
serious security vulnerability, I think it is a good idea to update
within a day or two of release.

> I will have to check with our hosting people how many cores we
> have or can have on the new environment.  I have seen that there
> are pgBouncer and pgPool. Would you recommend one of those?

pgBouncer is more lightweight, so if you don't need any of the
features present only in pgPool, I would go with pgBouncer.  Depending
on your application software environment, it may be even better to
use a pool built into the application development framework.  There
are several very good pools available to Java environments.  I can
personally attest to the quality of Apache dbcp, but others have
mentioned that they like other Java connection pools even better.
You might want to search the archives, or maybe someone will
mention the others again on this thread.
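
If you do go with pgBouncer, a minimal pgbouncer.ini skeleton looks
roughly like this (all names, hosts, and paths here are hypothetical):

    [databases]
    ; clients connect to "appdb" on the pooler, which forwards to the real server
    appdb = host=dbhost.example.com port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432              ; pgBouncer's default port
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction         ; or "session", depending on the application
    default_pool_size = 40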

>>> SELECT count(*) FROM pg_stat_activity; \watch 1
>>
>> At the times when the server is overloaded by more
>> connections than its resources can efficiently service -- well,
>> that's precisely the time that a sleeping "monitoring" process
>> is least likely to be given a time slice to run. If you can
>> manage to get pgbadger to run in your environment, and you turn
>> on logging of connections and disconnections, you will be able
>> to get far more accurate information.
>
> Yes, it sounds reasonable. I assumed that this kind of
> measurement had a higher priority or reserved slots, since
> those occasions are when it is most needed.

There is no such feature in PostgreSQL.  It might be worthwhile,
although how that would be implemented is not obvious, short of a
sysadmin looking for the monitoring backend process and running
"nice" against it.

> And thank you for the hint about pgbadger. I will take a look into
> it, but an installation on the server completely depends on our
> hosting service people. I am not sure this is feasible, but I can
> imagine an alternative scenario, using a local installation for
> tuning and then passing the parameters to the hosters for
> implementation.

As long as you can control the PostgreSQL configuration (to set the
right logging options) and can retrieve the log files, you should
be able to use it.
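
Generating the report itself can then happen on any machine where you
have copied the log files, for example (filenames hypothetical; the -p
value must match the log_line_prefix in use):

    pgbadger -p '%t [%p]: user=%u,db=%d,app=%a,client=%h ' \
             postgresql-2016-11-*.log -o report.html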

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: max_connections limit violation not showing in pg_stat_activity

From: "Charles Clavadetscher"
Date:
Hello Kevin

Getting back at this.

> -----Original Message-----
> From: Kevin Grittner [mailto:kgrittn@gmail.com]
> Sent: Wednesday, 23 November 2016 17:04
> To: Charles Clavadetscher <clavadetscher@swisspug.org>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity
>
> On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
> > From: Kevin Grittner [mailto:kgrittn@gmail.com]
>
> >> Is it possible to upgrade?  You are missing over a year's worth of
> >> fixes for serious bugs and security vulnerabilities.
> >
> > Yes. A move to 9.6 (and RedHat 7.x) is actually planned.
> >
> >> https://www.postgresql.org/support/versioning/
>
> An upgrade from 9.3.x to 9.6.x is a major release upgrade, which gets you new features and usually gets you
> improvements in performance and scalability.  The 9.3 major release will be supported for almost 2 more years, so I
> was less concerned about that than about being on 9.3.10 when the latest bug-fix release of 9.3 is 9.3.15.  To avoid
> hitting bugs that others have already hit and reported, with fixes published, it is wise to try to upgrade to the
> latest minor release fairly quickly.  If the minor release fixes a serious security vulnerability, I think it is a
> good idea to update within a day or two of release.
>
> > I will have to check with our hosting people how many cores we have or
> > can have on the new environment.  I have seen that there are pgBouncer
> > and pgPool. Would you recommend one of those?
>
> pgBouncer is more lightweight, so if you don't need any of the features present only in pgPool, I would go with
> pgBouncer.  Depending on your application software environment, it may be even better to use a pool built into the
> application development framework.  There are several very good pools available to Java environments.  I can
> personally attest to the quality of Apache dbcp, but others have mentioned that they like other Java connection
> pools even better.
> You might want to search the archives, or maybe someone will mention the others again on this thread.

Finally I set up pgbouncer, and a simple first test with a somewhat heavy load (1000 users in 5 minutes, corresponding
to a total of 12000 HTTP requests and 5000 DB requests) shows an incredible improvement in performance. Without the
pooler, roughly a fifth of the calls died in a timeout, and the remaining ones showed an average response time of more
than 8 seconds. With the pooler, all requests went through without any error whatsoever and the mean response time
dropped to 23 ms (the maximum being 193 ms). At first I thought that I had some error in my simulation somewhere, but
all 5000 DB requests were performed correctly, as I could check in a log table that I prepared for that specific
purpose. There is definitely no doubt about the beneficial effect of the connection pooler.

Now I will take some time to better understand the configuration settings.

Maybe a question on that. I have the following configuration entries:

pool_mode = session
server_reset_query = DISCARD ALL
max_client_conn = 100
default_pool_size = 60
min_pool_size = 5
max_db_connections = 60
max_user_connections = 60
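
On the application side only the connection string needs to change; it
now points at the pooler's port instead of at PostgreSQL directly, along
these lines (host and names hypothetical, 6432 being pgBouncer's default
port):

    host=pooler.example.com port=6432 dbname=appdb user=app_user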

The pooler only serves a single application and only has a single connection string with the same user for all calls
(that being the application user defined for that specific web application). Our current DB has a max_connections of
100. I assumed that max_client_conn should match this entry. With max_db_connections and max_user_connections I try to
make sure that calls from this application won't take up all resources (as happened without the pooler). This seems
to work correctly. During the simulation, other applications that also use the database but don't go through the
pooler work normally, i.e. there are no noticeable effects.

If you have suggestions or see potential problems in these settings that I am not aware of so far, feel free to
point me to them.

Thank you very much again for the excellent tips.
Bye
Charles

> >>> SELECT count(*) FROM pg_stat_activity; \watch 1
> >>
> >> At the times when the server is overloaded by more connections
> >> than its resources can efficiently service -- well, that's precisely
> >> the time that a sleeping "monitoring" process is least likely to be
> >> given a time slice to run. If you can manage to get pgbadger to run
> >> in your environment, and you turn on logging of connections and
> >> disconnections, you will be able to get far more accurate
> >> information.
> >
> > Yes, it sounds reasonable. I assumed that this kind of measurement
> > had a higher priority or reserved slots, since
> > those occasions are when it is most needed.
>
> There is no such feature in PostgreSQL.  It might be worthwhile, although how that would be implemented is not
> obvious, short of a sysadmin looking for the monitoring backend process and running "nice" against it.
>
> > And thank you for the hint about pgbadger. I will take a look into it,
> > but an installation on the server completely depends on our hosting
> > service people. I am not sure this is feasible, but I can imagine an
> > alternative scenario, using a local installation for tuning and then
> > passing the parameters to the hosters for implementation.
>
> As long as you can control the PostgreSQL configuration (to set the right logging options) and can retrieve the log
> files, you should be able to use it.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company