Discussion: How to keep queries low latency as concurrency increases
As I increase concurrency I'm experiencing what I believe are too slow queries given the minuscule amount of data in my tables. I have 20 Django worker processes and use ab to generate 3000 requests to a particular URL which does some read-only queries. I ran this with ab's concurrency level set to 4, 12 and 20. With some aggregation using pgbadger, here are the results:

concurrency 4
Number of queries: 39,046
Total query duration: 4.255s
Slowest query: 33ms
Total taken to execute slowest query 6000 times: 1.633s
Number of queries taking over 100ms: 0
Number of queries taking over 50ms: 0
Number of queries taking over 25ms: 1
Number of queries taking over 10ms: 7

concurrency 12
Number of queries: 39,035
Total query duration: 7.435s
Slowest query: 174ms
Total taken to execute slowest query 6000 times: 2.617s
Number of queries taking over 100ms: 2
Number of queries taking over 50ms: 4
Number of queries taking over 25ms: 17
Number of queries taking over 10ms: 99

concurrency 20
Number of queries: 39,043
Total query duration: 11.614s
Slowest query: 198ms
Total taken to execute slowest query 6000 times: 4.286s
Number of queries taking over 100ms: 5
Number of queries taking over 50ms: 19
Number of queries taking over 25ms: 52
Number of queries taking over 10ms: 255

All tests have 0 INSERTs, 0 UPDATEs, 0 DELETEs, approx.
18000 SELECTs and 21000 OTHERs (Django's ORM sends a lot of SET TIME ZONE, SET default_transaction_isolation TO 'read committed', etc.)

The 3 queries that take longest in total are:

SELECT "django_site"."id", "django_site"."domain", "django_site"."name",
       "vwf_customsite"."site_ptr_id", "vwf_customsite"."geo_reference_id",
       "vwf_customsite"."friendly_domain", "vwf_customsite"."ws_machine",
       "vwf_customsite"."public", "vwf_customsite"."user_limit",
       "vwf_customsite"."hidden_login_and_registration", "vwf_customsite"."logo",
       "vwf_customsite"."language", "vwf_customsite"."ga_tracker_id",
       "vwf_customsite"."always_running", "vwf_customsite"."deleted",
       "vwf_customsite"."version", "vwf_customsite"."contact_email"
FROM "vwf_customsite"
INNER JOIN "django_site" ON ("vwf_customsite"."site_ptr_id" = "django_site"."id")
WHERE "vwf_customsite"."site_ptr_id" = 0;

SELECT "vwf_plugin"."id", "vwf_plugin"."name", "vwf_plugin"."site_id",
       "vwf_plugin"."enabled"
FROM "vwf_plugin"
WHERE ("vwf_plugin"."site_id" = 0 AND "vwf_plugin"."name" = '');

SELECT "django_site"."id", "django_site"."domain", "django_site"."name"
FROM "django_site"
WHERE "django_site"."domain" = '';

The tables are extremely small: django_site has 8 rows, vwf_customsite has 7 and vwf_plugin 43.

My intuition says that for these read-only queries on tables this small, no query should take more than 5ms even at a concurrency level of 20, and that performance shouldn't degrade at all when going from 4 to 20 concurrent ab requests. The CPUs are also only about 10% used, so there should be plenty of capacity for more concurrency. The numbers above show a different situation though: the average for the slowest query stays under 1ms, but it grows when increasing concurrency, and there are spikes that really take too long IMO.

Am I right that it should be possible to do better, and if so, how? Thanks a lot for any ideas or insights!
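For reference, the load generation described above corresponds to an ab invocation along these lines (the URL is a placeholder; only -n and -c come from the description):

```
# 3000 requests total at concurrency 20; also run with -c 4 and -c 12
ab -n 3000 -c 20 http://example.com/some/read-only/page/
```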
More details about my setup:

The schemas:

                            Table "public.django_site"
 Column |          Type          |                        Modifiers
--------+------------------------+----------------------------------------------------------
 id     | integer                | not null default nextval('django_site_id_seq'::regclass)
 domain | character varying(100) | not null
 name   | character varying(50)  | not null
Indexes:
    "django_site_pkey" PRIMARY KEY, btree (id)
Referenced by:
    <snip list of 25 tables>

                        Table "public.vwf_customsite"
            Column             |          Type          | Modifiers
-------------------------------+------------------------+-----------
 site_ptr_id                   | integer                | not null
 geo_reference_id              | integer                |
 friendly_domain               | character varying(100) | not null
 public                        | boolean                | not null
 logo                          | character varying(100) |
 language                      | character varying(2)   | not null
 ga_tracker_id                 | character varying(16)  | not null
 version                       | character varying(100) | not null
 contact_email                 | character varying(254) | not null
 always_running                | boolean                | not null
 deleted                       | boolean                | not null
 ws_machine                    | character varying(100) | not null
 user_limit                    | integer                | not null
 hidden_login_and_registration | boolean                | not null
Indexes:
    "vwf_customsite_pkey" PRIMARY KEY, btree (site_ptr_id)
    "vwf_customsite_geo_reference_id" btree (geo_reference_id)
Foreign-key constraints:
    "geo_reference_id_refs_id_488579c58f2d1a89" FOREIGN KEY (geo_reference_id) REFERENCES geo_reference_georeference(id) DEFERRABLE INITIALLY DEFERRED
    "site_ptr_id_refs_id_712ff223c9517f55" FOREIGN KEY (site_ptr_id) REFERENCES django_site(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    <snip list of 1 table>

                           Table "public.vwf_plugin"
 Column  |          Type          |                       Modifiers
---------+------------------------+---------------------------------------------------------
 id      | integer                | not null default nextval('vwf_plugin_id_seq'::regclass)
 name    | character varying(255) | not null
 site_id | integer                | not null
 enabled | boolean                | not null default false
Indexes:
    "vwf_plugin_pkey" PRIMARY KEY, btree (id)
    "vwf_plugin_site_id" btree (site_id)
Foreign-key constraints:
    "site_id_refs_id_4ac2846d79527bae" FOREIGN KEY (site_id) REFERENCES django_site(id) DEFERRABLE INITIALLY DEFERRED

Hardware:
Virtual machine running on top of VMware
4 cores, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
4GB of RAM
Disk that is virtual enough that I have no idea what it is; I know that there's some big storage shared between multiple virtual machines. The filesystem is ext4 with default mount options. I can imagine IO performance is not great for this machine; however, for the read-only queries and the very small tables above I would expect everything to be cached in memory and the disk not to matter.

Ubuntu 12.04 with Postgres installed from Ubuntu's packages.
pgbouncer 1.4.2 installed from Ubuntu's packages on the same machine as Postgres. Django connects via TCP/IP to pgbouncer (it does one connection and one transaction per request) and pgbouncer keeps connections open to Postgres via a Unix socket. The Python client is self-compiled psycopg2-2.4.5.

uname -a
Linux wcea014.virtuocity.eu 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Non-default settings:

            name            | current_setting
----------------------------+------------------------------------------------------------------------------------------------------------
 version                    | PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
 client_encoding            | UTF8
 effective_cache_size       | 1000MB
 external_pid_file          | /var/run/postgresql/9.1-main.pid
 lc_collate                 | en_US.UTF-8
 lc_ctype                   | en_US.UTF-8
 log_checkpoints            | on
 log_connections            | on
 log_destination            | stderr
 log_directory              | /var/log/postgresql
 log_disconnections         | on
 log_filename               | postgresql-%Y-%m-%d-concTODO.log
 log_line_prefix            | %t [%p]: [%l-1]
 log_lock_waits             | on
 log_min_duration_statement | 0
 log_rotation_size          | 0
 log_temp_files             | 0
 logging_collector          | on
 maintenance_work_mem       | 400MB
 max_connections            | 100
 max_stack_depth            | 2MB
 port                       | 2345
 random_page_cost           | 2
 server_encoding            | UTF8
 shared_buffers             | 800MB
 ssl                        | on
 TimeZone                   | localtime
 unix_socket_directory      | /var/run/postgresql
 wal_buffers                | 16MB
 work_mem                   | 10MB
Catalin Iacob wrote:

> Hardware:
> Virtual machine running on top of VMware
> 4 cores, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
> 4GB of RAM

You should carefully test transaction-based pools limited to around 8 DB connections. Experiment with different size limits.

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

> Disk that is virtual enough that I have no idea what it is, I know
> that there's some big storage shared between multiple virtual
> machines. Filesystem is ext4 with default mount options.

Can you change to noatime?

> pgbouncer 1.4.2 installed from Ubuntu's packages on the same
> machine as Postgres. Django connects via TCP/IP to pgbouncer (it
> does one connection and one transaction per request) and pgbouncer
> keeps connections open to Postgres via Unix socket. The Python
> client is self compiled psycopg2-2.4.5.

Is there a good transaction-based connection pooler in Python? You're better off with a good pool built in to the client application than with a good pool running as a separate process between the client and the database, IMO.

> random_page_cost | 2

For fully cached databases I recommend random_page_cost = 1, and I always recommend cpu_tuple_cost = 0.03.

-Kevin
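Kevin's cost settings and the noatime suggestion amount to something like the following (illustrative fragments only; the fstab device and mount point are assumptions for this VM, adjust to your own layout):

```
# postgresql.conf -- planner costs for a fully cached database
random_page_cost = 1.0    # default is 4.0; cached data makes random reads cheap
cpu_tuple_cost  = 0.03    # default is 0.01

# /etc/fstab -- drop atime updates on the data filesystem (example device/mountpoint)
/dev/vda1  /var/lib/postgresql  ext4  noatime  0  2
```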
On 10/30/2012 06:55 AM, Kevin Grittner wrote:

> Is there a good transaction-based connection pooler in Python?
> You're better off with a good pool built in to the client application
> than with a good pool running as a separate process between the
> client and the database, IMO.

Could you explain this a little more? My experience is almost always the exact opposite, especially in large clusters that may have dozens of servers all hitting the same database. A centralized pool has much less duplication and can serve from a smaller pool than having 12 servers each with 25 connections reserved in their own private pool or something.

I mean... a pool is basically a proxy server. I don't have 12 individual proxy servers for 12 webservers.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
Shaun Thomas wrote:

> On 10/30/2012 06:55 AM, Kevin Grittner wrote:
>> Is there a good transaction-based connection pooler in Python?
>> You're better off with a good pool built in to the client
>> application than with a good pool running as a separate process
>> between the client and the database, IMO.
>
> Could you explain this a little more? My experience is almost
> always the exact opposite, especially in large clusters that may
> have dozens of servers all hitting the same database. A
> centralized pool has much less duplication and can serve from a
> smaller pool than having 12 servers each have 25 connections
> reserved in their own private pool or something.
>
> I mean... a pool is basically a proxy server. I don't have 12
> individual proxy servers for 12 webservers.

Sure, if you have multiple web servers and they are not routing their database requests through a common "model" layer, an external pooler would make sense. Most of the time I've dealt either with one web server or multiple servers routing requests at the transaction level to a single JVM which ran the logic of the transaction -- either of which is a good place to have a connection pool. A dozen different JVMs all making JDBC requests does kind of beg for an external layer to concentrate the requests; if it isn't something that's running the transaction layer, a connection pooler there would be good.

-Kevin
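The application-side, transaction-scoped pool Kevin has in mind can be sketched in a few lines of Python (illustrative only: `connect` stands in for something like `psycopg2.connect(...)`, and a production pool such as psycopg2's own `psycopg2.pool` module adds error handling, connection resets and reconnection):

```python
import queue


class ConnectionPool:
    """A minimal fixed-size connection pool.

    `connect` is any zero-argument factory returning a connection-like
    object. Callers acquire a connection around each transaction and
    release it immediately afterwards.
    """

    def __init__(self, connect, max_size=8):
        self._pool = queue.Queue(maxsize=max_size)
        # Open all connections up front; a real pool might do this lazily.
        for _ in range(max_size):
            self._pool.put(connect())

    def acquire(self, timeout=None):
        # Blocks when all connections are busy -- exactly the back-pressure
        # Kevin describes: no more than max_size sessions ever reach the DB.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```

The blocking `acquire` is the point: limiting the pool to ~8 connections on a 4-core box queues excess work in the application instead of letting it pile up as contention inside Postgres.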
On Mon, Oct 29, 2012 at 5:11 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:

> pgbouncer 1.4.2 installed from Ubuntu's packages on the same machine
> as Postgres. Django connects via TCP/IP to pgbouncer (it does one
> connection and one transaction per request) and pgbouncer keeps
> connections open to Postgres via Unix socket.

Isn't pgbouncer single-threaded?

If you're hitting it with tiny queries as fast as possible from 20 connections, I would think that it would become the bottleneck.

Cheers,

Jeff
Jeff / Catalin --

Jeff Janes wrote:

> On Mon, Oct 29, 2012 at 5:11 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
>
>> pgbouncer 1.4.2 installed from Ubuntu's packages on the same machine
>> as Postgres. Django connects via TCP/IP to pgbouncer (it does one
>> connection and one transaction per request) and pgbouncer keeps
>> connections open to Postgres via Unix socket.
>
> Isn't pgbouncer single-threaded?
>
> If you hitting it with tiny queries as fast as possible from 20
> connections, I would think that it would become the bottleneck.

I'm sure pgbouncer has some threshold where it breaks down, but we have servers (postgres 8.4 and 9.1) with connections from runtime (fed via haproxy) to pgbouncer that routinely have tens of thousands of connections in but only 40-70 postgres connections to the postgres cluster itself. Mix of queries, but most are simple. Typically a few thousand queries a second to the readonly boxes, about the same to a beefier read/write master.

This is a slightly old pgbouncer at that ... used in a fairly basic mode.

Greg Williamson
On Tue, Oct 30, 2012 at 4:11 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:

> Jeff Janes wrote:
>
>> Isn't pgbouncer single-threaded?
>>
>> If you hitting it with tiny queries as fast as possible from 20
>> connections, I would think that it would become the bottleneck.
>
> I'm sure pgbouncer has some threshold where it breaks down, but we have
> servers (postgres 8.4 and 9.1) with connections from runtime (fed via
> haproxy) to pgbouncer that routinely have tens of thousands of
> connections in but only 40-70 postgres connections to the postgres
> cluster itself. Mix of queries but most are simple. Typically a few
> thousand queries a second to the readonly boxes, about the same to a
> beefier read / write master.
>
> This is a slightly old pgbouncer at that ... used in a fairly basic mode.

I've used pgbouncer in two different environments now with thousands of connections and hundreds upon hundreds of queries per second and it has yet to be a bottleneck in either place as well.
On Tue, Oct 30, 2012 at 3:16 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> On Tue, Oct 30, 2012 at 4:11 PM, Greg Williamson
> <gwilliamson39@yahoo.com> wrote:
>> I'm sure pgbouncer has some threshold where it breaks down, but we have
>> servers (postgres 8.4 and 9.1) with connections from runtime (fed via
>> haproxy) to pgbouncer that routinely have tens of thousands of
>> connections in but only 40-70 postgres connections to the postgres
>> cluster itself. Mix of queries but most are simple. Typically a few
>> thousand queries a second to the readonly boxes, about the same to a
>> beefier read / write master.
>>
>> This is a slightly old pgbouncer at that ... used in a fairly basic mode.
>
> I've used pgbouncer in two different environments now with thousands
> of connections and hundreds upon hundreds of queries per second and it
> has yet to be a bottleneck in either place as well.

The original poster has over 9000 queries per second in his best case, so I think that that is at the upper range of your experience.

Using a "pgbench -S" type workload, pgbouncer is definitely a bottleneck (1.7-fold slower at -c4 -j4 on a 4 CPU machine), and using -f with a dummy statement of "select 1;" it is 3-fold slower than going directly to the server. As -c increases, pgbouncer actually falls off faster than direct connections do, up through at least -c20 -j20.
Of course with your thousands of connections, direct connections are probably not feasible (and with that many connections, most of them are probably idle most of the time, which is pgbouncer's strength).

Anyway, opening and closing connections to pgbouncer is far less costly than opening them directly to Postgres, but still very expensive compared to not doing so at all. The original poster should see if he can avoid that.

Cheers, Jeff
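Jeff's comparison can be reproduced with something along these lines (a sketch only; the database name, ports and duration are assumptions, and pgbouncer is assumed to listen on 6432 in front of the same database):

```
# dummy.sql -- a statement that does essentially no server-side work
echo 'select 1;' > dummy.sql

# direct to Postgres
pgbench -n -f dummy.sql -c 4 -j 4 -T 30 -p 5432 testdb

# through pgbouncer
pgbench -n -f dummy.sql -c 4 -j 4 -T 30 -p 6432 testdb
```

Comparing the tps figures from the two runs isolates pgbouncer's per-query overhead, since the query itself costs almost nothing.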
On Tue, Oct 30, 2012 at 4:58 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> Isn't pgbouncer single-threaded?
>
> If you hitting it with tiny queries as fast as possible from 20
> connections, I would think that it would become the bottleneck.

Single-threaded asynchronous servers are known to scale better for this type of workload than multi-threaded systems because you don't have to do locking and context switching. By 'this type of workload' I mean workloads where most of the real work done is i/o -- pgbouncer, as it's just routing data between network sockets, is basically a textbook case for a single-threaded server. stunnel, by comparison, which has non-trivial amounts of non-i/o work going on, is more suited for threads. It also has severe scaling limits relative to pgbouncer.

pgbouncer is an absolute marvel and should be standard kit in any case where you're concerned about server scaling in terms of the number of active connections to the database. I'm in the camp that application-side connection pools are junk and should be avoided when possible.

merlin
On Wed, Oct 31, 2012 at 11:39 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Oct 30, 2012 at 4:58 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Mon, Oct 29, 2012 at 5:11 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote: >> >>> pgbouncer 1.4.2 installed from Ubuntu's packages on the same machine >>> as Postgres. Django connects via TCP/IP to pgbouncer (it does one >>> connection and one transaction per request) and pgbouncer keeps >>> connections open to Postgres via Unix socket. >> >> Isn't pgbouncer single-threaded? >> >> If you hitting it with tiny queries as fast as possible from 20 >> connections, I would think that it would become the bottleneck. > > Single threaded asynchronous servers are known to scale better for > this type of workload than multi-threaded systems because you don't > have to do locking and context switching. How much locking would there be in what pgbouncer does? On a 4 CPU machine, if I run pgbench -c10 -j10 with dummy queries (like "select 1;" or "set timezone...") against 2 instances of pgbouncer, I get nearly twice the throughput as if I use only one instance. A rather odd workload, maybe, but it does seem to be similar to the one that started this thread. > pgbouncer is an absolute marvel and should be standard kit in any case > you're concerned about server scaling in terms of number of active > connections to the database. I'm in the camp that application side > connection pools are junk and should be avoided when possible. I have nothing against pgbouncer, but it is not without consequences. Cheers, Jeff
On Sat, Nov 3, 2012 at 6:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> On Wed, Oct 31, 2012 at 11:39 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Single threaded asynchronous servers are known to scale better for
>> this type of workload than multi-threaded systems because you don't
>> have to do locking and context switching.
>
> How much locking would there be in what pgbouncer does?
>
> On a 4 CPU machine, if I run pgbench -c10 -j10 with dummy queries
> (like "select 1;" or "set timezone...") against 2 instances of
> pgbouncer, I get nearly twice the throughput as if I use only one
> instance.
>
> A rather odd workload, maybe, but it does seem to be similar to the
> one that started this thread.
>
>> pgbouncer is an absolute marvel and should be standard kit in any case
>> you're concerned about server scaling in terms of number of active
>> connections to the database. I'm in the camp that application side
>> connection pools are junk and should be avoided when possible.
>
> I have nothing against pgbouncer, but it is not without consequences.

Agreed -- also, I was curious and independently verified your results. pgbouncer doesn't lock -- if you strace it, it just goes epoll_wait, recvfrom, sendto endlessly while under heavy load from pgbench. This suggests that the bottleneck *is* pgbouncer, at least in some cases.
It's hard to believe all the userland copying is causing that, but I guess that must be the case. merlin
On Sun, Nov 4, 2012 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > On a 4 CPU machine, if I run pgbench -c10 -j10 with dummy queries > (like "select 1;" or "set timezone...") against 2 instances of > pgbouncer, I get nearly twice the throughput as if I use only one > instance. > > A rather odd workload, maybe, but it does seem to be similar to the > one that started this thread. Every-connection-is-busy is pessimal workload for pgbouncer, as it has nothing useful to contribute to setup, just overhead. -- marko
On Mon, Nov 5, 2012 at 2:58 PM, Marko Kreen <markokr@gmail.com> wrote: > On Sun, Nov 4, 2012 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On a 4 CPU machine, if I run pgbench -c10 -j10 with dummy queries >> (like "select 1;" or "set timezone...") against 2 instances of >> pgbouncer, I get nearly twice the throughput as if I use only one >> instance. >> >> A rather odd workload, maybe, but it does seem to be similar to the >> one that started this thread. > > Every-connection-is-busy is pessimal workload for pgbouncer, > as it has nothing useful to contribute to setup, just overhead. It still has something to contribute if connections are made and broken too often (pgbench -C type workload), as seems to be the case here. If he can get an application-side pooler (or perhaps just a change in configuration) such that the connections are not made and broken so often, then removing pgbouncer from the loop would probably be a win. Cheers, Jeff
On Tue, Nov 6, 2012 at 1:31 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

> On Mon, Nov 5, 2012 at 2:58 PM, Marko Kreen <markokr@gmail.com> wrote:
>> Every-connection-is-busy is pessimal workload for pgbouncer,
>> as it has nothing useful to contribute to setup, just overhead.
>
> It still has something to contribute if connections are made and
> broken too often (pgbench -C type workload), as seems to be the case
> here.

I did not notice -C in your message above. In such a case, in a practical, non-pgbench workload, you should move pgbouncer to the same machine as the app, so any overhead is just CPU, spread over all app instances, and does not include network latency.

> If he can get an application-side pooler (or perhaps just a change in
> configuration) such that the connections are not made and broken so
> often, then removing pgbouncer from the loop would probably be a win.

Yes, if the app has good pooling, there is less use for pgbouncer. In any case, only long-lived connections should go over the network.

-- marko
On Mon, Nov 5, 2012 at 3:58 PM, Marko Kreen <markokr@gmail.com> wrote:

> I did not notice -C in your message above.

Right, I was assuming he would somehow solve that problem and was looking ahead to the next one.

I had also tested the -C case, and pgbouncer can be the bottleneck there as well, but bypassing it will not solve the bottleneck because it will be even worse with direct connections. Running multiple instances of pgbouncer can, but only if you can make the application do some kind of load balancing between them.

I think there are three different uses of pgbouncer:

1) connections made and closed too often, even if there are never very many at a time (e.g. stateless CGI)

2) hundreds or thousands of connections, with most idle at any given time.

3) hundreds or thousands, all of which want to be active at once but which need to be forced not to be, so the server doesn't fall over due to contention.

I'm not sure 2 and 3 are really fundamentally different.

Cheers, Jeff
Thanks to everybody for their help; sorry for not getting back earlier, but available time shrank very quickly as the deadline approached and afterwards this kind of slipped off my mind.

On Tue, Nov 6, 2012 at 12:31 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

> It still has something to contribute if connections are made and
> broken too often (pgbench -C type workload), as seems to be the case
> here.

Django opens a connection for every request and closes it at the end of the request. As far as I know you can't override this; they tell you that if connection overhead is too big you should use a connection pool like pgbouncer. You still get latency from doing the connection and some overhead in pgbouncer, but you skip creating a Postgres process to handle the new connection. And indeed, after starting to use pgbouncer we could handle more concurrent users.

> If he can get an application-side pooler (or perhaps just a change in
> configuration) such that the connections are not made and broken so
> often, then removing pgbouncer from the loop would probably be a win.

Django doesn't offer application-side poolers; they tell you to use pgbouncer (see above). So pgbouncer is a net gain since it avoids the Postgres process spawning overhead.

Following recommendations in this thread, I replaced the global pgbouncer on the DB machine with one pgbouncer for each webserver machine, and that helped. I didn't rerun the synthetic ab test from my initial message on the new configuration, but for our more realistic tests page response times did shorten. The system is in production now so it's harder to run the tests again to see exactly how much it helped, but it definitely did.

So it seems we're just doing too many connections and too many queries. Each page view from a user translates to multiple requests to the application server, and each of those translates to a connection and at least a few queries (which are done in middleware and therefore happen for each and every request).
One pgbouncer can handle lots of concurrent idle connections and lots of queries/second, but our 9000 queries/second seem to push it too much. The longer term solution for us would probably be to do fewer connections (by doing fewer Django requests per page) and fewer queries; before our deadline we were just searching for a short term solution to handle an expected traffic spike.

Cheers, Catalin Iacob
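The per-webserver arrangement Catalin describes corresponds to a pgbouncer.ini roughly like the following (a sketch; the database name, host names, ports and pool size are assumptions, and all option names are standard pgbouncer settings):

```
; pgbouncer.ini on each webserver machine
[databases]
; Django connects to 127.0.0.1:6432; pgbouncer holds the long-lived
; connections to the remote Postgres (which listens on 2345 in this thread)
appdb = host=db.example.com port=2345 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling as Kevin suggests; note it only works if the app
; does not depend on session state surviving between transactions
pool_mode = transaction
default_pool_size = 8
```

This way connection setup/teardown from Django stays on the local machine, and only the pooled, long-lived connections cross the network, as Marko recommends.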
On 25.11.2012 18:30, Catalin Iacob wrote: > So it seems we're just doing too many connections and too many > queries. Each page view from a user translates to multiple requests to > the application server and each of those translates to a connection > and at least a few queries (which are done in middleware and therefore > happen for each and every query). One pgbouncer can handle lots of > concurrent idle connections and lots of queries/second but our 9000 > queries/second to seem push it too much. The longer term solution for > us would probably be to do less connections (by doing less Django > requests for a page) and less queries, before our deadline we were > just searching for a short term solution to handle an expected traffic > spike. The typical solution to that is caching, see https://docs.djangoproject.com/en/1.4/topics/cache/. - Heikki
On Mon, Nov 26, 2012 at 12:46 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> On 25.11.2012 18:30, Catalin Iacob wrote:
>> So it seems we're just doing too many connections and too many
>> queries. [...]
>
> The typical solution to that is caching, see
> https://docs.djangoproject.com/en/1.4/topics/cache/.

The first caching solution they recommend is memcached, which I too highly recommend. Put a single instance on each server in your farm, give it 1G in each place, and go to town. You can get MASSIVE performance boosts from memcache.
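The memcached setup Scott describes maps onto Django's cache settings roughly like this (a sketch for the Django 1.4-era API; the host list and key prefix are assumptions for this deployment):

```python
# settings.py -- point Django's cache framework at a local memcached instance
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # one memcached instance per server in the farm, ~1G each
        'LOCATION': ['127.0.0.1:11211'],
        'KEY_PREFIX': 'vwf',
    }
}

# Per-view caching then skips most of the per-request queries entirely:
#
#     from django.views.decorators.cache import cache_page
#
#     @cache_page(60)  # serve the rendered page from cache for 60 seconds
#     def site_home(request):
#         ...
```

Since the hot queries in this thread hit tiny, rarely-changing tables (sites and plugins), even a short cache timeout would eliminate the bulk of the 9000 queries/second.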