Discussion: Admission Control Policy
This paper has a brief but interesting discussion of admission control in section 2.4:

Joseph M. Hellerstein, Michael Stonebraker and James Hamilton, "Architecture of a Database System", Foundations and Trends in Databases 1(2).
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

They describe a two-tier approach, where the first tier is already effectively implemented in PostgreSQL with the max_connections and superuser_reserved_connections GUCs. The second tier runs after a plan is chosen, and may postpone execution of a query (or reduce the resources it is allowed) if starting it at that time might overload available resources.

I think that implementing something like this could help with several types of problems.

We often see posts from people who have more active connections than is efficient. We could, for example, have a policy which queues query requests which are *not* from a superuser and not part of a transaction which has already acquired a snapshot or any locks, whenever the number of active transactions is above a certain threshold. Proper configuration of such a policy might change the performance graph to stay relatively steady past the "knee" rather than degrading.

We occasionally see posts where people have exhausted available RAM and suffered a severe performance hit or a crash, due to an excessively high setting of work_mem or maintenance_work_mem. A good policy might warn and reduce the setting, or reschedule execution, to keep things from getting too far out of hand.

A good policy might also reduce conflicts between transactions, making stricter transaction isolation less painful. While this observation is what motivated me to think about it, it seems potentially useful on its own.

It might make sense to provide a hook to allow custom policies to supplement or override a simple default policy.

Thoughts?

-Kevin
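The queueing policy described above (hold back requests that are not from a superuser and hold no snapshot or locks once active transactions pass a threshold) can be illustrated with a toy gate. This is a rough sketch in Python, not a proposed PostgreSQL interface; every name in it is invented for illustration:

```python
import threading

class AdmissionController:
    """Toy second-tier admission control: after planning, a query must
    be admitted before it may execute.  Requests that are safe to
    postpone (not superuser, holding no snapshot or locks) wait while
    the active count is at the limit; everything else passes through,
    so nothing already holding resources is ever stalled."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = 0
        self.cond = threading.Condition()

    def admit(self, is_superuser=False, holds_snapshot_or_locks=False):
        with self.cond:
            # Only queue the request if it is safe to postpone.
            if not (is_superuser or holds_snapshot_or_locks):
                while self.active >= self.max_active:
                    self.cond.wait()
            self.active += 1

    def release(self):
        with self.cond:
            self.active -= 1
            self.cond.notify()
```

A backend would call `admit()` just before executing a planned query and `release()` when execution finishes; superuser traffic bypasses the queue, matching the first-tier behavior of superuser_reserved_connections.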
Hi,

On Dec 28, 2009, at 21:33, Kevin Grittner wrote:
> We often see posts from people who have more active connections than
> is efficient.

How would your proposal better solve the problem than using pgbouncer?

<mad proposal time>
I'd be in favor of considering how to get pgbouncer into -core, and now that we have Hot Standby, maybe implement a mode in which, as soon as a "real" XID is needed, or maybe upon receiving a start transaction read write command, the connection is handled transparently to the master.
</>

Regards,
--
dim
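For readers unfamiliar with it, a minimal pgbouncer setup of the kind suggested here might look roughly like the following; paths, names, and sizes are placeholders, not a recommendation:

```ini
[databases]
; clients connect to pgbouncer on 6432; it keeps a small pool of
; real backend connections to the server on 5432
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; "transaction" releases the server connection at commit/rollback;
; "session" keeps a 1-1 client-to-backend mapping for the whole session
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20
```

With these settings, up to 500 clients share 20 backend connections per database/user pair; clients beyond the pool size are held waiting rather than refused.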
On Monday 28 December 2009 22:39:06 Dimitri Fontaine wrote:
> Hi,
>
> On Dec 28, 2009, at 21:33, Kevin Grittner wrote:
> > We often see posts from people who have more active connections than
> > is efficient.
>
> How would your proposal better solve the problem than using pgbouncer?
>
> <mad proposal time>
> I'd be in favor of considering how to get pgbouncer into -core, and now
> that we have Hot Standby maybe implement a mode in which as soon as a
> "real" XID is needed, or maybe upon receiving a start transaction read
> write command, the connection is handled transparently to the master. </>

That's not as easy as it sounds - the master may not have all the data needed by the snapshot on the slave anymore.

Andres
On Dec 28, 2009, at 22:46, Andres Freund wrote:
>> <mad proposal time>
>> I'd be in favor of considering how to get pgbouncer into -core, and now
>> that we have Hot Standby maybe implement a mode in which as soon as a
>> "real" XID is needed, or maybe upon receiving a start transaction read
>> write command, the connection is handled transparently to the master. </>
> That's not as easy as it sounds - the master may not have all the data
> needed by the snapshot on the slave anymore.

I suppose that if it were easy, some patch would already be around for the next commit fest? :)

Seriously, your point is why I'd be tempted to only consider getting to the master at transaction start time. That is before any snapshot is taken.

Regards,
--
dim
Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> On Dec 28, 2009, at 21:33, Kevin Grittner wrote:
>> We often see posts from people who have more active connections
>> than is efficient.
>
> How would your proposal better solve the problem than using
> pgbouncer?

With my current knowledge of pgbouncer I can't answer that definitively; but *if* pgbouncer, when configured for transaction pooling, can queue new transaction requests until a connection is free, then the differences would be:

(1) According to the pgbouncer documentation, transaction pooling is "a hack as it breaks application expectations of backend connection. You can use it only when application cooperates with such usage by not using features that can break." This would not be an issue with an ACP.

(2) For the "active connection" aspect of the policy, you could let through superuser requests while other requests were queuing.

(3) With the ACP, the statements would be parsed and optimized before queuing, so they would be "ready to execute" as soon as a connection was freed.

(4) Other factors than active connection count could be applied, like expected memory consumption or more esoteric metrics.

In favor of pgbouncer (or other connection poolers): they don't require the overhead of a process and connection for each idle connection, so I would recommend a connection pooler even with an ACP. They cover overlapping ground, but I see them as more complementary than competing.

-Kevin
On Dec 28, 2009, at 22:59, Kevin Grittner wrote:
> With my current knowledge of pgbouncer I can't answer that
> definitively; but *if* pgbouncer, when configured for transaction
> pooling, can queue new transaction requests until a connection is
> free, then the differences would be:

It does that, yes. You set up a pool, which is per database/user, and when there's no more server-side connection in the pool, the clients are held in the "cl_waiting" state.

> (1) According to pgbouncer documentation, transaction pooling is "a
> hack as it breaks application expectations of backend connection.
> You can use it only when application cooperates with such usage by
> not using features that can break." This would not be an issue with
> an ACP.

That's why there's both transaction and session pooling. The benefit of session pooling is to avoid forking backends, reusing them instead, and you still get the pooling control.

> (2) For the "active connection" aspect of the policy, you could let
> through superuser requests while other requests were queuing.

superuser is another user and gets its own pool; I'm not sure if you can size it differently though (yet). It's possible to trick a little by defining another (virtual) database where you force the user in the connection string to the server, then tell your application to use this special database.

> (3) With the ACP, the statements would be parsed and optimized
> before queuing, so they would be "ready to execute" as soon as a
> connection was freed.

There's a pgfoundry project called preprepare, which can be used along with pgbouncer to get this effect. If you use 8.4, you can even get the effect without pgbouncer.

http://preprepare.projects.postgresql.org/README.html

> (4) Other factors than active connection count could be applied,
> like expected memory consumption, or more esoteric metrics.

All you can put in connection strings or per-role settings can be used to trick a virtual database and have it pre-set, but that means different pools (they accumulate, now) and different connection strings for the application. The only advantage is that it works with released and proven code! (Except for preprepare... well, I've been told it's running in production somewhere.)

> In favor of pgbouncer (or other connection poolers) they don't
> require the overhead of a process and connection for each idle
> connection, so I would recommend a connection pooler even with an
> ACP. They cover overlapping ground, but I see them as more
> complementary than competing.

Yeah, just trying to understand what you're proposing in terms of what I already know :)

--
dim
Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> That's why there's both transaction and session pooling. The
> benefit of session pooling is to avoid forking backends, reusing
> them instead, and you still get the pooling control.

So the application would need to open and close a pgbouncer connection for each database transaction in order to share the backend properly?

>> (4) Other factors than active connection count could be applied,
>> like expected memory consumption, or more esoteric metrics.
>
> All you can put in connection strings or per-role settings can be
> used to trick a virtual database and have it pre-set, but that
> means different pools (they accumulate, now) and different
> connection strings for the application.

Well, I don't know that you can very accurately predict a plan or what its memory usage would be. Trying to work out all permutations in advance and send each query to the right pool doesn't seem workable on a large scale.

If we had a pooler bundled into the backend and defaulted to a halfway reasonable configuration, it's possible that an active connection limit implemented in the second-tier ACP would cover close enough to the same ground as to be redundant. I'm not quite convinced, however, that your proposed use of pgbouncer for this, given the multiple pools which would need to be configured and the possible application awareness and cooperation with the policy, would be better than a fairly simple ACP. It seems a bit like driving nails with a wrench. I like wrenches, I use them to turn things, but I don't like using them to drive nails when I can help it. :-)

-Kevin
On Dec 28, 2009, at 23:35, Kevin Grittner wrote:
> So the application would need to open and close a pgbouncer
> connection for each database transaction in order to share the
> backend properly?

No, in session pooling you get the same backend connection for the entire pgbouncer connection; it's a 1-1 mapping.

> Well, I don't know that you can very accurately predict a plan or
> what its memory usage would be. Trying to work out all permutations
> in advance and send each query to the right pool doesn't seem
> workable on a large scale.

True. I was just trying to see what components we already have, while you're explaining what's missing: teamwork? :)

> If we had a pooler bundled into the backend and defaulted to a
> halfway reasonable configuration, it's possible that an active
> connection limit implemented in the second-tier ACP would cover
> close enough to the same ground as to be redundant. I'm not quite
> convinced, however, that your proposed use of pgbouncer for this,
> given the multiple pools which would need to be configured and the
> possible application awareness and cooperation with the policy,
> would be better than a fairly simple ACP. It seems a bit like
> driving nails with a wrench. I like wrenches, I use them to turn
> things, but I don't like using them to drive nails when I can help
> it. :-)

Hehe, pushing what we already have to its limits is often a nice way to describe what we want but still don't have... I think...

--
dim
Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> On Dec 28, 2009, at 22:59, Kevin Grittner wrote:
>> (3) With the ACP, the statements would be parsed and optimized
>> before queuing, so they would be "ready to execute" as soon as a
>> connection was freed.
>
> There's a pgfoundry project called preprepare, which can be used
> along with pgbouncer to get this effect. If you use 8.4, you can
> even get the effect without pgbouncer.
>
> http://preprepare.projects.postgresql.org/README.html

I just reviewed the documentation for preprepare -- I can see a use case for that, but I really don't think it has a huge overlap with my point. The parsing and planning mentioned in my point (3) would apply to any query -- ad hoc, generated by an ORM, etc. The preprepare project seems to be a way to create "persistent prepared statements" which are automatically materialized upon connection.

-Kevin
On Dec 28, 2009, at 23:56, Kevin Grittner wrote:
>> http://preprepare.projects.postgresql.org/README.html
>
> I just reviewed the documentation for preprepare -- I can see a use
> case for that, but I really don't think it has a huge overlap with
> my point. The parsing and planning mentioned in my point (3) would
> apply to any query -- ad hoc, generated by an ORM, etc. The
> preprepare project seems to be a way to create "persistent prepared
> statements" which are automatically materialized upon connection.

Just that, right.

--
dim
Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> No, in session pooling you get the same backend connection for the
> entire pgbouncer connection; it's a 1-1 mapping.

Right -- so it doesn't allow more logical connections than the pool size while limiting how many are active at any one time, *unless* the clients cooperate by closing their connections between transactions -- effectively requiring a client "yield" to accomplish what an ACP could do without special client cooperation.

>> Well, I don't know that you can very accurately predict a plan or
>> what its memory usage would be. Trying to work out all
>> permutations in advance and send each query to the right pool
>> doesn't seem workable on a large scale.
>
> True. I was just trying to see what components we already have,
> while you're explaining what's missing: teamwork? :)

It would take a lot more than teamwork to accurately predict those things, particularly in an environment with a large number of dynamically generated queries.

> pushing what we already have to its limits is often a nice way
> to describe what we want but still don't have...

Sure, and I'm a big fan of building things from proven smaller pieces where possible, like with Linux utilities (grep, sed, awk, find, xargs). I just think that in this case a connection pool is complementary and doesn't fit into the solution to these particular problems very well.

-Kevin
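The client "yield" described above can be made concrete with a small sketch: under session pooling, a backend is only freed for other clients when the application itself closes its pooler connection after each transaction. This is illustrative Python; `connect` stands in for whatever connect call the driver provides (an assumption, not a specific API), and rollback handling is omitted for brevity:

```python
from contextlib import contextmanager

@contextmanager
def pooled_transaction(connect):
    """Open a pooler connection, run one transaction, and close the
    connection afterwards so the underlying backend goes back to the
    pool for other clients -- the cooperative "yield" a session-pooled
    application must perform, and which an ACP would not require."""
    conn = connect()
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()
```

An application that forgets the close step simply holds its backend for the life of the session, which is exactly the failure mode being discussed.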
On Mon, Dec 28, 2009 at 3:33 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> They describe a two-tier approach, where the first tier is already
> effectively implemented in PostgreSQL with the max_connections and
> superuser_reserved_connections GUCs. The second tier is implemented
> to run after a plan is chosen, and may postpone execution of a query
> (or reduce the resources it is allowed) if starting it at that time
> might overload available resources.

It seems like it might be helpful, before tackling what you're talking about here, to have some better tools for controlling resource utilization. Right now, the tools we have are pretty crude. You can't even nice/ionice a certain backend without risking priority inversion, and there's no sensible way to limit the amount of working memory per query, only per query node.

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

...Robert
Robert Haas <robertmhaas@gmail.com> wrote:
> It seems like it might be helpful, before tackling what you're talking
> about here, to have some better tools for controlling resource
> utilization. Right now, the tools we have are pretty crude. You can't
> even nice/ionice a certain backend without risking priority inversion,
> and there's no sensible way to limit the amount of working memory
> per query, only per query node.
>
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

I will review and consider. Thanks.

-Kevin
Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> The second tier is implemented to run after a plan is chosen, and
>> may postpone execution of a query (or reduce the resources it is
>> allowed) if starting it at that time might overload available
>> resources.
>
> It seems like it might be helpful, before tackling what you're
> talking about here, to have some better tools for controlling
> resource utilization. Right now, the tools we have are pretty
> crude. You can't even nice/ionice a certain backend without
> risking priority inversion, and there's no sensible way to limit
> the amount of working memory per query, only per query node.
>
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

I see your point, but it seems largely orthogonal:

(1) These issues wouldn't preclude a very simple but still useful ACP which just limits the active connection count. This is really what I most want, and it would solve a problem frequently reported on the lists.

(2) If the ACP had a hook to allow plugging in new policies, it would support development and testing of the types of measurement and control you describe, not hinder it.

(3) You could get some useful benefit from an ACP which just postponed queries when a memory-heavy plan was ready and a lot of memory was already reserved by executing queries anticipated to be memory-heavy. That is, you wouldn't need to solve the harder problem of *limiting* memory usage to get benefit from being able to roughly *estimate* memory usage.

Frankly, solving the problems you reference might be more work than implementing true serializable transactions. (At least *I'm* clueless about how to solve the memory allocation problems, and feel relatively confident about how to deal with serializable transactions.)

I'm interested in ACPs because even the simplest implementation could reduce the number of serialization errors in some environments, improving performance in serializable isolation level. If doing that is a first step in helping to solve the problems you describe, I'll be happy to have helped. I don't think our shop can afford to tackle everything you reference there, however.

-Kevin
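Kevin's point (3) -- postponing rather than limiting, based only on a rough per-plan memory estimate -- can be sketched the same way. This is a toy illustration in Python, written under the assumption that the planner could supply some memory estimate per plan; nothing here corresponds to an actual PostgreSQL interface:

```python
import threading

class MemoryAwareGate:
    """Postpone, don't limit: a query with a large estimated memory
    footprint waits while too much estimated memory is already
    reserved by executing queries.  plan_mem_estimate is assumed to
    come from the planner; here it is just a number from the caller."""

    def __init__(self, mem_budget):
        self.mem_budget = mem_budget
        self.reserved = 0
        self.cond = threading.Condition()

    def admit(self, plan_mem_estimate):
        with self.cond:
            # Always let at least one query run, even an over-budget
            # one, so a huge plan cannot be postponed forever.
            while (self.reserved > 0 and
                   self.reserved + plan_mem_estimate > self.mem_budget):
                self.cond.wait()
            self.reserved += plan_mem_estimate

    def release(self, plan_mem_estimate):
        with self.cond:
            self.reserved -= plan_mem_estimate
            self.cond.notify_all()
```

Note that the estimates only need to be roughly right for this to help: the gate never enforces a hard cap, it merely avoids starting several anticipated memory hogs at once.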