Обсуждение: FlexLocks
I've been noodling around with various methods of reducing ProcArrayLock contention and (after many false starts) I think I've finally found one that works really well. I apologize in advance if this makes your head explode; I think that the design I have hear is solid, but it represents a significant and invasive overhaul of the LWLock system - I think for the better, but you'll have to be the judge. I'll start with the performance numbers (from that good ol' 32-core Nate Boley system), where I build from commit f1585362856d4da17113ba2e4ba46cf83cba0cf2, with and without the attached patches, and then ran pgbench on logged and unlogged tables with various numbers of clients, with shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit = off, checkpoint_segments = 300, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, wal_writer_delay = 20ms. The numbers below are (as usual) the median of three-five minute runs at scale factor 100. The lines starting with "m" and a number are that number of clients on unpatched master, and the lines starting with "f" are that number of clients with this patch set. The really big win here is unlogged tables at 32 clients, where throughput has *doubled* and now scales *better than linearly* as compared with the single-client results. == Unlogged Tables == m01 tps = 679.737639 (including connections establishing) f01 tps = 668.275270 (including connections establishing) m08 tps = 4771.757193 (including connections establishing) f08 tps = 4867.520049 (including connections establishing) m32 tps = 10736.232426 (including connections establishing) f32 tps = 21303.295441 (including connections establishing) m80 tps = 7829.989887 (including connections establishing) f80 tps = 19835.231438 (including connections establishing) == Permanent Tables == m01 tps = 634.424125 (including connections establishing) f01 tps = 633.450405 (including connections establishing) m08 tps = 4544.781551 (including connections establishing) f08 tps = 4556.298219 (including connections establishing) m32 tps = 9902.844302 (including connections establishing) f32 tps = 11028.745881 (including connections establishing) m80 tps = 7467.437442 (including connections establishing) f80 tps = 11909.738232 (including connections establishing) A couple of other interesting things buried in these numbers: 1. Permanent tables don't derive nearly as much benefit as unlogged tables. I believe that this is because, for permanent tables, the major bottleneck is WALInsertLock. Fixing ProcArrayLock squeezes out a healthy 10%, but we'll have to make significant performance on WALInsertLock to get anywhere close to linear scaling. 2. The drop-off between 32 clients and 80 clients is greatly reduced with this patch set; indeed, for permanent tables, tps increased slightly between 32 and 80 clients. I believe that small decrease for unlogged tables is likely due to the fact that by 80 tables, WALInsertLock starts to become a contention point, due to the need to insert the commit records. In terms of the actual patches, it's been bugging me for a while that the LWLock code contains a lot of infrastructure that's not easily reusable by other parts of the system. So the first of the two attached patches, flexlock-v1.patch, separates the LWLock code into an upper layer and a lower layer. The lower layer I called "FlexLocks", and it's designed to allow a variety of locking implementations to be built on top of it and reuse as much of the basic infrastructure as I could figure out how to make reusable without hurting performance too much. LWLocks become the anchor client of the FlexLock system; in essence, most of flexlock.c is code that was removed from lwlock.c. The second patch, procarraylock.c, uses that infrastructure to define a new type of FlexLock specifically for ProcArrayLock. It basically works like a regular LWLock, except that it has a special operation to optimize ProcArrayEndTransaction(). In the uncontended case, instead of acquiring and releasing the lock, it just grabs the lock, observes that there is no contention, clears the critical PGPROC fields (which isn't noticeably slower than updating the state of the lock would be) and releases the spin lock. There's then no need to reacquire the spinlock to "release" the lock; we're done. In the contended case, the backend wishing to end adds itself to a queue of ending transactions. When ProcArrayLock is released, the last person out clears the PGPROC structures for all the waiters and wakes them all up; they don't need to reacquire the lock, because the work they wished to perform while holding it is already done. Thus, in the *worst* case, ending transactions only need to acquire the spinlock protecting ProcArrayLock half as often (once instead of twice), and in the best case (where backends have to keep retrying only to repeatedly fail to get the lock) it's far better than that. Of course, there are ways that this could be implemented without the FlexLock stuff, if people don't like this solution. Myself, I find it quite elegant (though there are certainly arguable points in there where the code could probably be improved), but then again, I wrote it. For what it's worth, I believe that there are other places where the FlexLock infrastructure could be helpful. In this case, the new ProcArrayLock is very specific to what ProcArrayLock actually does, and can't be really reused for anything else. But I've had a thought that we might want to have a type of FlexLock that contains an LSN. The lock holder advances the LSN and can then release everyone who was waiting for a value <= that LSN without them needing to reacquire the lock. This could be useful for things like WALWriteLock, and sync rep. Also, I think there might be interesting applications for buffer locks, perhaps by having a lock type that manages both content locks and pins. Alternatively, if we want to support CRCs, it might be useful to have a third buffer lock mode in between shared and exclusive. SX would conflict with itself and with exclusive but not with shared, and would be required to write out the page or set hint bits but not just to examine tuples; this could be used to ensure that the page doesn't change (thus invalidating the CRC) while the write is in progress. I'm not necessarily saying that any of these particular things are what we want to do, just throwing out the idea that we may want a variety of lock types that are similar to lightweight locks but with subtly different behavior, yet with common infrastructure for error handling and wait queue management. Anyway, this is all up for discussion, argument, etc. - but here are the patches. Comments, idea, thoughts, code review, and/or testing are appreciated. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> wrote: > I'm not necessarily saying that any of these particular > things are what we want to do, just throwing out the idea that we > may want a variety of lock types that are similar to lightweight > locks but with subtly different behavior, yet with common > infrastructure for error handling and wait queue management. The locking in the clog area is pretty funky. I bet we could craft a special flavor of FlexLock to make that cleaner. And I would be surprised if some creative thinking didn't yield a far better FL scheme for SSI than we can manage with existing LW locks. Your description makes sense to me, and your numbers prove the value of the concept. Whether there's anything in the implementation I would quibble about will take some review time. -Kevin
On Tue, Nov 15, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: > It basically > works like a regular LWLock, except that it has a special operation to > optimize ProcArrayEndTransaction(). In the uncontended case, instead > of acquiring and releasing the lock, it just grabs the lock, observes > that there is no contention, clears the critical PGPROC fields (which > isn't noticeably slower than updating the state of the lock would be) > and releases the spin lock. There's then no need to reacquire the > spinlock to "release" the lock; we're done. In the contended case, > the backend wishing to end adds itself to a queue of ending > transactions. When ProcArrayLock is released, the last person out > clears the PGPROC structures for all the waiters and wakes them all > up; they don't need to reacquire the lock, because the work they > wished to perform while holding it is already done. Thus, in the > *worst* case, ending transactions only need to acquire the spinlock > protecting ProcArrayLock half as often (once instead of twice), and in > the best case (where backends have to keep retrying only to repeatedly > fail to get the lock) it's far better than that. Which is the same locking avoidance technique we already use for sync rep and for the new group commit patch. I've been saying for some time that we should use the same technique for ProcArray and clog also, so we only need to queue once rather than queue three times at end of each transaction. I'm not really enthused by the idea of completely rewriting lwlocks for this. Seems like specialised code is likely to be best, as well as having less collateral damage. With that in mind, should we try to fuse the group commit with the procarraylock approach, so we just queue once and get woken when all the activities have been handled? If the first woken proc performs the actions then wakes people further down the queue it could work quite well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 15, 2011 at 1:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Which is the same locking avoidance technique we already use for sync > rep and for the new group commit patch. Yep... > I've been saying for some time that we should use the same technique > for ProcArray and clog also, so we only need to queue once rather than > queue three times at end of each transaction. > > I'm not really enthused by the idea of completely rewriting lwlocks > for this. Seems like specialised code is likely to be best, as well as > having less collateral damage. Well, the problem that I have with that is that we're going to end up with a lot of specialized code, particularly around error recovery. This code doesn't remove the need for ProcArrayLock to be taken in exclusive mode, and I don't think there IS any easy way to remove the need for that to happen sometimes. So we have to deal with the possibility that an ERROR might occur while we hold the lock, which means we have to release the lock and clean up our state. That means every place that has a call to LWLockReleaseAll() will now also need to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we need to add some kind of specialized lock, we'll need to do the same thing again. It seems to me that that rapidly gets unmanageable, not to mention *slow*. We need some kind of common infrastructure for releasing locks, and this is an attempt to create such a thing. I'm not opposed to doing it some other way, but I think doing each one as a one-off isn't going to work out very well. Also, in this particular case, I really do want shared and exclusive locks on ProcArrayLock to stick around; I just want one additional operation as well. It's a lot less duplicated code to do that this way than it is to write something from scratch. The FlexLock patch may look big, but it's mostly renaming and rearranging; it's really not adding much code. > With that in mind, should we try to fuse the group commit with the > procarraylock approach, so we just queue once and get woken when all > the activities have been handled? If the first woken proc performs the > actions then wakes people further down the queue it could work quite > well. Well, there's too much work there to use the same approach I took here: we can't very well hold onto the LWLock spinlock while flushing WAL or waiting for synchronous replication. Fusing together some parts of the commit sequence might be the right approach (I don't know), but honestly my gut feeling is that the first thing we need to do is go in the opposite direction and break up WALInsertLock into multiple locks that allow better parallelization of WAL insertion. Of course if someone finds a way to fuse the whole commit sequence together in some way that improves performance, fantastic, but having tried a lot of things before I came up with this approach, I'm a bit reluctant to abandon it in favor of an approach that hasn't been coded or tested yet. I think we should pursue this approach for now, and we can always revise it later if someone comes up with something even better. As a practical matter, the test results show that with these patches, ProcArrayLock is NOT a bottleneck at 32 cores, which seems like enough reason to be pretty happy with it, modulo implementation details. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of mar nov 15 17:16:31 -0300 2011: > On Tue, Nov 15, 2011 at 1:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Which is the same locking avoidance technique we already use for sync > > rep and for the new group commit patch. > > Yep... > > > I've been saying for some time that we should use the same technique > > for ProcArray and clog also, so we only need to queue once rather than > > queue three times at end of each transaction. > > > > I'm not really enthused by the idea of completely rewriting lwlocks > > for this. Seems like specialised code is likely to be best, as well as > > having less collateral damage. > > Well, the problem that I have with that is that we're going to end up > with a lot of specialized code, particularly around error recovery. > This code doesn't remove the need for ProcArrayLock to be taken in > exclusive mode, and I don't think there IS any easy way to remove the > need for that to happen sometimes. So we have to deal with the > possibility that an ERROR might occur while we hold the lock, which > means we have to release the lock and clean up our state. That means > every place that has a call to LWLockReleaseAll() will now also need > to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we > need to add some kind of specialized lock, we'll need to do the same > thing again. It seems to me that that rapidly gets unmanageable, not > to mention *slow*. We need some kind of common infrastructure for > releasing locks, and this is an attempt to create such a thing. I'm > not opposed to doing it some other way, but I think doing each one as > a one-off isn't going to work out very well. I agree. In fact, I would think that we should look into rewriting the sync rep locking (and group commit) on top of flexlocks, not the other way around. As Kevin says nearby it's likely that we could find some way to rewrite the SLRU (clog and such) locking protocol using these new things too. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote: > As Kevin says nearby it's likely that we could find some way to > rewrite the SLRU (clog and such) locking protocol using these new > things too. Yeah, I really meant all SLRU, not just clog. And having seen what Robert has done here, I'm kinda glad I haven't gotten around to trying to reduce LW lock contention yet, even though we're getting dangerously far into the release cycle -- I think it can be done much better with the, er, flexibility offered by the FlexLock patch. -Kevin
On Tue, Nov 15, 2011 at 3:47 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Alvaro Herrera <alvherre@commandprompt.com> wrote: > >> As Kevin says nearby it's likely that we could find some way to >> rewrite the SLRU (clog and such) locking protocol using these new >> things too. > > Yeah, I really meant all SLRU, not just clog. And having seen what > Robert has done here, I'm kinda glad I haven't gotten around to > trying to reduce LW lock contention yet, even though we're getting > dangerously far into the release cycle -- I think it can be done > much better with the, er, flexibility offered by the FlexLock patch. I've had a thought that the SLRU machinery could benefit from having the concept of a "pin", which it currently doesn't. I'm not certain whether that thought is correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 15, 2011 at 10:55:49AM -0600, Kevin Grittner wrote: > And I would be > surprised if some creative thinking didn't yield a far better FL > scheme for SSI than we can manage with existing LW locks. One place I could see it being useful is for SerializableFinishedListLock, which protects the queue of committed sxacts that can't yet be cleaned up. When committing a transaction, it gets added to the list, and then scans the queue to find and clean up any sxacts that are no longer needed. If there's contention, we don't need multiple backends doing that scan; it's enough for one to complete it on everybody's behalf. I haven't thought it through, but it may also help with the other contention bottleneck on that lock: that every transaction needs to add itself to the cleanup list when it commits. Mostly unrelatedly, the other thing that's looking like it would be really useful would be some support for atomic integer operations. This would be useful for some SSI things like writableSxactCount, and some things elsewhere like the strong lock count in the regular lock manager. I've been toying with the idea of creating an AtomicInteger type with a few operations like increment-and-get, compare-and-set, swap, etc. This would be implemented using the appropriate hardware operations on platforms that support them (x86_64, perhaps others) and fall back on a spinlock implementation on other platforms. I'll probably give it a try and see what it looks like, but if anyone has any thoughts, let me know. Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/
On Tue, Nov 15, 2011 at 8:33 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: >> > I'm not really enthused by the idea of completely rewriting lwlocks >> > for this. Seems like specialised code is likely to be best, as well as >> > having less collateral damage. >> >> Well, the problem that I have with that is that we're going to end up >> with a lot of specialized code, particularly around error recovery. >> This code doesn't remove the need for ProcArrayLock to be taken in >> exclusive mode, and I don't think there IS any easy way to remove the >> need for that to happen sometimes. So we have to deal with the >> possibility that an ERROR might occur while we hold the lock, which >> means we have to release the lock and clean up our state. That means >> every place that has a call to LWLockReleaseAll() will now also need >> to cleanup ProperlyCleanUpProcArrayLockStuff(). And the next time we >> need to add some kind of specialized lock, we'll need to do the same >> thing again. It seems to me that that rapidly gets unmanageable, not >> to mention *slow*. We need some kind of common infrastructure for >> releasing locks, and this is an attempt to create such a thing. I'm >> not opposed to doing it some other way, but I think doing each one as >> a one-off isn't going to work out very well. > > I agree. In fact, I would think that we should look into rewriting the > sync rep locking (and group commit) on top of flexlocks, not the other > way around. As Kevin says nearby it's likely that we could find some > way to rewrite the SLRU (clog and such) locking protocol using these new > things too. I see the commonality between ProcArray locking and Sync Rep/ Group Commit locking. It's basically the same design, so although it wasn't my first thought, I agree. I did originally write that using spinlocks, but that was objected to. Presumably the same objection would hold here also, but if it doesn't that's good. Mixing the above 3 things together is enough for me; I just don't see the reason to do a global search and replace on the lwlock name in order to do that. This is 2 patches at same time, 1 we clearly need, 1 I'm not sure about. Perhaps some more explanation about the flexlocks structure and design will smooth that unease. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> wrote: > I just don't see the reason to do a global search and replace on > the lwlock name I was going to review further before commenting on that, but since it has now come up -- it seems odd that a source file which uses only LW locks needs to change so much for the FlexLock implementation. I'm not sure source code which uses the next layer up from FlexLock should need to be aware of it quite so much. I assume that this was done to avoid adding a layer to some code where it could cause an unnecessary performance hit, but maybe it would be worth just wrapping with a macro to isolate the levels where that's all it takes. For example, if these two macros were defined, predicate.c wouldn't have needed any modifications, and I suspect that is true of many other files (although possibly needing a few other macros): #define LWLockId FlexLockId #define LWLockHeldByMe(lock) FlexLockHeldByMe(lock) Particularly with the function call it seems like it's a mistake to assume that test will always be the same between LW locks and flex locks. There may be a better way to do it than the above, but I think a worthy goal would be to impose zero source code changes on code which continues to use "traditional" lightweight locks. -Kevin
On Wed, Nov 16, 2011 at 10:26 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > For example, if these two macros were defined, predicate.c wouldn't > have needed any modifications, and I suspect that is true of many > other files (although possibly needing a few other macros): > > #define LWLockId FlexLockId > #define LWLockHeldByMe(lock) FlexLockHeldByMe(lock) > > Particularly with the function call it seems like it's a mistake to > assume that test will always be the same between LW locks and flex > locks. There may be a better way to do it than the above, but I > think a worthy goal would be to impose zero source code changes on > code which continues to use "traditional" lightweight locks. Well, it would certainly be easy enough to add those macros, and I'm not necessarily opposed to it, but I fear it could end up being a bit confusing in the long run. If we adopt this infrastructure, then I expect knowledge of different types of FlexLocks to gradually propagate through the system. Now, you're always going to use LWLockAcquire() and LWLockRelease() to acquire and release an LWLock, but a FlexLockId isn't guaranteed to be an LWLockId - any given value might also refer to a FlexLock of some other type. If we let everyone continue to refer to those things as LWLockIds, then it seems like only a matter of time before someone has a variable that's declared as LWLockId but actually doesn't refer to an LWLock at all. I think it's better to bite the bullet and do the renaming up front, rather than having to think about it every time you modify some code that uses LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is this really guaranteed to be an LWLock?" For LWLockHeldByMe, a sensible compromise might be to add a function that asserts that the FlexLockId passed as an argument is in fact pointing to an LWLock, and then calls FlexLockHeldByMe() and returns the result. That way you'd presumably noticed if you used the more specific function when you needed the more general one (because, hopefully, the regression tests would fail). But I'm not seeing any obvious way of providing a similar degree of insulation against abusing LWLockId. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Simon Riggs <simon@2ndQuadrant.com> wrote: >> I just don't see the reason to do a global search and replace on >> the lwlock name > I was going to review further before commenting on that, but since > it has now come up -- it seems odd that a source file which uses > only LW locks needs to change so much for the FlexLock > implementation. Yeah, -1 on wideranging source changes for me too. There is no reason that the current LWLock API need change. (I'm not saying that it has to be same ABI though --- macro wrappers would be fine.) regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > Well, it would certainly be easy enough to add those macros, and I'm > not necessarily opposed to it, but I fear it could end up being a bit > confusing in the long run. If we adopt this infrastructure, then I > expect knowledge of different types of FlexLocks to gradually > propagate through the system. Now, you're always going to use > LWLockAcquire() and LWLockRelease() to acquire and release an LWLock, > but a FlexLockId isn't guaranteed to be an LWLockId - any given value > might also refer to a FlexLock of some other type. If we let everyone > continue to refer to those things as LWLockIds, then it seems like > only a matter of time before someone has a variable that's declared as > LWLockId but actually doesn't refer to an LWLock at all. I think it's > better to bite the bullet and do the renaming up front, rather than > having to think about it every time you modify some code that uses > LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is > this really guaranteed to be an LWLock?" In that case, I think you've chosen an unfortunate naming convention and should rethink it. There is not any benefit to be gained from a global search and replace here, and as somebody who spends quite enough time dealing with cross-branch coding differences already, I'm going to put my foot down about introducing a useless one. Perhaps it would be better to think of this as "they're all lightweight locks, but some have different locking policies". Or "we're taking a different type of lock on this particular lock" --- that would match up rather better with the way we think about heavyweight locks. regards, tom lane
On Wed, Nov 16, 2011 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Well, it would certainly be easy enough to add those macros, and I'm >> not necessarily opposed to it, but I fear it could end up being a bit >> confusing in the long run. If we adopt this infrastructure, then I >> expect knowledge of different types of FlexLocks to gradually >> propagate through the system. Now, you're always going to use >> LWLockAcquire() and LWLockRelease() to acquire and release an LWLock, >> but a FlexLockId isn't guaranteed to be an LWLockId - any given value >> might also refer to a FlexLock of some other type. If we let everyone >> continue to refer to those things as LWLockIds, then it seems like >> only a matter of time before someone has a variable that's declared as >> LWLockId but actually doesn't refer to an LWLock at all. I think it's >> better to bite the bullet and do the renaming up front, rather than >> having to think about it every time you modify some code that uses >> LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is >> this really guaranteed to be an LWLock?" > > In that case, I think you've chosen an unfortunate naming convention > and should rethink it. There is not any benefit to be gained from a > global search and replace here, and as somebody who spends quite enough > time dealing with cross-branch coding differences already, I'm going to > put my foot down about introducing a useless one. > > Perhaps it would be better to think of this as "they're all lightweight > locks, but some have different locking policies". Or "we're taking a > different type of lock on this particular lock" --- that would match up > rather better with the way we think about heavyweight locks. I struggled a lot with how to name this, and I'm not going to pretend that what I came up with is necessarily ideal. But the basic idea here is that all FlexLocks share the following properties in common: - they are identified by a FlexLockId - they are released by FlexLockReleaseAll - they make use of the lwlock-related fields (renamed in the patch) in PGPROC for sleep and wakeup handling - they have a type indicator, a mutex, a retry flag, and a wait queue But the following things are different per-type: - acquire, conditional acquire (if any), and release functions - available lock modes - additional data fields that are part of the lock Now, it seemed to me that if I was going to split the LWLock facililty into two layers, either the upper layer could be LWLocks, or the lower layer could be LWLocks, but they couldn't both be LWLocks. Since we use LWLockAcquire() and LWLockRelease() all over the place but only make reference to LWLockId in comparatively few places, it seemed to me to be by far the less invasive renaming to make the upper layer be LWLocks and the lower layer be something else. Now maybe there is some better way to do this, but at the moment, I'm not seeing it. If we call them all LWLocks, but only some of them support LWLockAcquire(), then that's going to be pretty weird. The situation is not really analagous to heavy-weight locks, where every lock supports every lock mode, but in practice only table locks make use of them all. In this particular case, we do not want to clutter up the vanilla LWLock implementation with a series of special cases that are only useful for a minority of locks in the system. That will cause them to stop being lightweight, which misses the point; and it will be ugly as hell, because the exact frammishes needed will doubtless vary from one lock to another, and having just one lock type that supports every single one of those frammishes is certainly a bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 16, 2011 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Now, you're always going to use > LWLockAcquire() and LWLockRelease() to acquire and release an LWLock, > but a FlexLockId isn't guaranteed to be an LWLockId - any given value > might also refer to a FlexLock of some other type. If we let everyone > continue to refer to those things as LWLockIds, then it seems like > only a matter of time before someone has a variable that's declared as > LWLockId but actually doesn't refer to an LWLock at all. I think it's > better to bite the bullet and do the renaming up front, rather than > having to think about it every time you modify some code that uses > LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is > this really guaranteed to be an LWLock?" But that's an advantage to having a distinct API layer for LWLock instead of having callers directly call FlexLock methods. The LWLock macros can AssertMacro that the lockid they were passed are actually LWLocks and not some other type of lock. That would require assigning FlexLockIds that are recognizably LWLocks but that's not implausible is it? -- greg
Robert Haas <robertmhaas@gmail.com> wrote: > Now maybe there is some better way to do this, but at the moment, > I'm not seeing it. If we call them all LWLocks, but only some of > them support LWLockAcquire(), then that's going to be pretty > weird. Is there any way to typedef our way out of it, such that a LWLock *is a* FlexLock, but a FlexLock isn't a LWLock? If we could do that, you couldn't use just a plain old FlexLock in LWLockAcquire(), but you could do the cleanups, etc., that you want. -Kevin
On Wed, Nov 16, 2011 at 11:14 AM, Greg Stark <stark@mit.edu> wrote: > On Wed, Nov 16, 2011 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Now, you're always going to use >> LWLockAcquire() and LWLockRelease() to acquire and release an LWLock, >> but a FlexLockId isn't guaranteed to be an LWLockId - any given value >> might also refer to a FlexLock of some other type. If we let everyone >> continue to refer to those things as LWLockIds, then it seems like >> only a matter of time before someone has a variable that's declared as >> LWLockId but actually doesn't refer to an LWLock at all. I think it's >> better to bite the bullet and do the renaming up front, rather than >> having to think about it every time you modify some code that uses >> LWLockId or LWLockHeldByMe and say to yourself, "oh, wait a minute, is >> this really guaranteed to be an LWLock?" > > But that's an advantage to having a distinct API layer for LWLock > instead of having callers directly call FlexLock methods. The LWLock > macros can AssertMacro that the lockid they were passed are actually > LWLocks and not some other type of lock. That would require assigning > FlexLockIds that are recognizably LWLocks but that's not implausible > is it? Well, that works for the most part. You still need a few generic functions, like FlexLockReleaseAll(), which releases all FlexLocks of all types, not just those of some particular type. And it doesn't solve the problem with FlexLockId, which can potentially refer to a FlexLock of any type, not just a LWLock. I think we might be getting slightly more excited about this problem than it actually deserves. Excluding lwlock.{c,h}, the new files added by this patch, and the documentation changes, this patch adds 103 lines and removes 101. We can uncontroversially reduce each numbers by 14 by adding a separate LWLockHeldByMe() function that does the same thing as FlexLockHeldByMe() but also asserts the lock type. That would leave us adding 89 lines of code and removing 87. If we (against my better judgement) take the position that we must continue to use LWLockId rather than FlexLockId as the type name in any place that only treats with LWLocks we could reduce each of those numbers by an additional 34, giving new totals of 55 and 53 lines of changes outside the lwlock/flexlock code itself rather than 89 and 87.I humbly submit that this is not really enough to getexcited about. We've make far more sweeping notational changes than that more than once - even, I think, with some regularity. This may seem invasive because it's touching LWLocks, and we use those everywhere, but in practice the code footprint is very small because typical usage is just LWLockAcquire(BlahLock) and then LWLockRelease(BlahLock). And I'm not proposing to change that usage in any way; avoiding any change in that area was, in fact, one of my main design goals. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 16, 2011 at 11:17 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> Now maybe there is some better way to do this, but at the moment, >> I'm not seeing it. If we call them all LWLocks, but only some of >> them support LWLockAcquire(), then that's going to be pretty >> weird. > > Is there any way to typedef our way out of it, such that a LWLock > *is a* FlexLock, but a FlexLock isn't a LWLock? If we could do > that, you couldn't use just a plain old FlexLock in LWLockAcquire(), > but you could do the cleanups, etc., that you want. Well, if we just say: typedef FlexLockId LWLockId; ...that's about equivalent to the #define from the compiler's point of view. We could alternatively change one or the other of them to be a struct with one member, but I think the cure might be worse than the disease. By my count, we are talking about saving perhaps as many as 34 lines of code changes here, and that's only if complicating the type handling doesn't require any changes to places that are untouched at present, which I suspect it would. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Is there any way to typedef our way out of it [?] > Well, if we just say: > > typedef FlexLockId LWLockId; > > ...that's about equivalent to the #define from the compiler's > point of view. Bummer -- I was hoping there was some equivalent to "subclassing" that I just didn't know about. :-( > We could alternatively change one or the other of them to be a > struct with one member, but I think the cure might be worse than > the disease. By my count, we are talking about saving perhaps as > many as 34 lines of code changes here, and that's only if > complicating the type handling doesn't require any changes to > places that are untouched at present, which I suspect it would. So I stepped through all the changes of this type, and I notice that most of them are in areas where we've talked about likely benefits of creating new FlexLock variants instead of staying with LWLocks; if any of that is done (as seems likely), it further reduces the impact from 34 lines. If we take care of LWLockHeldByMe() as you describe, I'll concede the FlexLockId changes. -Kevin
On Wed, Nov 16, 2011 at 12:25 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> We could alternatively change one or the other of them to be a >> struct with one member, but I think the cure might be worse than >> the disease. By my count, we are talking about saving perhaps as >> many as 34 lines of code changes here, and that's only if >> complicating the type handling doesn't require any changes to >> places that are untouched at present, which I suspect it would. > > So I stepped through all the changes of this type, and I notice that > most of them are in areas where we've talked about likely benefits > of creating new FlexLock variants instead of staying with LWLocks; > if any of that is done (as seems likely), it further reduces the > impact from 34 lines. If we take care of LWLockHeldByMe() as you > describe, I'll concede the FlexLockId changes. Updated patches attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
On Tue, Nov 15, 2011 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The lower layer I called "FlexLocks", > and it's designed to allow a variety of locking implementations to be > built on top of it and reuse as much of the basic infrastructure as I > could figure out how to make reusable without hurting performance too > much. LWLocks become the anchor client of the FlexLock system; in > essence, most of flexlock.c is code that was removed from lwlock.c. > The second patch, procarraylock.c, uses that infrastructure to define > a new type of FlexLock specifically for ProcArrayLock. It basically > works like a regular LWLock, except that it has a special operation to > optimize ProcArrayEndTransaction(). In the uncontended case, instead > of acquiring and releasing the lock, it just grabs the lock, observes > that there is no contention, clears the critical PGPROC fields (which > isn't noticeably slower than updating the state of the lock would be) > and releases the spin lock. (Robert, we already discussed this a bit privately, so apologies for duplicating this here) Another idea is to have some sort of shared work queue mechanism which might turn out to be more manageable and extendable. What I am thinking about is having a {Request, Response} kind of structure per backend in shared memory. An obvious place to hold them is in PGPROC for every backend. We the have a new API like LWLockExecute(lock, mode, ReqRes). The caller first initializes the ReqRes structure with the work it needs get done and then calls LWLockExecute with that. IOW, the code flow would look like this: <Initialize the Req/Res structure with request type and input data> LWLockExecute(lock, mode, ReqRes) <Consume Response and proceed further> If the lock is available in the desired mode, LWLockExecute() will internally finish the work and return immediately. If the lock is contended, the process would sleep. When current holder of the lock finishes its work and calls LWLockRelease() to release the lock, it would not only find the processes to wake up, but would also go through their pending work items and complete them before waking them up. The Response area will be populated with the result. I think this general mechanism will be useful for many users of LWLock, especially those who do very trivial updates/reads from the shared area, but still need synchronization. One example that Robert has already found helping a lot if ProcArrayEndTransaction. Also, even though both shared and exclusive waiters can use this mechanism, it may make more sense to the exclusive waiters because of the exclusivity. For sake of simplicity, we can choose to force a semantics that when LWLockExecute returns, the work is guaranteed to be done, either by self or some other backend. That will keep the code simpler for users of this new API. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
On Wed, Nov 16, 2011 at 11:16 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > On Tue, Nov 15, 2011 at 7:20 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> The lower layer I called "FlexLocks", >> and it's designed to allow a variety of locking implementations to be >> built on top of it and reuse as much of the basic infrastructure as I >> could figure out how to make reusable without hurting performance too >> much. LWLocks become the anchor client of the FlexLock system; in >> essence, most of flexlock.c is code that was removed from lwlock.c. >> The second patch, procarraylock.c, uses that infrastructure to define >> a new type of FlexLock specifically for ProcArrayLock. It basically >> works like a regular LWLock, except that it has a special operation to >> optimize ProcArrayEndTransaction(). In the uncontended case, instead >> of acquiring and releasing the lock, it just grabs the lock, observes >> that there is no contention, clears the critical PGPROC fields (which >> isn't noticeably slower than updating the state of the lock would be) >> and releases the spin lock. > > (Robert, we already discussed this a bit privately, so apologies for > duplicating this here) > > Another idea is to have some sort of shared work queue mechanism which > might turn out to be more manageable and extendable. What I am > thinking about is having a {Request, Response} kind of structure per > backend in shared memory. An obvious place to hold them is in PGPROC > for every backend. We the have a new API like LWLockExecute(lock, > mode, ReqRes). The caller first initializes the ReqRes structure with > the work it needs get done and then calls LWLockExecute with that. > IOW, the code flow would look like this: > > <Initialize the Req/Res structure with request type and input data> > LWLockExecute(lock, mode, ReqRes) > <Consume Response and proceed further> > > If the lock is available in the desired mode, LWLockExecute() will > internally finish the work and return immediately. If the lock is > contended, the process would sleep. When current holder of the lock > finishes its work and calls LWLockRelease() to release the lock, it > would not only find the processes to wake up, but would also go > through their pending work items and complete them before waking them > up. The Response area will be populated with the result. > > I think this general mechanism will be useful for many users of > LWLock, especially those who do very trivial updates/reads from the > shared area, but still need synchronization. One example that Robert > has already found helping a lot if ProcArrayEndTransaction. Also, even > though both shared and exclusive waiters can use this mechanism, it > may make more sense to the exclusive waiters because of the > exclusivity. For sake of simplicity, we can choose to force a > semantics that when LWLockExecute returns, the work is guaranteed to > be done, either by self or some other backend. That will keep the code > simpler for users of this new API. I am not convinced that that's a better API. I mean, consider something like this: /* * OK, let's do it. First let other backends know I'm in ANALYZE. */ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); MyProc->vacuumFlags |= PROC_IN_ANALYZE; LWLockRelease(ProcArrayLock); I'm not sure exactly how you'd proposed to rewrite that, but I think it's almost guaranteed to be more than three lines of code. Also, you can't assume that the "work" can be done equally well by any backend. In this case it could, because the PGPROC structures are all in shared memory, but that won't work for something like GetSnapshotData(), which needs to copy a nontrivial amount of data into backend-local memory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I am not convinced that that's a better API. I mean, consider > something like this: > > /* > * OK, let's do it. First let other backends know I'm in ANALYZE. > */ > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > MyProc->vacuumFlags |= PROC_IN_ANALYZE; > LWLockRelease(ProcArrayLock); > I'm not sure exactly how you'd proposed to rewrite that, but I think > it's almost guaranteed to be more than three lines of code. I would guess the ReqRes will look something like this where ReqResRequest/Response would probably be union of all various requests and responses, one for each type of request: struct ReqRes { ReqResRequestType reqtype; ReqResRequest req; ReqResResponse res; } The code above can be rewritten as: reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS; reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE; LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes); I mean, I agree it doesn't look exactly elegant and the number of requests types and their handling may go up a lot, but we need to do this only for those heavily contended locks. Other callers can continue with the current code style. But with this general infrastructure, there will be still be a way to do this. > Also, you > can't assume that the "work" can be done equally well by any backend. > In this case it could, because the PGPROC structures are all in shared > memory, but that won't work for something like GetSnapshotData(), > which needs to copy a nontrivial amount of data into backend-local > memory. Yeah, I am not suggesting we should do (even though I think it should be possible with appropriate input/output data) this everywhere. But places where this can done, like end-transaction stuff, the infrastructure might be quite useful. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
On Thu, Nov 17, 2011 at 10:19 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> >> I am not convinced that that's a better API. I mean, consider >> something like this: >> >> /* >> * OK, let's do it. First let other backends know I'm in ANALYZE. >> */ >> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); >> MyProc->vacuumFlags |= PROC_IN_ANALYZE; >> LWLockRelease(ProcArrayLock); > >> I'm not sure exactly how you'd proposed to rewrite that, but I think >> it's almost guaranteed to be more than three lines of code. > > I would guess the ReqRes will look something like this where > ReqResRequest/Response would probably be union of all various requests > and responses, one for each type of request: > > struct ReqRes { > ReqResRequestType reqtype; > ReqResRequest req; > ReqResResponse res; > } > > The code above can be rewritten as: > > reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS; > reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE; > LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes); > My apologies for hijacking the thread, but the work seems quite related, so I thought I should post here instead of starting a new thread. Here is a WIP patch based on the idea of having a shared Q. A process trying to access the shared memory protected by a LWLock, sets up the task in its PGPROC and calls a new API LWLockExecute(). If the LWLock is available, the task is performed immediately and the function returns. Otherwise, the process queues up itself on the lock. When the last shared lock holder or the exclusive lock holder call LWLockRelease(), it scans through such pending tasks, executes them via a callback mechanism and wakes all those processes along with any other normal waiter(s) waiting on LWLockAcquire(). I have only coded for ProcArrayEndTransaction, but it should fairly easy to extend the usage at some more places, especially those which does some simple modifications to the protected area. I don't propose to use the technique for every user of LWLock, but there can be some obvious candidates, including this one that Robert found out. I see 35-40% improvement for 32-80 clients on a 5 minutes pgbench -N run with scale factor of 100 and permanent tables. This is on a 32-core HP IA box. There are few things that need some deliberations. The pending tasks are right now executed while holding the mutex (spinlock). This is good and bad for obvious reasons. We can possibly change that so that the work is done without holding the spinlock or leave to the caller to choose the behavior. Doing it without holding the spinlock will make the technique interesting for many more callers. We can also rework the task execution so that pending similar requests from multiple callers can be combined and executed with a single callback, if the caller knows its safe to do so. I haven't thought through the API/callback changes to support that, but its definitely possible and could be quite useful in many cases. For example, status of many transactions can be checked with a single lookup of the ProcArray. Or WAL inserts from multiple processes can be combined and written at once. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Вложения
On Fri, Nov 18, 2011 at 6:26 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > My apologies for hijacking the thread, but the work seems quite > related, so I thought I should post here instead of starting a new > thread. > > Here is a WIP patch based on the idea of having a shared Q. A process > trying to access the shared memory protected by a LWLock, sets up the > task in its PGPROC and calls a new API LWLockExecute(). If the LWLock > is available, the task is performed immediately and the function > returns. Otherwise, the process queues up itself on the lock. When the > last shared lock holder or the exclusive lock holder call > LWLockRelease(), it scans through such pending tasks, executes them > via a callback mechanism and wakes all those processes along with any > other normal waiter(s) waiting on LWLockAcquire(). > > I have only coded for ProcArrayEndTransaction, but it should fairly > easy to extend the usage at some more places, especially those which > does some simple modifications to the protected area. I don't propose > to use the technique for every user of LWLock, but there can be some > obvious candidates, including this one that Robert found out. > > I see 35-40% improvement for 32-80 clients on a 5 minutes pgbench -N > run with scale factor of 100 and permanent tables. This is on a > 32-core HP IA box. > > There are few things that need some deliberations. The pending tasks > are right now executed while holding the mutex (spinlock). This is > good and bad for obvious reasons. We can possibly change that so that > the work is done without holding the spinlock or leave to the caller > to choose the behavior. Doing it without holding the spinlock will > make the technique interesting for many more callers. We can also > rework the task execution so that pending similar requests from > multiple callers can be combined and executed with a single callback, > if the caller knows its safe to do so. I haven't thought through the > API/callback changes to support that, but its definitely possible and > could be quite useful in many cases. For example, status of many > transactions can be checked with a single lookup of the ProcArray. Or > WAL inserts from multiple processes can be combined and written at > once. So the upside and downside of this approach is that it modifies the existing LWLock implementation rather than allowing multiple lock implementations to exist side-by-side. That means every LWLock in the system has access to this functionality, which might be convenient if there turn out to be many uses for this technique. The bad news is that everyone pays the cost of checking the work queue in LWLockRelease(). It also means that you can't, for example, create a custom lock with different lock modes (e.g. S, SX, X, as I proposed upthread). I am pretty dubious that there are going to be very many cases where we can get away with holding the spinlock while doing the work. For example, WAL flush is a clear example of where we can optimize away spinlock acquisitions - if we communicate to people we wake up that their LSN is already flushed, they needn't reacquire the lock to check. But we certainly can't hold a spinlock across a WAL flush. The nice thing about the FlexLock approach is that it permits fine-grained control over these types of policies: one lock type can switch to exclusive mode, do the work, and then reacquire the spinlock to hand off the baton; another can do the work while holding the spinlock; and still a third can forget about work queues altogether but introduce additional lock modes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Nov 18, 2011 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > So the upside and downside of this approach is that it modifies the > existing LWLock implementation rather than allowing multiple lock > implementations to exist side-by-side. That means every LWLock in the > system has access to this functionality, which might be convenient if > there turn out to be many uses for this technique. Right. > The bad news is > that everyone pays the cost of checking the work queue in > LWLockRelease(). I hope that would be minimal (may be just one instruction) for those who don't want to use the facility. > It also means that you can't, for example, create a > custom lock with different lock modes (e.g. S, SX, X, as I proposed > upthread). > Thats a valid point. > I am pretty dubious that there are going to be very many cases where > we can get away with holding the spinlock while doing the work. For > example, WAL flush is a clear example of where we can optimize away > spinlock acquisitions - if we communicate to people we wake up that > their LSN is already flushed, they needn't reacquire the lock to > check. But we certainly can't hold a spinlock across a WAL flush. > I think thats mostly solvable as said upthread. We can and should improve this mechanism so that the work is carried out with the necessary LWLock instead of the spinlock. That would let other processes to queue up for the lock while the tasks are being executed. Or if the tasks only need shared lock, then other normal shared requesters can go ahead and acquire the lock. When I get some time, I would like to see if this can be extended to have shared snapshots so that multiple callers of GetSnapshotData() get the same snapshot, computed only once by scanning the proc array, instead of having each process compute its own snapshot which remains the same unless some transaction ends in between. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Robert Haas wrote: > On Wed, Nov 16, 2011 at 12:25 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > >> We could alternatively change one or the other of them to be a > >> struct with one member, but I think the cure might be worse than > >> the disease. ?By my count, we are talking about saving perhaps as > >> many as 34 lines of code changes here, and that's only if > >> complicating the type handling doesn't require any changes to > >> places that are untouched at present, which I suspect it would. > > > > So I stepped through all the changes of this type, and I notice that > > most of them are in areas where we've talked about likely benefits > > of creating new FlexLock variants instead of staying with LWLocks; > > if any of that is done (as seems likely), it further reduces the > > impact from 34 lines. ?If we take care of LWLockHeldByMe() as you > > describe, I'll concede the FlexLockId changes. > > Updated patches attached. It would be helpful if the patch included some text about how flexilocks are different from ordinary lwlocks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas <robertmhaas@gmail.com> wrote: > Updated patches attached. I've gotten through several days of performance tests for this pair of related patches, with results posted on a separate thread. I'll link those in to the CF application shortly. To summarize the other (fairly long) thread on benchmarks, it seemed like there might be a very slight slowdown at low concurrency, but it could be the random alignment of code with and without the patch; it was a small enough fraction of a percent to be negligible, in my opinion. At higher concurrency levels the patch showed significant performance improvements. Besides the overall improvement in the median tps numbers of several percent, there was significant mitigation of the "performance collapse" phenomenon, where some runs were much slower than others. It seems a clear win in terms of performance. I've gotten through code review of the flexlock-v2.patch, and have decided to post on that before I go through the procarraylock-v1.patch code. Not surprisingly, this patch was in good form and applied cleanly. There were doc changes, and I can't see where any changes to the tests are required. I liked the structure, and only found a few nit-picky things to point out: I didn't see why num_held_flexlocks and held_flexlocks had the static keyword removed from their declarations. FlexLockRemember() seems to have a pasto for a comment. Maybe change to something like: "Add lock to list of locks held by this backend." In procarraylock.c there is this: /* If there are no lockers, clar the critical PGPROC fields. */ s/clar/clear/ I have to admit I don't have my head around the extraWaits issue, so I can't personally vouch for that code, although I have no reason to doubt it, either. Everything else was something that I at least *think* I understand, and it looked good to me. I'm not changing the status until I get through the other patch file, which should be tomorrow. -Kevin
"Kevin Grittner" wrote: > Robert Haas wrote: >> Updated patches attached. > > I have to admit I don't have my head around the extraWaits issue, > so I can't personally vouch for that code, although I have no > reason to doubt it, either. Everything else was something that I at > least *think* I understand, and it looked good to me. > > I'm not changing the status until I get through the other patch > file, which should be tomorrow. Most of the procarraylock-v1.patch file was pretty straightforward, but I have a few concerns. Why is it OK to drop these lines from the else condition in ProcArrayEndTransaction()?: /* must be cleared with xid/xmin: */ proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK; The extraWaits code still looks like black magic to me, so unless someone can point me in the right direction to really understand that, I can't address whether it's OK. The need to modify flexlock_internals.h and flexlock.c seems to me to indicate a lack of desirable modularity here. The lower level object type shouldn't need to know about each and every implementation of a higher level type which uses it, particularly not compiled in like that. It would be really nice if each of the higher level types "registered" with flexlock at runtime, so that the areas modified at the flexlock level in this patch file could be generic. Among other things, this could allow extensions to use specialized types, which seems possibly useful. Does that (or some other technique to address the concern) seem feasible? -Kevin
On Wed, Nov 23, 2011 at 7:18 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Why is it OK to drop these lines from the else condition in > ProcArrayEndTransaction()?: > > /* must be cleared with xid/xmin: */ > proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK; It's probably not. Oops. I believe the attached patch versions address your comments regarding the flexlock patch as well; it is also rebased over the PGXACT patch, which has since been committed. > The extraWaits code still looks like black magic to me, so unless > someone can point me in the right direction to really understand > that, I can't address whether it's OK. I don't think I've changed the behavior, so it should be fine. The idea is that something like this can happen: 1. Backend #1 does some action which will eventually cause some other process to send it a wakeup (like adding itself to the wait-queue for a heavyweight lock). 2. Before actually going to sleep, backend #1 tries to acquire an LWLock. The LWLock is not immediately available, so it sleeps on its process semaphore. 3. Backend #2 sees the shared memory state created in step one and decides sends a wakeup to backend #1 (for example, because the lock is now available). 4. Backend #1 receives the wakeup. It duly reacquires the spinlock protecting the LWLock, sees that the LWLock is not available, releases the spinlock, and goes back to sleep. 5. Backend #3 now releases the LWLock that backend #1 is trying to acquire and, as backend #1 is first in line, it sends backend #1 a wakeup. 6. Backend #1 now wakes up again, reacquires the spinlock, gets the lwlock, releases the spinlock, does some stuff, and releases the lwlock. 7. Backend #1, having now finished what it needed to do while holding the lwlock, is ready to go to sleep and wait for the event that it queued up for back in step #1. However, the wakeup for that event *has already arrived* and was consumed by the LWLock machinery. So when backend #1 goes to sleep, it's waiting for a wakeup that will never arrive, because it already did arrive, and got eaten. The solution is the "extraWaits" thing; in step #6, we remember that we received an extra, useless wakeup in step #4 that we threw away. To make up for having thrown away a wakeup someone else sent us in step #3, we send ourselves a wakeup in step #6. That way, when we go to sleep in step #7, we wake up immediately, just as we should. > The need to modify flexlock_internals.h and flexlock.c seems to me to > indicate a lack of desirable modularity here. The lower level object > type shouldn't need to know about each and every implementation of a > higher level type which uses it, particularly not compiled in like > that. It would be really nice if each of the higher level types > "registered" with flexlock at runtime, so that the areas modified at > the flexlock level in this patch file could be generic. Among other > things, this could allow extensions to use specialized types, which > seems possibly useful. Does that (or some other technique to address > the concern) seem feasible? Possibly; let me think about that. I haven't addressed that in this version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Why is it OK to drop these lines from the else condition in >> ProcArrayEndTransaction()?: >> >> /* must be cleared with xid/xmin: */ >> proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK; > > It's probably not. Oops. OK. I see that's back now. > I believe the attached patch versions address your comments > regarding the flexlock patch as well; it is also rebased over the > PGXACT patch, which has since been committed. Applies cleanly again. >> The extraWaits code still looks like black magic to me, so unless >> someone can point me in the right direction to really understand >> that, I can't address whether it's OK. > > I don't think I've changed the behavior, so it should be fine. > The idea is that something like this can happen: > > [explanation of the extraWaits behavior] Thanks. I'll spend some time reviewing this part. There is some rearrangement of related code, and this should arm me with enough of a grasp to review that. >> [gripes about modularity compromise and lack of pluggability] > let me think about that. I haven't addressed that in this > version. OK. There are a few things I found in this pass which missed in the last. One contrib module was missed, I found another typo in a comment, and I think we can reduce the include files a bit. Rather than describe it, I'm attaching a patch file over the top of your patches with what I think might be a good idea. I don't think there's anything here to merit a new round of benchmarking. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > OK. There are a few things I found in this pass which missed in the > last. One contrib module was missed, I found another typo in a > comment, and I think we can reduce the include files a bit. Rather > than describe it, I'm attaching a patch file over the top of your > patches with what I think might be a good idea. This time with it really attached. -Kevin
Вложения
On Wed, Nov 30, 2011 at 7:01 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > >> OK. There are a few things I found in this pass which missed in the >> last. One contrib module was missed, I found another typo in a >> comment, and I think we can reduce the include files a bit. Rather >> than describe it, I'm attaching a patch file over the top of your >> patches with what I think might be a good idea. > > This time with it really attached. Thanks, I've merged those in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>> The extraWaits code still looks like black magic to me >> [explanation of the extraWaits behavior] > > Thanks. I'll spend some time reviewing this part. There is some > rearrangement of related code, and this should arm me with enough > of a grasp to review that. I got through that without spotting any significant problems, although I offer the attached micro-optimizations for your consideration. (Applies over the top of your patches.) As far as I'm concerned it looks great and "Ready for Committer" except for the modularity/pluggability question. Perhaps that could be done as a follow-on patch (if deemed a good idea)? -Kevin
Вложения
On Fri, Dec 2, 2011 at 4:11 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > As far as I'm concerned it looks great and "Ready for Committer" > except for the modularity/pluggability question. Perhaps that could > be done as a follow-on patch (if deemed a good idea)? I investigated the performance issues with the previous version of the patch and found that turning some of the FlexLock support functions into macros seems to help quite a bit, so I've done that in the attached versions. I've also incorporated Kevin's incremental patch from his previous version. That having been said, I'm leaning away from applying any of this for the time being. For one thing, Pavan's PGXACT stuff has greatly eroded the benefit of this patch. I'm fairly optimistic about the prospects of finding other good uses for the FlexLock machinery down the road, but I don't feel like that's enough reason to apply it now. Also, there are several other good ideas kicking around out there for further reducing ProcArrayLock contention, some of which are lower-impact than this and others of which would obsolete the entire approach. So it seems like I should probably let the dust settle on those things before deciding whether this even makes sense. In particular, I'm starting to think that resolving the contention between GetSnapshotData() and ProcArrayEndTransaction() is basically a layup at this point, and I really want to go for a three-pointer, namely also eliminating the spinlock contention between different backends all trying to acquire ProcArrayLock in shared mode during read-only operation. So, I'm going to mark this Returned with Feedback for now and keep working on the problem. Thanks for the review and positive comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company