Discussion: 'tuple concurrently updated' error for ALTER ROLE ... SET
Hello,

We have recently observed a problem with concurrent execution of ALTER ROLE ... SET in several sessions. It's similar to the one from http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=fbcf4b92aa64d4577bcf25925b055316b978744a The result is the 'tuple concurrently updated' error message, and the problem is easily reproducible:

CREATE SCHEMA test;
CREATE SCHEMA test2;
CREATE ROLE testrole;

session 1: while [ 1 ]; do psql postgres -c 'alter role testrole set search_path=test2'; done
session 2: while [ 1 ]; do psql postgres -c 'alter role testrole set search_path=test'; done

The error message appears almost immediately on my system. After digging in the code I've found that a RowExclusiveLock is acquired on the pg_db_role_setting table in AlterSetting(). While the name of the lock suggests that it should conflict with itself, it doesn't. After I replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is a simple patch with these changes.

Regards,
--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.
Alexey Klyukin <alexk@commandprompt.com> writes:
> After digging in the code I've found that a RowExclusiveLock is acquired on the pg_db_role_setting table in AlterSetting(). While the name of the lock suggests that it should conflict with itself, it doesn't. After I replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is a simple patch with these changes.

We're not likely to do that, first because it's randomly different from the handling of every other system catalog update, and second because it would serialize all updates on this catalog, and probably create deadlock cases that don't exist now. (BTW, as the patch is given I'd expect it to still fail, though perhaps with lower probability than before. For this to actually stop all such cases, you'd have to hold the lock till commit, which greatly increases the risks of deadlock.)

I see no particular reason why conflicting updates like those *shouldn't* be expected to fail occasionally.

regards, tom lane
On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexey Klyukin <alexk@commandprompt.com> writes:
>> After digging in the code I've found that a RowExclusiveLock is acquired on the pg_db_role_setting table in AlterSetting(). While the name of the lock suggests that it should conflict with itself, it doesn't. After I replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is a simple patch with these changes.
>
> We're not likely to do that, first because it's randomly different from the handling of every other system catalog update,

We have very robust locking of this type for table-related DDL operations and just about none for anything else. I don't consider the latter to be a feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We're not likely to do that, first because it's randomly different from the handling of every other system catalog update,
>
> We have very robust locking of this type for table-related DDL operations and just about none for anything else. I don't consider the latter to be a feature.

I didn't say it was ;-). What I *am* saying is that if we're going to do anything about this sort of problem, there needs to be a well-considered system-wide plan. Arbitrarily changing the locking rules for individual operations is not going to make things better, and taking exclusive locks on whole catalogs is definitely not going to make things better.

regards, tom lane
On May 13, 2011, at 1:28 AM, Tom Lane wrote:
> Alexey Klyukin <alexk@commandprompt.com> writes:
>> After digging in the code I've found that a RowExclusiveLock is acquired on the pg_db_role_setting table in AlterSetting(). While the name of the lock suggests that it should conflict with itself, it doesn't. After I replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is a simple patch with these changes.
>
> We're not likely to do that, first because it's randomly different from the handling of every other system catalog update, and second because it would serialize all updates on this catalog, and probably create deadlock cases that don't exist now. (BTW, as the patch is given I'd expect it to still fail, though perhaps with lower probability than before. For this to actually stop all such cases, you'd have to hold the lock till commit, which greatly increases the risks of deadlock.)

Fair enough. I think AlterSetting() holds the lock till commit (it does heap_close with NoLock). DropSetting() doesn't do this, though.

> I see no particular reason why conflicting updates like those *shouldn't* be expected to fail occasionally.

Excellent question, I don't have enough context to properly answer that (other than a guess that an unexpected transaction rollback is too unexpected :)) Let me ask the customer first.

--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.
On Thu, May 12, 2011 at 6:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> We're not likely to do that, first because it's randomly different from the handling of every other system catalog update,
>
>> We have very robust locking of this type for table-related DDL operations and just about none for anything else. I don't consider the latter to be a feature.
>
> I didn't say it was ;-). What I *am* saying is that if we're going to do anything about this sort of problem, there needs to be a well-considered system-wide plan. Arbitrarily changing the locking rules for individual operations is not going to make things better, and taking exclusive locks on whole catalogs is definitely not going to make things better.

Yes; true. I'm inclined to say that this is a bug, but not one we're going to fix before 9.2. I think it might be about time to get serious about making an effort to sprinkle the code with a few more LockDatabaseObject() and LockSharedObject() calls.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 12, 2011 at 6:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I didn't say it was ;-). What I *am* saying is that if we're going to do anything about this sort of problem, there needs to be a well-considered system-wide plan. Arbitrarily changing the locking rules for individual operations is not going to make things better, and taking exclusive locks on whole catalogs is definitely not going to make things better.
>
> Yes; true. I'm inclined to say that this is a bug, but not one we're going to fix before 9.2. I think it might be about time to get serious about making an effort to sprinkle the code with a few more LockDatabaseObject() and LockSharedObject() calls.

Yeah. That doesn't rise to the level of a "well-considered plan", but I believe that we could develop a plan around that concept, ie, take a lock associated with the individual object we are about to operate on.

BTW, I thought a bit more about why I didn't like the initial proposal in this thread, and the basic objection is this: the AccessShareLock or RowExclusiveLock we take on the catalog is not meant to provide any serialization of operations on individual objects within the catalog. What it's there for is to interlock against operations that are operating on the catalog as a table, such as VACUUM FULL (which has to lock out all accesses to the catalog) or REINDEX (which has to lock out updates). So the catalog-level lock is the right thing and shouldn't be changed. If we want to interlock updates of individual objects then we need a different locking concept for that.

regards, tom lane
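To make the distinction above concrete, here is a toy Python sketch (all names hypothetical; this is not PostgreSQL code) of locking the individual object rather than the catalog: concurrent updates of the same object serialize on a lock keyed by the object's identity, while the catalog-level lock would remain reserved for whole-table operations like VACUUM FULL and REINDEX.

```python
import threading

# Hypothetical per-object locks keyed by (classid, objid), loosely
# analogous to what LockSharedObject() provides. The catalog-wide
# lock (not modeled here) would only guard whole-table operations.
object_locks = {}
object_locks_guard = threading.Lock()

def object_lock(classid, objid):
    """Return the lock that serializes updates of one catalog object."""
    with object_locks_guard:
        return object_locks.setdefault((classid, objid), threading.Lock())

settings = {}        # simulated pg_db_role_setting contents
update_counts = {}   # how many updates were applied per role

def alter_setting(roleid, value):
    # Serialize on the individual object, not the whole catalog:
    # concurrent ALTER ROLE ... SET on the *same* role queue up here,
    # while updates of different roles proceed in parallel.
    with object_lock("pg_authid", roleid):
        settings[roleid] = value
        update_counts[roleid] = update_counts.get(roleid, 0) + 1

threads = [threading.Thread(target=alter_setting, args=("testrole", f"test{i}"))
           for i in range(50)]
for t in threads: t.start()
for t in threads: t.join()
```

With the per-object lock, all fifty concurrent updates apply cleanly instead of some failing with a concurrent-update error.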
On Fri, May 13, 2011 at 12:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, I thought a bit more about why I didn't like the initial proposal in this thread, and the basic objection is this: the AccessShareLock or RowExclusiveLock we take on the catalog is not meant to provide any serialization of operations on individual objects within the catalog. What it's there for is to interlock against operations that are operating on the catalog as a table, such as VACUUM FULL (which has to lock out all accesses to the catalog) or REINDEX (which has to lock out updates). So the catalog-level lock is the right thing and shouldn't be changed. If we want to interlock updates of individual objects then we need a different locking concept for that.

Right, I agree. Fortunately, we don't have to invent a new one. There is already locking being done exactly along these lines for DROP, COMMENT, and SECURITY LABEL (which is important, because otherwise we could leave behind orphaned security labels that would be inherited by a later object with the same OID, leading to a security problem). I think it would be sensible, and quite simple, to extend that to other DDL operations.

I think that we probably *don't* want to lock non-table objects when they are just being *used*. We do that for tables (to lock against concurrent drop operations) and in some workloads it becomes a severe bottleneck. Doing it for functions and operators would make the problem far worse, for no particular benefit. Unlike tables, there is no underlying relation file to worry about, so the worst thing that happens is someone continues to use a dropped object slightly after it's gone, or the old definition of an object that's been modified.

Actually, it's occurred to me from time to time that it would be nice to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and ROW EXCLUSIVE) locks for tables as well. Under normal operating conditions (i.e. no DDL running), these locks generate a huge amount of lock manager traffic even though none of the locks conflict with each other. Unfortunately, I don't really see a way to make this work. But maybe it would at least be possible to create some sort of fast path. For example, suppose every backend opens a file and uses that file to record lock tags for the objects on which it is taking "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks. Before taking a "strong" lock (anything that conflicts with one of those lock types), the exclusive locker is required to open all of those files and transfer the locks into the lock manager proper. Of course, it's also necessary to nail down the other direction: you have to have some way of making sure that the backend can't record in its local file a lock that would have conflicted had it been taken in the actual lock manager. But maybe there's some lightweight way we could detect that, as well. For example, we could keep, say, a 1K array in shared memory, representing a 1024-way partitioning of the locktag space. Each byte is 1 if there are any "strong" locks on objects with that locktag in the lock manager, and 0 if there are none (or maybe you need a 4K array with exact counts, for bookkeeping). When a backend wants to take a "weak" lock, it checks the array: if it finds a 0 then it just records the lock in its file; otherwise, it goes through the lock manager. When a backend wants a "strong" lock, it first sets the byte (or bumps the count) in the array, then transfers any existing weak locks from individual backends to the lock manager, then tries to get its own lock. Possibly the array operations could be done with memory synchronization primitives rather than spinlocks, especially on architectures that support an atomic fetch-and-add. Of course I don't know quite how we recover if we try to do one of these "lock transfers" and run out of shared memory... and overall I'm hand-waving here quite a bit, but in theory it seems like we ought to be able to rejigger this locking so that we reduce the cost of obtaining a "weak" lock, perhaps at the expense of making it more expensive to obtain a "strong" lock, which are relatively rare by comparison.

<end of rambling digression>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
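The fast-path scheme sketched above can be modeled in a few lines. This is a single-process Python simulation under invented names, not a design: weak lockers consult the partitioned strong-lock counter and, when it is zero, record the lock only in a per-backend structure; a strong locker bumps the counter and then "imports" matching per-backend entries into the shared table before taking its own lock. Conflict waiting and memory fencing are omitted, and a plain mutex stands in for the atomic operations.

```python
import threading

N_PARTITIONS = 1024
strong_lock_counts = [0] * N_PARTITIONS   # the "1K array" from the sketch
counts_guard = threading.Lock()           # stand-in for atomic fetch-and-add

shared_lock_table = {}                    # locktag -> set of holder pids
shared_guard = threading.Lock()

class Backend:
    def __init__(self, pid):
        self.pid = pid
        self.fast_path = set()            # per-backend record of weak locks
        self.guard = threading.Lock()     # strong lockers take this to transfer

    def lock_weak(self, tag):
        part = hash(tag) % N_PARTITIONS
        with counts_guard:
            strong_present = strong_lock_counts[part] > 0
        if not strong_present:
            with self.guard:              # cheap path: no shared-table traffic
                self.fast_path.add(tag)
        else:                             # strong lock around: use lock manager
            with shared_guard:
                shared_lock_table.setdefault(tag, set()).add(self.pid)

def lock_strong(backends, me, tag):
    part = hash(tag) % N_PARTITIONS
    with counts_guard:
        strong_lock_counts[part] += 1     # future weak lockers see this
    # Import every backend's fast-path entries for this tag into the
    # shared table (the "transfer" step; conflict waiting omitted).
    for b in backends:
        with b.guard:
            if tag in b.fast_path:
                b.fast_path.discard(tag)
                with shared_guard:
                    shared_lock_table.setdefault(tag, set()).add(b.pid)
    with shared_guard:
        shared_lock_table.setdefault(tag, set()).add(me)

backends = [Backend(pid) for pid in range(4)]
for b in backends:
    b.lock_weak("rel_16384")              # all taken via the fast path
lock_strong(backends, me=99, tag="rel_16384")
backends[0].lock_weak("rel_16384")        # now routed through the shared table
```

The point of the structure is visible in the final step: once a strong lock exists in the partition, later weak acquisitions stop using the fast path, so the shared table holds the complete picture.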
Is this a TODO? I don't see it on the TODO list.

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, May 13, 2011 at 12:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, I thought a bit more about why I didn't like the initial proposal in this thread, and the basic objection is this: the AccessShareLock or RowExclusiveLock we take on the catalog is not meant to provide any serialization of operations on individual objects within the catalog. What it's there for is to interlock against operations that are operating on the catalog as a table, such as VACUUM FULL (which has to lock out all accesses to the catalog) or REINDEX (which has to lock out updates). So the catalog-level lock is the right thing and shouldn't be changed. If we want to interlock updates of individual objects then we need a different locking concept for that.
>
> Right, I agree. Fortunately, we don't have to invent a new one. There is already locking being done exactly along these lines for DROP, COMMENT, and SECURITY LABEL (which is important, because otherwise we could leave behind orphaned security labels that would be inherited by a later object with the same OID, leading to a security problem). I think it would be sensible, and quite simple, to extend that to other DDL operations.
>
> I think that we probably *don't* want to lock non-table objects when they are just being *used*. We do that for tables (to lock against concurrent drop operations) and in some workloads it becomes a severe bottleneck. Doing it for functions and operators would make the problem far worse, for no particular benefit. Unlike tables, there is no underlying relation file to worry about, so the worst thing that happens is someone continues to use a dropped object slightly after it's gone, or the old definition of an object that's been modified.
>
> Actually, it's occurred to me from time to time that it would be nice to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and ROW EXCLUSIVE) locks for tables as well. Under normal operating conditions (i.e. no DDL running), these locks generate a huge amount of lock manager traffic even though none of the locks conflict with each other. Unfortunately, I don't really see a way to make this work. But maybe it would at least be possible to create some sort of fast path. For example, suppose every backend opens a file and uses that file to record lock tags for the objects on which it is taking "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks. Before taking a "strong" lock (anything that conflicts with one of those lock types), the exclusive locker is required to open all of those files and transfer the locks into the lock manager proper. Of course, it's also necessary to nail down the other direction: you have to have some way of making sure that the backend can't record in its local file a lock that would have conflicted had it been taken in the actual lock manager. But maybe there's some lightweight way we could detect that, as well. For example, we could keep, say, a 1K array in shared memory, representing a 1024-way partitioning of the locktag space. Each byte is 1 if there are any "strong" locks on objects with that locktag in the lock manager, and 0 if there are none (or maybe you need a 4K array with exact counts, for bookkeeping). When a backend wants to take a "weak" lock, it checks the array: if it finds a 0 then it just records the lock in its file; otherwise, it goes through the lock manager. When a backend wants a "strong" lock, it first sets the byte (or bumps the count) in the array, then transfers any existing weak locks from individual backends to the lock manager, then tries to get its own lock. Possibly the array operations could be done with memory synchronization primitives rather than spinlocks, especially on architectures that support an atomic fetch-and-add. Of course I don't know quite how we recover if we try to do one of these "lock transfers" and run out of shared memory... and overall I'm hand-waving here quite a bit, but in theory it seems like we ought to be able to rejigger this locking so that we reduce the cost of obtaining a "weak" lock, perhaps at the expense of making it more expensive to obtain a "strong" lock, which are relatively rare by comparison.
>
> <end of rambling digression>

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On Fri, May 13, 2011 at 09:07:34AM -0400, Robert Haas wrote:
> Actually, it's occurred to me from time to time that it would be nice to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and ROW EXCLUSIVE) locks for tables as well. Under normal operating conditions (i.e. no DDL running), these locks generate a huge amount of lock manager traffic even though none of the locks conflict with each other. Unfortunately, I don't really see a way to make this work. But maybe it would at least be possible to create some sort of fast path. For example, suppose every backend opens a file and uses that file to record lock tags for the objects on which it is taking "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks. Before taking a "strong" lock (anything that conflicts with one of those lock types), the exclusive locker is required to open all of those files and transfer the locks into the lock manager proper. Of course, it's also necessary to nail down the other direction: you have to have some way of making sure that the backend can't record in its local file a lock that would have conflicted had it been taken in the actual lock manager. But maybe there's some lightweight way we could detect that, as well. For example, we could keep, say, a 1K array in shared memory, representing a 1024-way partitioning of the locktag space. Each byte is 1 if there are any "strong" locks on objects with that locktag in the lock manager, and 0 if there are none (or maybe you need a 4K array with exact counts, for bookkeeping). When a backend wants to take a "weak" lock, it checks the array: if it finds a 0 then it just records the lock in its file; otherwise, it goes through the lock manager. When a backend wants a "strong" lock, it first sets the byte (or bumps the count) in the array, then transfers any existing weak locks from individual backends to the lock manager, then tries to get its own lock. Possibly the array operations could be done with memory synchronization primitives rather than spinlocks, especially on architectures that support an atomic fetch-and-add. Of course I don't know quite how we recover if we try to do one of these "lock transfers" and run out of shared memory... and overall I'm hand-waving here quite a bit, but in theory it seems like we ought to be able to rejigger this locking so that we reduce the cost of obtaining a "weak" lock, perhaps at the expense of making it more expensive to obtain a "strong" lock, which are relatively rare by comparison.
>
> <end of rambling digression>

The key is putting a rapid hard stop to all fast-path lock acquisitions and then reconstructing a valid global picture of the affected lock table regions. Your 1024-way table of strong lock counts sounds promising. (Offhand, I do think they would need to be counts, not just flags.)

If I'm understanding correctly, your pseudocode would look roughly like this:

    if (level >= ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        sfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 1)
            /* marker 1 */
            import_all_local_locks
        normal_LockAcquireEx
    else if (level <= RowExclusiveLock)
        lfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            /* marker 2 */
            local_only
            /* marker 3 */
        else
            normal_LockAcquireEx
    else
        normal_LockAcquireEx

At marker 1, we need to block until no code is running between markers two and three. You could do that with a per-backend lock (LW_SHARED by the strong locker, LW_EXCLUSIVE by the backend). That would probably still be a win over the current situation, but it would be nice to have something even cheaper.

Then you have the actual procedure for transfer of local locks to the global lock manager. Using record locks in a file could work, but that's a system call per lock acquisition regardless of the presence of strong locks. Is that cost sufficiently trivial?

I wonder if, instead, we could signal all backends at marker 1 to dump the applicable parts of their local (memory) lock tables to files. Or to another shared memory region, if that didn't mean statically allocating the largest possible required amount. If we were willing to wait until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the global insertions directly. That might yield a decent amount of bug swatting to fill in missing CHECK_FOR_INTERRUPTS, though.

Improvements in this area would also have good synergy with efforts to reduce the global impact of temporary table usage. CREATE TEMP TABLE can be the major source of AccessExclusiveLock acquisitions. However, with the strong lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.

nm
On May 13, 2011, at 2:07 AM, Alexey Klyukin wrote:
> On May 13, 2011, at 1:28 AM, Tom Lane wrote:
>> We're not likely to do that, first because it's randomly different from the handling of every other system catalog update, and second because it would serialize all updates on this catalog, and probably create deadlock cases that don't exist now. (BTW, as the patch is given I'd expect it to still fail, though perhaps with lower probability than before. For this to actually stop all such cases, you'd have to hold the lock till commit, which greatly increases the risks of deadlock.)
> ....
>> I see no particular reason why conflicting updates like those *shouldn't* be expected to fail occasionally.
>
> Excellent question, I don't have enough context to properly answer that (other than a guess that an unexpected transaction rollback is too unexpected :)) Let me ask the customer first.

The original use case is sporadic failures of some internal unit tests due to the error message in the subject.

--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.
On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
> The key is putting a rapid hard stop to all fast-path lock acquisitions and then reconstructing a valid global picture of the affected lock table regions. Your 1024-way table of strong lock counts sounds promising. (Offhand, I do think they would need to be counts, not just flags.)
>
> If I'm understanding correctly, your pseudocode would look roughly like this:
>
>     if (level >= ShareUpdateExclusiveLock)
>         ++strong_lock_counts[my_strong_lock_count_partition]
>         sfence
>         if (strong_lock_counts[my_strong_lock_count_partition] == 1)
>             /* marker 1 */
>             import_all_local_locks
>         normal_LockAcquireEx
>     else if (level <= RowExclusiveLock)
>         lfence
>         if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>             /* marker 2 */
>             local_only
>             /* marker 3 */
>         else
>             normal_LockAcquireEx
>     else
>         normal_LockAcquireEx

I think ShareUpdateExclusiveLock should be treated as neither weak nor strong. It certainly can't be treated as weak - i.e. use the fast path - because it's self-conflicting. It could be treated as strong, but since it doesn't conflict with any of the weak lock types, that would only serve to prevent fast-path lock acquisitions that otherwise could have succeeded. In particular, it would unnecessarily disable fast-path lock acquisition for any relation being vacuumed, which could be really ugly considering that one of the main workloads that would benefit from something like this is the case where lots of backends are fighting over a lock manager partition lock on a table they all want to read and/or modify. I think it's best for ShareUpdateExclusiveLock to always use the regular lock-acquisition path, but it need not worry about incrementing strong_lock_counts[] or importing local locks in so doing.

Also, I think in the step just after marker one, we'd import only local locks whose lock tags were equal to the lock tag on which we were attempting to acquire a strong lock. The downside of this whole approach is that acquiring a strong lock becomes, at least potentially, a lot slower, because you have to scan through the whole backend array looking for fast-path locks to import (let's not use the term "local lock", which is already in use within the lock manager code). But maybe that can be optimized enough not to matter. After all, if the lock manager scaled perfectly at high concurrency, we wouldn't be thinking about this in the first place.

> At marker 1, we need to block until no code is running between markers two and three. You could do that with a per-backend lock (LW_SHARED by the strong locker, LW_EXCLUSIVE by the backend). That would probably still be a win over the current situation, but it would be nice to have something even cheaper.

I don't have a better idea than to use an LWLock. I have a patch floating around to speed up our LWLock implementation, but I haven't got a workload where the bottleneck is the actual speed of operation of the LWLock rather than the fact that it's contended in the first place. And the whole point of this would be to arrange things so that the LWLocks are uncontended nearly all the time.

> Then you have the actual procedure for transfer of local locks to the global lock manager. Using record locks in a file could work, but that's a system call per lock acquisition regardless of the presence of strong locks. Is that cost sufficiently trivial?

No, I don't think we want to go into kernel space. When I spoke of using a file, I was imagining it as an mmap'd region that one backend could write to which, at need, another backend could mmap and grovel through. Another (perhaps simpler) option would be to just put it in shared memory. That doesn't give you as much flexibility in terms of expanding the segment, but it would be reasonable to allow space for only, dunno, say 32 locks per backend in shared memory; if you need more than that, you flush them all to the main lock table and start over. You could possibly even just make this a hack for the particular special case where we're taking a relation lock on a non-shared relation; then you'd need only 128 bytes for a 32-entry array, plus the LWLock (I think the database ID is already visible in shared memory).

> I wonder if, instead, we could signal all backends at marker 1 to dump the applicable parts of their local (memory) lock tables to files. Or to another shared memory region, if that didn't mean statically allocating the largest possible required amount. If we were willing to wait until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the global insertions directly. That might yield a decent amount of bug swatting to fill in missing CHECK_FOR_INTERRUPTS, though.

I've thought about this; I believe it's unworkable. If one backend goes into the tank (think: SIGSTOP, or blocking on I/O to an unreadable disk sector) this could lead to cascading failure.

> Improvements in this area would also have good synergy with efforts to reduce the global impact of temporary table usage. CREATE TEMP TABLE can be the major source of AccessExclusiveLock acquisitions. However, with the strong lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.

If that particular case is a problem for you, it seems like optimizing away the exclusive lock altogether might be possible. No other backend should be able to touch the table until the transaction commits anyhow, and at that point we're going to release the lock. There are possibly some sticky wickets here but it seems at least worth thinking about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
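The 32-slots-per-backend overflow idea might look roughly like this toy sketch (hypothetical names; shared-memory layout and locking omitted): lock tags go into a fixed-size per-backend array until it fills, at which point everything is flushed to the main lock table and the array starts over.

```python
FAST_PATH_SLOTS = 32

main_lock_table = set()       # (pid, locktag) entries in "the main lock table"

class Backend:
    def __init__(self, pid):
        self.pid = pid
        self.slots = []       # fixed-size per-backend fast-path array

    def lock_relation(self, locktag):
        if len(self.slots) < FAST_PATH_SLOTS:
            self.slots.append(locktag)    # cheap path: no main-table traffic
        else:
            # Out of fast-path space: flush everything to the main lock
            # table and start over, as suggested above.
            for tag in self.slots:
                main_lock_table.add((self.pid, tag))
            self.slots.clear()
            main_lock_table.add((self.pid, locktag))

b = Backend(pid=7)
for rel in range(40):         # 40 relation locks overflow the 32 slots once
    b.lock_relation(f"rel_{rel}")
```

After the overflow, the first 33 locks live in the main table and the remaining 7 sit back in the (now mostly empty) fast-path array.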
On Fri, May 13, 2011 at 08:55:34PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
>> If I'm understanding correctly, your pseudocode would look roughly like this:
>>
>>     if (level >= ShareUpdateExclusiveLock)
>
> I think ShareUpdateExclusiveLock should be treated as neither weak nor strong.

Indeed; that should be ShareLock.

> It certainly can't be treated as weak - i.e. use the fast path - because it's self-conflicting. It could be treated as strong, but since it doesn't conflict with any of the weak lock types, that would only serve to prevent fast-path lock acquisitions that otherwise could have succeeded. In particular, it would unnecessarily disable fast-path lock acquisition for any relation being vacuumed, which could be really ugly considering that one of the main workloads that would benefit from something like this is the case where lots of backends are fighting over a lock manager partition lock on a table they all want to read and/or modify. I think it's best for ShareUpdateExclusiveLock to always use the regular lock-acquisition path, but it need not worry about incrementing strong_lock_counts[] or importing local locks in so doing.

Agreed.

> Also, I think in the step just after marker one, we'd import only local locks whose lock tags were equal to the lock tag on which we were attempting to acquire a strong lock. The downside of this whole approach is that acquiring a strong lock becomes, at least potentially, a lot slower, because you have to scan through the whole backend array looking for fast-path locks to import (let's not use the term "local lock", which is already in use within the lock manager code). But maybe that can be optimized enough not to matter. After all, if the lock manager scaled perfectly at high concurrency, we wouldn't be thinking about this in the first place.

Incidentally, I used the term "local lock" because I assumed fast-path locks would still go through the lock manager far enough to populate the local lock table. But there may be no reason to do so.

>> I wonder if, instead, we could signal all backends at marker 1 to dump the applicable parts of their local (memory) lock tables to files. Or to another shared memory region, if that didn't mean statically allocating the largest possible required amount. If we were willing to wait until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the global insertions directly. That might yield a decent amount of bug swatting to fill in missing CHECK_FOR_INTERRUPTS, though.
>
> I've thought about this; I believe it's unworkable. If one backend goes into the tank (think: SIGSTOP, or blocking on I/O to an unreadable disk sector) this could lead to cascading failure.

True. It would need some fairly major advantages to justify that risk, and I don't see any.

Overall, looks like a promising design sketch to me. Thanks.

nm
On Fri, May 13, 2011 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote:
> Incidentally, I used the term "local lock" because I assumed fast-path locks would still go through the lock manager far enough to populate the local lock table. But there may be no reason to do so.

Oh, good point. I think we probably WOULD need to update the local lock hash table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 13, 2011 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote:
>> Incidentally, I used the term "local lock" because I assumed fast-path locks would still go through the lock manager far enough to populate the local lock table.  But there may be no reason to do so.

> Oh, good point.  I think we probably WOULD need to update the local lock hash table.

I haven't read this thread closely, but the general behavior of the backend assumes that it's very very cheap to re-acquire a lock that's already held by the current transaction.  It's probably worth maintaining a local counter just so you can keep that being true, even if there were no other need for it.  (Since I've not read the thread, I'll refrain from asking how you're gonna clean up at transaction end if there's no local memory of what locks you hold.)

            regards, tom lane
On Fri, May 13, 2011 at 5:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
>> I wonder if, instead, we could signal all backends at marker 1 to dump the applicable parts of their local (memory) lock tables to files.  Or to another shared memory region, if that didn't mean statically allocating the largest possible required amount.  If we were willing to wait until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the global insertions directly.  That might yield a decent amount of bug swatting to fill in missing CHECK_FOR_INTERRUPTS, though.
>
> I've thought about this; I believe it's unworkable.  If one backend goes into the tank (think: SIGSTOP, or blocking on I/O to an unreadable disk sector) this could lead to cascading failure.

Would that risk be substantially worse than it currently is?  If a backend goes into the tank while holding access shared locks, it will still block access exclusive locks until it recovers.  And those queued access exclusive locks will block new access shared locks from other backends.  How much is the risk magnified by the new approach, going from "any backend holding the lock is tanked" to "any process at all is tanked"?

What I'd considered playing with in the past is having LockMethodLocalHash hang on to an Access Shared lock even after locallock->nLocks == 0, so that re-granting the lock would be a purely local operation.  Anyone wanting an Access Exclusive lock and not immediately getting it would have to send out a plea (via SINVA?) for other processes to release their locallock->nLocks == 0 locks.  But this would suffer from the same problem of tanked processes.

Cheers,
Jeff
On Sat, May 14, 2011 at 1:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Would that risk be substantially worse than it currently is?  If a backend goes into the tank while holding access shared locks, it will still block access exclusive locks until it recovers.  And those queued access exclusive locks will block new access shared locks from other backends.  How much is the risk magnified by the new approach, going from "any backend holding the lock is tanked" to "any process at all is tanked"?

I think that's a pretty substantial increase in risk.  Consider that there may be 100 backends out there, one of which holds a relevant lock.  Needing to wait for all of them to do something instead of just one is quite different.  Also, quite apart from the possibility of hanging altogether, the latency would probably be increased quite a bit, and not in a very predictable fashion.

I have the impression that most of the problem comes from fighting over CPU cache lines.  If that's correct, it may not be important to avoid shared memory access per se; it may be good enough to arrange things so that the shared memory which is accessed is *typically* not being accessed by other backends.

> What I'd considered playing with in the past is having LockMethodLocalHash hang on to an Access Shared lock even after locallock->nLocks == 0, so that re-granting the lock would be a purely local operation.  Anyone wanting an Access Exclusive lock and not immediately getting it would have to send out a plea (via SINVA?) for other processes to release their locallock->nLocks == 0 locks.  But this would suffer from the same problem of tanked processes.

Yeah.  I have thought about this, too, but as with Noah's suggestion, I think this would make the risk of things hanging up substantially worse than it is now.  A backend that, under the present code, wouldn't be holding an AccessShareLock at all, would now be holding one that you'd have to convince it to release.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
>         if (level >= ShareUpdateExclusiveLock)
>                 ++strong_lock_counts[my_strong_lock_count_partition]
>                 sfence
>                 if (strong_lock_counts[my_strong_lock_count_partition] == 1)
>                         /* marker 1 */
>                         import_all_local_locks
>                 normal_LockAcquireEx
>         else if (level <= RowExclusiveLock)
>                 lfence
>                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>                         /* marker 2 */
>                         local_only
>                         /* marker 3 */
>                 else
>                         normal_LockAcquireEx
>         else
>                 normal_LockAcquireEx
>
> At marker 1, we need to block until no code is running between markers two and three.  You could do that with a per-backend lock (LW_SHARED by the strong locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over the current situation, but it would be nice to have something even cheaper.

Barring some brilliant idea, or anyway for a first cut, it seems to me that we can adjust the above pseudocode by assuming the use of a LWLock.  In addition, two other adjustments: first, the first line should test level > ShareUpdateExclusiveLock, rather than >=, per previous discussion.  Second, import_all_local_locks needn't really move everything; just those locks with a matching locktag.  Thus:

    if (level > ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        sfence
        for each backend
            take per-backend lwlock for target backend
            transfer fast-path entries with matching locktag
            release per-backend lwlock for target backend
        normal_LockAcquireEx
    else if (level <= RowExclusiveLock)
        lfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            take per-backend lwlock for own backend
            fast-path lock acquisition
            release per-backend lwlock for own backend
        else
            normal_LockAcquireEx
    else
        normal_LockAcquireEx

Now, a small fly in the ointment is that we haven't got, with PostgreSQL, a portable library of memory primitives.  So there isn't an obvious way of doing that sfence/lfence business.
Now, it seems to me that in the "strong lock" case, the sfence isn't really needed anyway, because we're about to start acquiring and releasing an lwlock for every backend, and that had better act as a full memory barrier anyhow, or we're doomed.  The "weak lock" case is more interesting, because we need the fence before we've taken any LWLock.

But perhaps it'd be sufficient to just acquire the per-backend lwlock before checking strong_lock_counts[].  If, as we hope, we get back a zero, then we do the fast-path lock acquisition, release the lwlock, and away we go.  If we get back any other value, then we've wasted an lwlock acquisition cycle.  Or actually maybe not: it seems to me that in that case we'd better transfer all of our fast-path entries into the main hash table before trying to acquire any lock the slow way, at least if we don't want the deadlock detector to have to know about the fast-path.  So then we get this:

    if (level > ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        for each backend
            take per-backend lwlock for target backend
            transfer fastpath entries with matching locktag
            release per-backend lwlock for target backend
    else if (level <= RowExclusiveLock)
        take per-backend lwlock for own backend
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            fast-path lock acquisition
            done = true
        else
            transfer all fastpath entries
        release per-backend lwlock for own backend
    if (!done)
        normal_LockAcquireEx

That seems like it ought to work, at least assuming the position of your fencing instructions was correct in the first place.  But there's one big problem to worry about: what happens if the lock transfer fails due to shared memory exhaustion?  It's not so bad in the "weak lock" case; it'll feel just like the already-existing case where you try to push another lock into the shared-memory hash table and there's no room.  Essentially you've been living on borrowed time anyway.
On the other hand, the "strong lock" case is a real problem, because a large number of granted fast-path locks can effectively DOS any strong locker, even one that wouldn't have conflicted with them.  That's clearly not going to fly, but it's not clear to me what the best way is to patch around it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
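The revised pseudocode can be rendered as a toy, single-threaded Python model (illustrative only: the names are assumptions, conflict detection and lock queueing are omitted entirely, and nothing here is PostgreSQL source):

```python
import threading
from collections import defaultdict

N_PARTITIONS = 16

# Toy model of the revised acquisition pseudocode, not PostgreSQL source.
strong_lock_counts = [0] * N_PARTITIONS      # shared, partitioned counters
main_lock_table = defaultdict(set)           # locktag -> set of backend ids
BACKENDS = {}                                # backend id -> Backend

class Backend:
    def __init__(self, backend_id):
        self.id = backend_id
        self.lwlock = threading.Lock()       # per-backend "LWLock"
        self.fastpath = set()                # fast-path locktags held
        BACKENDS[backend_id] = self

def partition(locktag):
    return hash(locktag) % N_PARTITIONS

def acquire(backend, locktag, strong):
    if strong:  # level > ShareUpdateExclusiveLock
        strong_lock_counts[partition(locktag)] += 1
        for b in BACKENDS.values():
            with b.lwlock:                   # transfer matching entries
                if locktag in b.fastpath:
                    b.fastpath.discard(locktag)
                    main_lock_table[locktag].add(b.id)
        main_lock_table[locktag].add(backend.id)       # normal_LockAcquireEx
    else:       # level <= RowExclusiveLock
        done = False
        with backend.lwlock:
            if strong_lock_counts[partition(locktag)] == 0:
                backend.fastpath.add(locktag)          # fast-path acquisition
                done = True
            else:
                # transfer all own fast-path entries before the slow path
                for tag in backend.fastpath:
                    main_lock_table[tag].add(backend.id)
                backend.fastpath.clear()
        if not done:
            main_lock_table[locktag].add(backend.id)   # normal_LockAcquireEx
```

A weak lock stays purely in the backend's own structure unless its partition's counter is non-zero; a strong locker first bumps the counter, then pulls matching entries from every backend into the shared table.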
On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
> >         if (level >= ShareUpdateExclusiveLock)
> >                 ++strong_lock_counts[my_strong_lock_count_partition]
> >                 sfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 1)
> >                         /* marker 1 */
> >                         import_all_local_locks
> >                 normal_LockAcquireEx
> >         else if (level <= RowExclusiveLock)
> >                 lfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> >                         /* marker 2 */
> >                         local_only
> >                         /* marker 3 */
> >                 else
> >                         normal_LockAcquireEx
> >         else
> >                 normal_LockAcquireEx
> >
> > At marker 1, we need to block until no code is running between markers two and three.  You could do that with a per-backend lock (LW_SHARED by the strong locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over the current situation, but it would be nice to have something even cheaper.
>
> Barring some brilliant idea, or anyway for a first cut, it seems to me that we can adjust the above pseudocode by assuming the use of a LWLock.  In addition, two other adjustments: first, the first line should test level > ShareUpdateExclusiveLock, rather than >=, per previous discussion.  Second, import_all_local_locks needn't really move everything; just those locks with a matching locktag.  Thus:
>
>     if (level > ShareUpdateExclusiveLock)
>         ++strong_lock_counts[my_strong_lock_count_partition]
>         sfence
>         for each backend
>             take per-backend lwlock for target backend
>             transfer fast-path entries with matching locktag
>             release per-backend lwlock for target backend
>         normal_LockAcquireEx
>     else if (level <= RowExclusiveLock)
>         lfence
>         if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>             take per-backend lwlock for own backend
>             fast-path lock acquisition
>             release per-backend lwlock for own backend
>         else
>             normal_LockAcquireEx
>     else
>         normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one.  Granted, that itself requires some yet-undiscussed locking.  For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel.  Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire "if" body and while decrementing the strong_lock_counts cell at lock release.

As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries.  You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks.  This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code.

> Now, a small fly in the ointment is that we haven't got, with PostgreSQL, a portable library of memory primitives.  So there isn't an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory barrier interfaces, we should create those interfaces now.  Start with only a platform-independent dummy implementation that runs a lock/unlock cycle on a spinlock residing in backend-local memory.  I'm 75% sure that would be sufficient on all architectures for which we support spinlocks.  It may turn out that we can't benefit from such interfaces at this time ...
> Now, it seems to me that in the "strong lock" case, the sfence isn't really needed anyway, because we're about to start acquiring and releasing an lwlock for every backend, and that had better act as a full memory barrier anyhow, or we're doomed.  The "weak lock" case is more interesting, because we need the fence before we've taken any LWLock.

Agreed.

> But perhaps it'd be sufficient to just acquire the per-backend lwlock before checking strong_lock_counts[].  If, as we hope, we get back a zero, then we do the fast-path lock acquisition, release the lwlock, and away we go.  If we get back any other value, then we've wasted an lwlock acquisition cycle.  Or actually maybe not: it seems to me that in that case we'd better transfer all of our fast-path entries into the main hash table before trying to acquire any lock the slow way, at least if we don't want the deadlock detector to have to know about the fast-path.  So then we get this:
>
>     if (level > ShareUpdateExclusiveLock)
>         ++strong_lock_counts[my_strong_lock_count_partition]
>         for each backend
>             take per-backend lwlock for target backend
>             transfer fastpath entries with matching locktag
>             release per-backend lwlock for target backend
>     else if (level <= RowExclusiveLock)
>         take per-backend lwlock for own backend
>         if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>             fast-path lock acquisition
>             done = true
>         else
>             transfer all fastpath entries
>         release per-backend lwlock for own backend
>     if (!done)
>         normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all fastpath entries") and, specifically, how it aids deadlock avoidance?  I didn't think this change would have any impact on deadlocks, because all relevant locks will be in the global lock table before any call to normal_LockAcquireEx.

To validate the locking at this level of detail, I think we need to sketch the unlock protocol, too.
On each strong lock release, we'll decrement the strong_lock_counts cell.  No particular interlock with fast-path lockers should be needed; a stray AccessShareLock needlessly making it into the global lock table is no problem.  As mentioned above, we _will_ need an interlock with lock transfer operations.  How will transferred fast-path locks get removed from the global lock table?  Presumably, the original fast-path locker should do so at transaction end; anything else would contort the life cycle.  Then add a way for the backend to know which locks had been transferred as well as an interlock against concurrent transfer operations.  Maybe that's all.

> That seems like it ought to work, at least assuming the position of your fencing instructions was correct in the first place.  But there's one big problem to worry about: what happens if the lock transfer fails due to shared memory exhaustion?  It's not so bad in the "weak lock" case; it'll feel just like the already-existing case where you try to push another lock into the shared-memory hash table and there's no room.  Essentially you've been living on borrowed time anyway.  On the other hand, the "strong lock" case is a real problem, because a large number of granted fast-path locks can effectively DOS any strong locker, even one that wouldn't have conflicted with them.  That's clearly not going to fly, but it's not clear to me what the best way is to patch around it.

To put it another way: the current system is fair; the chance of hitting lock exhaustion is independent of lock level.  The new system would be unfair; lock exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock acquisition, through no fault of that transaction.  I agree this isn't ideal, but it doesn't look to me like an unacceptable weakness.  Making lock slots first-come, first-served is inherently unfair; we're not at all set up to justly arbitrate between mutually-hostile lockers competing for slots.
The overall situation will get better, not worse, for the admin who wishes to defend against hostile unprivileged users attempting a lock table DOS.

Thanks,
nm
On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah@leadboat.com> wrote:
> This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one.

Right: that's because I don't think that's what we want to do.  I don't think we want to transfer all per-backend locks to the shared hash table as soon as anyone attempts to acquire a strong lock; instead, I think we want to transfer only those fast-path locks which have the same locktag as the strong lock someone is attempting to acquire.  If we do that, then it doesn't matter whether the strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7: we still have to check for strong locks with that particular locktag.

> Granted, that itself requires some yet-undiscussed locking.  For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel.  Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire "if" body and while decrementing the strong_lock_counts cell at lock release.

I was imagining that the per-backend LWLock would protect the list of fast-path locks.  So to transfer locks, you would acquire the per-backend LWLock for the backend which has the lock, and then the lock manager partition LWLock, and then perform the transfer.

> As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries.  You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks.  This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code.

See above; I'm lost.

>> Now, a small fly in the ointment is that we haven't got, with PostgreSQL, a portable library of memory primitives.  So there isn't an obvious way of doing that sfence/lfence business.
> I was thinking that, if the final implementation could benefit from memory barrier interfaces, we should create those interfaces now.  Start with only a platform-independent dummy implementation that runs a lock/unlock cycle on a spinlock residing in backend-local memory.  I'm 75% sure that would be sufficient on all architectures for which we support spinlocks.  It may turn out that we can't benefit from such interfaces at this time ...

OK.

>> Now, it seems to me that in the "strong lock" case, the sfence isn't really needed anyway, because we're about to start acquiring and releasing an lwlock for every backend, and that had better act as a full memory barrier anyhow, or we're doomed.  The "weak lock" case is more interesting, because we need the fence before we've taken any LWLock.
>
> Agreed.
>
>> But perhaps it'd be sufficient to just acquire the per-backend lwlock before checking strong_lock_counts[].  If, as we hope, we get back a zero, then we do the fast-path lock acquisition, release the lwlock, and away we go.  If we get back any other value, then we've wasted an lwlock acquisition cycle.  Or actually maybe not: it seems to me that in that case we'd better transfer all of our fast-path entries into the main hash table before trying to acquire any lock the slow way, at least if we don't want the deadlock detector to have to know about the fast-path.  So then we get this:
>>
>>     if (level > ShareUpdateExclusiveLock)
>>         ++strong_lock_counts[my_strong_lock_count_partition]
>>         for each backend
>>             take per-backend lwlock for target backend
>>             transfer fastpath entries with matching locktag
>>             release per-backend lwlock for target backend
>>     else if (level <= RowExclusiveLock)
>>         take per-backend lwlock for own backend
>>         if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>>             fast-path lock acquisition
>>             done = true
>>         else
>>             transfer all fastpath entries
>>         release per-backend lwlock for own backend
>>     if (!done)
>>         normal_LockAcquireEx
>
> Could you elaborate on the last part (the need for "else transfer all fastpath entries") and, specifically, how it aids deadlock avoidance?  I didn't think this change would have any impact on deadlocks, because all relevant locks will be in the global lock table before any call to normal_LockAcquireEx.

Oh, hmm, maybe you're right.  I was concerned about the possibility of a backend which already holds locks going to sleep on a lock wait, and maybe running the deadlock detector, and failing to notice a deadlock.  But I guess that can't happen: if any of the locks it holds are relevant to the deadlock detector, the backend attempting to acquire those locks will transfer them before attempting to acquire the lock itself, so it should be OK.

> To validate the locking at this level of detail, I think we need to sketch the unlock protocol, too.  On each strong lock release, we'll decrement the strong_lock_counts cell.  No particular interlock with fast-path lockers should be needed; a stray AccessShareLock needlessly making it into the global lock table is no problem.  As mentioned above, we _will_ need an interlock with lock transfer operations.  How will transferred fast-path locks get removed from the global lock table?  Presumably, the original fast-path locker should do so at transaction end; anything else would contort the life cycle.  Then add a way for the backend to know which locks had been transferred as well as an interlock against concurrent transfer operations.  Maybe that's all.

I'm thinking that the backend can note, in its local-lock table, whether it originally acquired a lock via the fast-path or not.  Any lock not originally acquired via the fast-path will be released just as now.
For any lock that WAS originally acquired via the fast-path, we'll take our own per-backend lwlock, which protects the fast-path queue, and scan the fast-path queue for a matching entry.  If none is found, then we know the lock was transferred, so release the per-backend lwlock and do it the regular way (take lock manager partition lock, etc.).

At transaction end, we need to release all non-session locks, so we can consolidate a bit to avoid excess locking and unlocking.  Take the per-backend lwlock just once and scan through the queue.  Any locks we find there (that are not session locks) can be nuked from the local-lock table and we're done.  Release the per-backend lwlock.  At this point, any remaining locks that need to be released are in the shared hash tables and we can proceed as now (see LockReleaseAll - basically, we iterate over the lock partitions).

> To put it another way: the current system is fair; the chance of hitting lock exhaustion is independent of lock level.  The new system would be unfair; lock exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock acquisition, through no fault of that transaction.  I agree this isn't ideal, but it doesn't look to me like an unacceptable weakness.  Making lock slots first-come, first-served is inherently unfair; we're not at all set up to justly arbitrate between mutually-hostile lockers competing for slots.  The overall situation will get better, not worse, for the admin who wishes to defend against hostile unprivileged users attempting a lock table DOS.

Well, it's certainly true that the proposed system is far less likely to bomb out trying to acquire an AccessShareLock than what we have today, since in the common case the AccessShareLock doesn't use up any shared resources.  And that should make a lot of people happy.  But as to the bad scenario, one needn't presume that the lockers are hostile - it may just be that the system is running on the edge of a full lock table.
In the worst case, someone wanting a strong lock on a table may end up transferring a hundred or more locks (up to one per backend) into that table.  Even if they all fit and the strong locker gets his lock, it may now happen that the space is just about exhausted and other transactions start rolling back, apparently at random.

Or, even more pathologically, one backend grabs a strong lock on table X, but it just so happens that there is another table Y in the same lock partition which is highly trafficked but, as all of the locks involved are weak, uncontended.  Now that strong_lock_counts[] is non-zero for that partition, all those locks start going into the main lock manager table.  Performance will drop, which isn't great but we can live with it.  But maybe all the locks don't fit.  Now we have a situation in which, due to one backend acquiring a strong lock on table A, a bunch of other backends are unable to obtain weak locks on table B, and transactions start rolling back all over the place.

Now maybe you can argue that this scenario is sufficiently unlikely in practice that we shouldn't worry about it, but if it does happen the DBA will be incredibly confused, because an apparently innocuous operation on one table will have resulted in rollbacks acquiring locks on an apparently unrelated table.  Maybe you want to argue that's sufficiently rare that we shouldn't worry about it, but the unpleasantness factor seems pretty high to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah@leadboat.com> wrote:
> > This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one.
>
> Right: that's because I don't think that's what we want to do.  I don't think we want to transfer all per-backend locks to the shared hash table as soon as anyone attempts to acquire a strong lock; instead, I think we want to transfer only those fast-path locks which have the same locktag as the strong lock someone is attempting to acquire.  If we do that, then it doesn't matter whether the strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7: we still have to check for strong locks with that particular locktag.

Oh, I see.  I was envisioning that you'd transfer all locks associated with the strong_lock_counts cell; that is, all the locks that would now go directly to the global lock table when requested going forward.  Transferring only exact matches seems fine too, and then I agree with your other conclusions.

> > Granted, that itself requires some yet-undiscussed locking.  For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel.  Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire "if" body and while decrementing the strong_lock_counts cell at lock release.
>
> I was imagining that the per-backend LWLock would protect the list of fast-path locks.  So to transfer locks, you would acquire the per-backend LWLock for the backend which has the lock, and then the lock manager partition LWLock, and then perform the transfer.

I see later in your description that the transferer will delete each fast-path lock after transferring it.  Given that, this does sound adequate.
> > As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries.  You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks.  This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code.
>
> See above; I'm lost.

It wasn't a particularly useful point.

> > To validate the locking at this level of detail, I think we need to sketch the unlock protocol, too.  On each strong lock release, we'll decrement the strong_lock_counts cell.  No particular interlock with fast-path lockers should be needed; a stray AccessShareLock needlessly making it into the global lock table is no problem.  As mentioned above, we _will_ need an interlock with lock transfer operations.  How will transferred fast-path locks get removed from the global lock table?  Presumably, the original fast-path locker should do so at transaction end; anything else would contort the life cycle.  Then add a way for the backend to know which locks had been transferred as well as an interlock against concurrent transfer operations.  Maybe that's all.
>
> I'm thinking that the backend can note, in its local-lock table, whether it originally acquired a lock via the fast-path or not.  Any lock not originally acquired via the fast-path will be released just as now.  For any lock that WAS originally acquired via the fast-path, we'll take our own per-backend lwlock, which protects the fast-path queue, and scan the fast-path queue for a matching entry.  If none is found, then we know the lock was transferred, so release the per-backend lwlock and do it the regular way (take lock manager partition lock, etc.).

Sounds good.
> > To put it another way: the current system is fair; the chance of hitting lock > > exhaustion is independent of lock level. ?The new system would be unfair; lock > > exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock > > acquisition, through no fault of that transaction. ?I agree this isn't ideal, > > but it doesn't look to me like an unacceptable weakness. ?Making lock slots > > first-come, first-served is inherently unfair; we're not at all set up to > > justly arbitrate between mutually-hostile lockers competing for slots. ?The > > overall situation will get better, not worse, for the admin who wishes to > > defend against hostile unprivileged users attempting a lock table DOS. > > Well, it's certainly true that the proposed system is far less likely > to bomb out trying to acquire an AccessShareLock than what we have > today, since in the common case the AccessShareLock doesn't use up any > shared resources. And that should make a lot of people happy. But as > to the bad scenario, one needn't presume that the lockers are hostile > - it may just be that the system is running on the edge of a full lock > table. In the worst case, someone wanting a strong lock on a table > may end up transferring a hundred or more locks (up to one per > backend) into that table. Even if they all fit and the strong locker > gets his lock, it may now happen that the space is just about > exhausted and other transactions start rolling back, apparently at > random. > > Or, even more pathologically, one backend grabs a strong lock on table > X, but it just so happens that there is another table Y in the same > lock partition which is highly trafficked but, as all of the locks > involved are weak, uncontended. Now that strong_lock_counts[] is > non-zero for that partition, all those locks start going into the main > lock manager table. Performance will drop, which isn't great but we > can live with it. But maybe all the locks don't fit. 
Now we have a > situation in which, due to one backend acquiring a strong lock on > table X, a bunch of other backends are unable to obtain weak locks on > table Y, and transactions start rolling back all over the place. > > Now maybe you can argue that this scenario is sufficiently unlikely in > practice that we shouldn't worry about it, but if it does happen the > DBA will be incredibly confused, because an apparently innocuous > operation on one table will have resulted in rollbacks of transactions > acquiring locks on an apparently unrelated table. Maybe you want to argue that's > sufficiently rare that we shouldn't worry about it, but the > unpleasantness factor seems pretty high to me. Let's see if I understand the risk better now: the new system will handle lock load better, but when it does hit a limit, understanding why that happened will be more difficult. Good point. No silver-bullet ideas come to mind for avoiding that. A lock transfer operation could abort if it sees itself burning all the shared memory, but I'd be pessimistic about identifying a concrete heuristic not subject to its own confusing corner cases. Overall, the original outcome doesn't sound especially confusing to me. YMMV. Will the pg_locks view scan fast-path lock tables? If not, we probably need another view that does. We can then encourage administrators to monitor for fast-path lock counts that get high relative to shared memory capacity. Thanks, nm
On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah@leadboat.com> wrote: > Let's see if I understand the risk better now: the new system will handle lock > load better, but when it does hit a limit, understanding why that happened > will be more difficult. Good point. No silver-bullet ideas come to mind for > avoiding that. The only idea I can think of is to try to impose some bounds. For example, suppose we track the total number of locks that the system can handle in the shared hash table. We try to maintain the system in a state where the number of locks that actually exist is less than that number, even though some of them may be stored elsewhere. You can imagine a system where backends grab a global mutex to reserve a certain number of slots, and store that in shared memory together with their fast-path list, but another backend which is desperate for space can go through and trim back reservations to actual usage. > Will the pg_locks view scan fast-path lock tables? If not, we probably need > another view that does. We can then encourage administrators to monitor for > fast-path lock counts that get high relative to shared memory capacity. I think pg_locks should probably scan the fast-path tables. Another random idea for optimization: we could have a lock-free array with one entry per backend, indicating whether any fast-path locks are present. Before acquiring its first fast-path lock, a backend writes a 1 into that array and inserts a store fence. After releasing its last fast-path lock, it performs a store fence and writes a 0 into the array. Anyone who needs to grovel through all the per-backend fast-path arrays for whatever reason can perform a load fence and then scan the array. 
If I understand how this stuff works (and it's very possible that I don't), when the scanning backend sees a 0, it can be assured that the target backend has no fast-path locks and therefore doesn't need to acquire and release that LWLock or scan that fast-path array for entries. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 24, 2011 at 10:35:23AM -0400, Robert Haas wrote: > On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah@leadboat.com> wrote: > > Let's see if I understand the risk better now: the new system will handle lock > > load better, but when it does hit a limit, understanding why that happened > > will be more difficult. Good point. No silver-bullet ideas come to mind for > > avoiding that. > > The only idea I can think of is to try to impose some bounds. For > example, suppose we track the total number of locks that the system > can handle in the shared hash table. We try to maintain the system in > a state where the number of locks that actually exist is less than > that number, even though some of them may be stored elsewhere. You > can imagine a system where backends grab a global mutex to reserve a > certain number of slots, and store that in shared memory together with > their fast-path list, but another backend which is desperate for space > can go through and trim back reservations to actual usage. Forcing artificial resource exhaustion is a high price to pay. I suppose it's quite like disabling Linux memory overcommit, and some folks would like it. > Another random idea for optimization: we could have a lock-free array > with one entry per backend, indicating whether any fast-path locks are > present. Before acquiring its first fast-path lock, a backend writes > a 1 into that array and inserts a store fence. After releasing its > last fast-path lock, it performs a store fence and writes a 0 into the > array. Anyone who needs to grovel through all the per-backend > fast-path arrays for whatever reason can perform a load fence and then > scan the array. If I understand how this stuff works (and it's very > possible that I don't), when the scanning backend sees a 0, it can be > assured that the target backend has no fast-path locks and therefore > doesn't need to acquire and release that LWLock or scan that fast-path > array for entries.
I'm probably just missing something, but can't that conclusion become obsolete arbitrarily quickly? What if the scanning backend sees a 0, and the subject backend is currently sleeping just before it would have bumped that value? We need to take the LWLock if there's any chance that the subject backend has not yet seen the scanning backend's strong_lock_counts[] update. nm
On Tue, May 24, 2011 at 11:38 AM, Noah Misch <noah@leadboat.com> wrote: >> Another random idea for optimization: we could have a lock-free array >> with one entry per backend, indicating whether any fast-path locks are >> present. Before acquiring its first fast-path lock, a backend writes >> a 1 into that array and inserts a store fence. After releasing its >> last fast-path lock, it performs a store fence and writes a 0 into the >> array. Anyone who needs to grovel through all the per-backend >> fast-path arrays for whatever reason can perform a load fence and then >> scan the array. If I understand how this stuff works (and it's very >> possible that I don't), when the scanning backend sees a 0, it can be >> assured that the target backend has no fast-path locks and therefore >> doesn't need to acquire and release that LWLock or scan that fast-path >> array for entries. > > I'm probably just missing something, but can't that conclusion become obsolete > arbitrarily quickly? What if the scanning backend sees a 0, and the subject > backend is currently sleeping just before it would have bumped that value? We > need to take the LWLock if there's any chance that the subject backend has not > yet seen the scanning backend's strong_lock_counts[] update. Can't we bump strong_lock_counts[] *first*, make sure that change is globally visible, and only then start scanning the array? Once we've bumped strong_lock_counts[] and made sure everyone can see that change, it's still possible for backends to take a fast-path lock in some *other* fast-path partition, but nobody should be able to add any more fast-path locks in the partition we care about after that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 24, 2011 at 11:52:54AM -0400, Robert Haas wrote: > On Tue, May 24, 2011 at 11:38 AM, Noah Misch <noah@leadboat.com> wrote: > >> Another random idea for optimization: we could have a lock-free array > >> with one entry per backend, indicating whether any fast-path locks are > >> present. Before acquiring its first fast-path lock, a backend writes > >> a 1 into that array and inserts a store fence. After releasing its > >> last fast-path lock, it performs a store fence and writes a 0 into the > >> array. Anyone who needs to grovel through all the per-backend > >> fast-path arrays for whatever reason can perform a load fence and then > >> scan the array. If I understand how this stuff works (and it's very > >> possible that I don't), when the scanning backend sees a 0, it can be > >> assured that the target backend has no fast-path locks and therefore > >> doesn't need to acquire and release that LWLock or scan that fast-path > >> array for entries. > > > > I'm probably just missing something, but can't that conclusion become obsolete > > arbitrarily quickly? What if the scanning backend sees a 0, and the subject > > backend is currently sleeping just before it would have bumped that value? We > > need to take the LWLock if there's any chance that the subject backend has not > > yet seen the scanning backend's strong_lock_counts[] update. > > Can't we bump strong_lock_counts[] *first*, make sure that change is > globally visible, and only then start scanning the array? > > Once we've bumped strong_lock_counts[] and made sure everyone can see > that change, it's still possible for backends to take a fast-path lock > in some *other* fast-path partition, but nobody should be able to add > any more fast-path locks in the partition we care about after that > point. There's a potentially-unbounded delay between when the subject backend reads strong_lock_counts[] and when it sets its fast-path-used flag.
(I didn't mean "not yet seen" in the sense that some memory load would not show the latest value. I just meant that the subject backend may still be taking relevant actions based on its previous load of the value.) We could have the subject set its fast-path-used flag before even checking strong_lock_counts[], then clear the flag when strong_lock_counts[] dissuaded it from proceeding. Maybe that's what you had in mind? That being said, it's a slight extra cost for all fast-path lockers to benefit the strong lockers, so I'm not prepared to guess whether it will pay off.
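[For illustration only: a single-threaded sketch of the ordering fix just described, with C11 atomics standing in for the fences. All names (fastpath_used, strong_lock_count, try_fastpath_acquire, etc.) are invented. The fast-path locker publishes its flag *before* checking for strong lockers, so a scanner that bumps the counter first and then sees flag==0 can safely skip that backend.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int fastpath_used;      /* per-backend "I may hold fast-path locks" */
static atomic_int strong_lock_count;  /* models one strong_lock_counts[] cell */

/* Returns true if the lock may be taken via the fast path. */
static bool try_fastpath_acquire(void)
{
    /* Set the flag first; seq_cst ordering makes the store visible
     * before the following load, closing the race described above. */
    atomic_store_explicit(&fastpath_used, 1, memory_order_seq_cst);
    if (atomic_load_explicit(&strong_lock_count, memory_order_seq_cst) > 0) {
        /* Strong locker present: retreat to the main lock table.
         * (A real implementation would clear the flag only if no other
         * fast-path locks remain; that bookkeeping is omitted here.) */
        atomic_store_explicit(&fastpath_used, 0, memory_order_seq_cst);
        return false;
    }
    return true;   /* caller now inserts into its fast-path queue */
}

/* Strong-locker side: bump the counter first, then trust flag==0. */
static void strong_locker_arrives(void)
{
    atomic_fetch_add(&strong_lock_count, 1);
}
```

Because each side performs its store before its load, at least one of them is guaranteed to observe the other, which is exactly the property the scanning backend needs.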
On Tue, May 24, 2011 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > There's a potentially-unbounded delay between when the subject backend reads > strong_lock_counts[] and when it sets its fast-path-used flag. (I didn't mean > "not yet seen" in the sense that some memory load would not show the latest > value. I just meant that the subject backend may still be taking relevant > actions based on its previous load of the value.) We could have the subject > set its fast-path-used flag before even checking strong_lock_counts[], then > clear the flag when strong_lock_counts[] dissuaded it from proceeding. Maybe > that's what you had in mind? I'd like to say yes, but actually, no, I just failed to notice the race condition. It's definitely less appealing if we have to do it that way. Another idea would be to only clear the fast-path-used flags lazily. If backend A inspects the fast-path queue for backend B and finds it completely empty, it clears the flag; otherwise it just stays set indefinitely. > That being said, it's a slight extra cost for all fast-path lockers to benefit > the strong lockers, so I'm not prepared to guess whether it will pay off. Yeah. Basically this entire idea is about trying to make life easier for weak lockers at the expense of making it more difficult for strong lockers. I think that's a good trade-off in general, but we might need to wait until we have an actual implementation to judge whether we've turned the dial too far. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 24, 2011 at 6:37 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> That being said, it's a slight extra cost for all fast-path lockers to benefit >> the strong lockers, so I'm not prepared to guess whether it will pay off. > > Yeah. Basically this entire idea is about trying to make life easier > for weak lockers at the expense of making it more difficult for strong > lockers. I think that's a good trade-off in general, but we might > need to wait until we have an actual implementation to judge whether > we've turned the dial too far. I like this overall concept and like the way this has been described with strong and weak locks. It seems very useful to me, since temp tables can be skipped. That leaves shared DDL and we have done lots to reduce the lock levels held and are looking at further reductions also. I think even quite extensive delays are worth the trade-off. I'd been looking at this also, though hadn't mentioned it previously because I found an Oracle patent that discusses dynamically turning on and off locking. So that's something to be aware of. IMHO if we discuss this in terms of sharing/not sharing locking information then it is sufficient to avoid the patent. That patent also discusses the locking state change needs to wait longer than required. I got a bit lost with the description of a potential solution. It seemed like you were unaware that there is a local lock and a shared lock table, maybe just me? Design seemed relatively easy from there: put local lock table in shared memory for all procs. We then have a use_strong_lock at proc and at transaction level. Anybody that wants a strong lock first sets use_strong_lock at proc and transaction level, then copies all local lock data into shared lock table, double checking TransactionIdIsInProgress() each time. Then queues for lock using the now fully set up shared lock table. 
When a transaction with a strong lock completes we do not attempt to reset the transaction-level boolean, only the proc-level one, since DDL often occurs in groups and we want to avoid flip-flopping quickly between lock share states. Cleanup happens regularly via the bgwriter, perhaps every 10 seconds or so. All locks are still visible for pg_locks. Hopefully thats of use. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I got a bit lost with the description of a potential solution. It > seemed like you were unaware that there is a local lock and a shared > lock table, maybe just me? No, I'm not unaware of the local lock table. The point of this proposal is to avoid fighting over the LWLocks that protect the shared hash table by allowing some locks to be taken without touching it. > Design seemed relatively easy from there: put local lock table in > shared memory for all procs. We then have a use_strong_lock at proc > and at transaction level. Anybody that wants a strong lock first sets > use_strong_lock at proc and transaction level, then copies all local > lock data into shared lock table, double checking > TransactionIdIsInProgress() each time. Then queues for lock using the > now fully set up shared lock table. When transaction with strong lock > completes we do not attempt to reset transaction level boolean, only > at proc level, since DDL often occurs in groups and we want to avoid > flip-flopping quickly between lock share states. Cleanup happens by > regularly by bgwriter, perhaps every 10 seconds or so. All locks are > still visible for pg_locks. I'm not following this... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I got a bit lost with the description of a potential solution. It >> seemed like you were unaware that there is a local lock and a shared >> lock table, maybe just me? > > No, I'm not unaware of the local lock table. The point of this > proposal is to avoid fighting over the LWLocks that protect the shared > hash table by allowing some locks to be taken without touching it. Yes, I got that. I just couldn't work out where mmap came in. >> Design seemed relatively easy from there: put local lock table in >> shared memory for all procs. We then have a use_strong_lock at proc >> and at transaction level. Anybody that wants a strong lock first sets >> use_strong_lock at proc and transaction level, then copies all local >> lock data into shared lock table, double checking >> TransactionIdIsInProgress() each time. Then queues for lock using the >> now fully set up shared lock table. When transaction with strong lock >> completes we do not attempt to reset transaction level boolean, only >> at proc level, since DDL often occurs in groups and we want to avoid >> flip-flopping quickly between lock share states. Cleanup happens by >> regularly by bgwriter, perhaps every 10 seconds or so. All locks are >> still visible for pg_locks. > > I'm not following this... Which bit aren't you following? It's a design outline for how to implement, deliberately brief to allow a discussion of design alternatives. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Design seemed relatively easy from there: put local lock table in >>> shared memory for all procs. We then have a use_strong_lock at proc >>> and at transaction level. Anybody that wants a strong lock first sets >>> use_strong_lock at proc and transaction level, then copies all local >>> lock data into shared lock table, >> I'm not following this... > Which bit aren't you following? It's a design outline for how to > implement, deliberately brief to allow a discussion of design > alternatives. What I'm not following is how moving the local lock table into shared memory can possibly be a good idea. The reason we invented the local lock table in the first place (we didn't use to have one) is so that a process could do some manipulations without touching shared memory. (Notably, it is currently nearly free, and certainly lock-free, to re-request a lock type you already hold. This is not an infrequent case.) That advantage will go away if you do this. regards, tom lane
On Wed, May 25, 2011 at 8:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> I got a bit lost with the description of a potential solution. It >>> seemed like you were unaware that there is a local lock and a shared >>> lock table, maybe just me? >> >> No, I'm not unaware of the local lock table. The point of this >> proposal is to avoid fighting over the LWLocks that protect the shared >> hash table by allowing some locks to be taken without touching it. > > Yes, I got that. I just couldn't work out where mmap came in. I don't think we were planning to use mmap anywhere. >>> Design seemed relatively easy from there: put local lock table in >>> shared memory for all procs. We then have a use_strong_lock at proc >>> and at transaction level. Anybody that wants a strong lock first sets >>> use_strong_lock at proc and transaction level, then copies all local >>> lock data into shared lock table, double checking >>> TransactionIdIsInProgress() each time. Then queues for lock using the >>> now fully set up shared lock table. When transaction with strong lock >>> completes we do not attempt to reset transaction level boolean, only >>> at proc level, since DDL often occurs in groups and we want to avoid >>> flip-flopping quickly between lock share states. Cleanup happens by >>> regularly by bgwriter, perhaps every 10 seconds or so. All locks are >>> still visible for pg_locks. >> >> I'm not following this... > > Which bit aren't you following? It's a design outline for how to > implement, deliberately brief to allow a discussion of design > alternatives. Well, OK: 1. I don't think putting the local lock table in shared memory is a good idea both for performance (keeping it uncontended has value) and memory management (it would increase shared memory requirements quite a bit). 
The design Noah and I were discussing upthread uses only a small and fixed amount of shared memory for each backend, and preserves the local lock table as an unshared resource. 2. You haven't explained the procedure for acquire a weak lock at all, and in particular how a weak locker would be able to quickly determine whether a conflicting strong lock was potentially present. 3. I don't understand the proposed procedure for acquiring a strong lock either; in particular, I don't see why TransactionIdIsInProgress() would be relevant. The lock manager doesn't really do anything with transaction IDs now, and you haven't offered any explanation of why that would be necessary or advisable. 4. You haven't explained what the transaction-level boolean would actually do. It's not even clear whether you intend for that to be kept in local or shared memory. It's also unclear what you intend for bgwriter to clean up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs wrote: > On Tue, May 24, 2011 at 6:37 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > >> That being said, it's a slight extra cost for all fast-path lockers to benefit > >> the strong lockers, so I'm not prepared to guess whether it will pay off. > > > > Yeah. Basically this entire idea is about trying to make life easier > > for weak lockers at the expense of making it more difficult for strong > > lockers. I think that's a good trade-off in general, but we might > > need to wait until we have an actual implementation to judge whether > > we've turned the dial too far. > > I like this overall concept and like the way this has been described > with strong and weak locks. It seems very useful to me, since temp > tables can be skipped. That leaves shared DDL and we have done lots to > reduce the lock levels held and are looking at further reductions > also. I think even quite extensive delays are worth the trade-off. > > I'd been looking at this also, though hadn't mentioned it previously > because I found an Oracle patent that discusses dynamically turning on > and off locking. So that's something to be aware of. IMHO if we > discuss this in terms of sharing/not sharing locking information then > it is sufficient to avoid the patent. That patent also discusses the > locking state change needs to wait longer than required. FYI, I thought we had agreed not to look at any patents because looking at them might cause us more problems than not looking at them. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah@leadboat.com> wrote: > On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote: >> On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah@leadboat.com> wrote: >> > This drops the part about only transferring fast-path entries once when a >> > strong_lock_counts cell transitions from zero to one. >> >> Right: that's because I don't think that's what we want to do. I >> don't think we want to transfer all per-backend locks to the shared >> hash table as soon as anyone attempts to acquire a strong lock; >> instead, I think we want to transfer only those fast-path locks which >> have the same locktag as the strong lock someone is attempting to >> acquire. If we do that, then it doesn't matter whether the >> strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7: >> we still have to check for strong locks with that particular locktag. > > Oh, I see. I was envisioning that you'd transfer all locks associated with > the strong_lock_counts cell; that is, all the locks that would now go directly > to the global lock table when requested going forward. Transferring only > exact matches seems fine too, and then I agree with your other conclusions. I took a crack at implementing this and ran into difficulties. Actually, I haven't gotten to the point of actually testing whether it works, but I'm worried about a possible problem with the algorithm. When a strong lock is taken or released, we have to increment or decrement strong_lock_counts[fasthashpartition]. Here's the question: is that atomic? In other words, suppose that strong_lock_counts[42] starts out at 0, and two backends both do ++strong_lock_counts[42]. Are we guaranteed to end up with "2" in that memory location or might we unluckily end up with "1"? I think the latter is possible... and some guard is needed to make sure that doesn't happen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > When a strong lock is taken or released, we have to increment or > decrement strong_lock_counts[fasthashpartition]. Here's the question: > is that atomic? In other words, suppose that strong_lock_counts[42] > starts out at 0, and two backends both do ++strong_lock_counts[42]. > Are we guaranteed to end up with "2" in that memory location or might > we unluckily end up with "1"? I think the latter is possible... and > some guard is needed to make sure that doesn't happen. There are "atomic increment" primitives on most/all multiprocessors, although availing ourselves of them everywhere will take an amount of work not unlike developing the spinlock primitives :-(. You are dead right that this is unsafe without that. regards, tom lane
On Fri, May 27, 2011 at 04:55:07PM -0400, Robert Haas wrote: > When a strong lock is taken or released, we have to increment or > decrement strong_lock_counts[fasthashpartition]. Here's the question: > is that atomic? In other words, suppose that strong_lock_counts[42] > starts out at 0, and two backends both do ++strong_lock_counts[42]. > Are we guaranteed to end up with "2" in that memory location or might > we unluckily end up with "1"? I think the latter is possible... and > some guard is needed to make sure that doesn't happen. Yeah: what Tom said. Guard it with a spinlock? Given that the backend is about to (or did earlier) go off and acquire dozens or hundreds of LWLocks, it doesn't seem like an area begging for early optimization.
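[For illustration only: the lost-update hazard and its fix, using C11 `atomic_fetch_add` as a stand-in for the platform atomic-increment primitives Tom mentions (a spinlock around a plain `++` would serve equally well). The thread counts and variable name are invented.]

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define INCS_PER_THREAD 100000

static atomic_int strong_lock_cell;   /* models strong_lock_counts[42] */

static void *bump(void *arg)
{
    (void) arg;
    for (int i = 0; i < INCS_PER_THREAD; i++)
        atomic_fetch_add(&strong_lock_cell, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Runs NTHREADS concurrent incrementers and returns the final count.
 * With a plain non-atomic ++ this could come up short, because two
 * backends may read the same old value and both write old+1. */
static int run_increment_race(void)
{
    pthread_t tids[NTHREADS];

    atomic_store(&strong_lock_cell, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, bump, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&strong_lock_cell);
}
```

With the atomic primitive, the final value is always exactly NTHREADS * INCS_PER_THREAD; the "two increments collapse into one" outcome cannot occur.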
On Wed, May 25, 2011 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> Design seemed relatively easy from there: put local lock table in >>>> shared memory for all procs. We then have a use_strong_lock at proc >>>> and at transaction level. Anybody that wants a strong lock first sets >>>> use_strong_lock at proc and transaction level, then copies all local >>>> lock data into shared lock table, > >>> I'm not following this... > >> Which bit aren't you following? It's a design outline for how to >> implement, deliberately brief to allow a discussion of design >> alternatives. > > What I'm not following is how moving the local lock table into shared > memory can possibly be a good idea. The reason we invented the local > lock table in the first place (we didn't use to have one) is so that a > process could do some manipulations without touching shared memory. > (Notably, it is currently nearly free, and certainly lock-free, to > re-request a lock type you already hold. This is not an infrequent > case.) That advantage will go away if you do this. The basis for this is that weak locks do not conflict with each other, whereas strong locks conflict with both strong and weak locks. (There's a couple of special cases which I ignore for now). (Using Robert's description of strong/weak locks) Since most actions in normal running only require weak locks then we see that 99% of the time we don't need to share lock information at all. So the idea is that we have 2 modes of operation: mode (1) when nobody is requesting a strong lock we don't share lock information. We switch into mode (2) when somebody requests a strong lock and in this mode we must share all lock information just as we do now. 
The basic analysis is that we have a way of removing 99% of the overhead of lock information sharing. We still have the local lock table and we still perform locking, we just don't share that with other backends unless we need to. So there is a slight reduction in path length and a total avoidance of contention. Ideally, we would want to be in mode 2 for a short period of time. The difficulty is how to move from mode 1 (non-shared locking) to mode 2 (shared locking) and back again? A strong lock request causes the mode flip automatically via one of these mechanisms: 1. signal to each backend causes them to update shared lock information (at that point non-conflicting) 2. local lock table in shared memory 3. files 4. other The requirement is that the mode be flipped in all backends before we process the request for a strong lock. The idea is to make the local lock table accessible for occasional use in mode switching. Reading the local lock table by its owning backend would always be lock free. Locks are only required when modifying the local lock table by the owning backend, or when another backend reads it. So making the local lock table accessible is not a problem. Flipping back from mode 2 to mode 1 should be fairly slow, since DDL usually occurs in groups. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, May 31, 2011 at 9:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The basis for this is that weak locks do not conflict with each other, > whereas strong locks conflict with both strong and weak locks. > (There's a couple of special cases which I ignore for now). > (Using Robert's description of strong/weak locks) > > Since most actions in normal running only require weak locks then we > see that 99% of the time we don't need to share lock information at > all. > > So the idea is that we have 2 modes of operation: mode (1) when nobody > is requesting a strong lock we don't share lock information. We switch > into mode (2) when somebody requests a strong lock and in this mode we > must share all lock information just as we do now. > > The basic analysis is that we have a way of removing 99% of the > overhead of lock information sharing. We still have the local lock > table and we still perform locking, we just don't share that with > other backends unless we need to. So there is a slight reduction in > path length and a total avoidance of contention. > > Ideally, we would want to be in mode 2 for a short period of time. > > The difficulty is how to move from mode 1 (non-shared locking) to mode > 2 (shared locking) and back again? > > A strong lock request causes the mode flip automatically via one of > these mechanisms: > 1. signal to each backend causes them to update shared lock > information (at that point non-conflicting) > 2. local lock table in shared memory > 3. files > 4. other > The requirement is that the mode be flipped in all backends before we > process the request for a strong lock. > > The idea is to make the local lock table accessible for occasional use > in mode switching. Reading the local lock table by its owning backend > would always be lock free. Locks are only required when modifying the > local lock table by the owning backend, or when another backend reads > it. So making the local lock table accessible is not a problem. 
You can't actually make the local lock table lock-free to the owning backend, if other backends are going to be modifying it, or even examining it. However, as discussed upthread, what does seem possible is to allow each backend to maintain a queue of "weak" locks that are protected by an LWLock which is normally taken only by the owning backend, except on those rare occasions when a "strong" lock enters the picture. This doesn't completely eliminate LWLocks from the picture, but preliminary tests with my hacked-up, work-in-progress patch shows that it results in a very large decrease in LWLock *contention*. I'm going to post the patch once I get it debugged and tested a bit more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company