Обсуждение: Lightweight locking primitive
Hi, It seems that the Linux kernel will very shortly acquire a lightweight userlevel locking primitive (called futexes), thanks primarily to Rusty and Hubertus. It looks to be very useful for the sort of locking that databases of various types need to do. They have a bunch of rather nice properties: a) low overhead in the no-contention case - a single locked instruction on i386 b) no kernel overhead for non-contended locks - make as many as you like, the kernel memory cost is only O(number of lockswith waiters) c) are interruptible / restartable across signals d) the name :-) They don't do: a) deadlock detection b) cleanup on process exit -- the kernel doesn't know who had the lock, so it can't help here A reader/writer version is available, though it's currently implemented with two futexes. Spin-for-a-while-before-sleeping versions are planned. The API looks like this: /* these can be stored anywhere -- mmapped file, * sysv shm, shared anonymous memory */struct futex lock; /* this does mprotect(.., ...|PROT_SEM, ...); * seemingly some architectures need to do odd * things to get atomic/coherentmemory */if(futex_region(lock, sizeof(lock))) fail("futexes not available on this kernel"); futex_init(&lock); ... /* grab the lock -- we also have a futex_trydown */if(futex_down(&lock)) fail("couldn't get lock"); /* critical section */ /* release lock */futex_up(&lock); We're looking for interesting applications to try this stuff out on. Are there: a) parts of postgresql which would like this, or b) changes to the interface (or feature set) which would make it more suited ? The LWLocks look not unlike this, but my impression is that they are low-contention, so any improvement would be small. Any pointers into appropriate bits of source would be greatly appreciated. Thanks, Matthew.
Matthew Kirkwood wrote: > > Hi, > > It seems that the Linux kernel will very shortly acquire a lightweight > userlevel locking primitive (called futexes), thanks primarily to Rusty > and Hubertus. It looks to be very useful for the sort of locking that > databases of various types need to do. > > They have a bunch of rather nice properties: I am curious how 'futexes' are different/better than POSIX (pthread style) mutexes? > > a) low overhead in the no-contention case - a single locked > instruction on i386 should be same for pthread_mutex_lock() > b) no kernel overhead for non-contended locks - make as > many as you like, the kernel memory cost is only > O(number of locks with waiters) Well it can't have kernel overhead for non-contended locks if a non-contended lock is one opcode, can it? > c) are interruptible / restartable across signals Not sure what 'restartable' means? Do you mean locking primitives would restarted by kernel when interrupted by signals? Like kernel calls with SA_RESTART set? How that would be possible if kernel does not even know about non-contended locks? > d) the name :-) > > They don't do: > > a) deadlock detection > b) cleanup on process exit -- the kernel doesn't know who > had the lock, so it can't help here > > A reader/writer version is available, though it's currently implemented > with two futexes. Spin-for-a-while-before-sleeping versions are planned. > RW locks are defined by POSIX too and can be implemented by mutex + condvar. I wonder what is wrong with those... At the same time Linux has POSIX semaphores which can not be shared across processes, making them quite useless. Fixing that could help postgres quite a bit more I think... -- igor
Igor Kovalenko <Igor.Kovalenko@motorola.com> writes: > Matthew Kirkwood wrote: > > > > Hi, > > > > It seems that the Linux kernel will very shortly acquire a lightweight > > userlevel locking primitive (called futexes), thanks primarily to Rusty > > and Hubertus. It looks to be very useful for the sort of locking that > > databases of various types need to do. > > > > They have a bunch of rather nice properties: > > I am curious how 'futexes' are different/better than POSIX (pthread > style) mutexes? They're basically the same thing. Currently, pthread_mutexes on Linux (implemented in glibc) are fairly gross in the contended case, since there is no clean way to wait for lock release, and they interact fairly nastily with signal semantics. The futex patches create a new system call which lets you cleanly wait for a locked futex (an unlocking task checks for waiting lockers and calls into the kernel for wakeups if it finds any). There's no reason that POSIX mutextes and semaphores couldn't be implemented on top of futexes, usable both in threaded and multiprocess shared-memory environments. > Not sure what 'restartable' means? Do you mean locking primitives would > restarted by kernel when interrupted by signals? Like kernel calls with > SA_RESTART set? How that would be possible if kernel does not even know > about non-contended locks? I interpret the above as meaning: contended case (blocked in futex_wait syscall or whatever it's called) can be cleanly interrupted and by a signal and restarted automatically. > RW locks are defined by POSIX too and can be implemented by mutex + > condvar. I wonder what is wrong with those... There's no conflict between POSIX locks and futexes; the latter are just a good, new way to implement the former. > At the same time Linux has > POSIX semaphores which can not be shared across processes, making them > quite useless. Fixing that could help postgres quite a bit more I > think... Yes, having mutexes and semaphores shareable by different processes is one of the benefits of the new locks as I understand them. -Doug -- Doug McNaught Wireboard Industries http://www.wireboard.com/ Custom software development, systems and network consulting. Java PostgreSQL Enhydra Python Zope Perl Apache LinuxBSD...
You should take a look at pthread_mutex_setpshared(). May be they don't in Linux, but that's just consequence of braindead implementation. -- igor Hubertus Franke wrote: > > Biggest difference, FUTEX work across address spaces, pthread_mutexes don't > ! > > Hubertus Franke > Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability) > , OS-PIC (Chair) > email: frankeh@us.ibm.com > (w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003 > > Igor Kovalenko <Igor.Kovalenko@motorola.com> on 03/12/2002 04:48:23 PM > > To: > cc: pgsql-hackers@postgresql.org, Hubertus Franke/Watson/IBM@IBMUS, > rusty@rustcorp.com.au > Subject: Re: [HACKERS] Lightweight locking primitive > > Matthew Kirkwood wrote: > > > > Hi, > > > > It seems that the Linux kernel will very shortly acquire a lightweight > > userlevel locking primitive (called futexes), thanks primarily to Rusty > > and Hubertus. It looks to be very useful for the sort of locking that > > databases of various types need to do. > > > > They have a bunch of rather nice properties: > > I am curious how 'futexes' are different/better than POSIX (pthread > style) mutexes? > > > > > a) low overhead in the no-contention case - a single locked > > instruction on i386 > > should be same for pthread_mutex_lock() > > > b) no kernel overhead for non-contended locks - make as > > many as you like, the kernel memory cost is only > > O(number of locks with waiters) > > Well it can't have kernel overhead for non-contended locks if a > non-contended lock is one opcode, can it? > > > c) are interruptible / restartable across signals > > Not sure what 'restartable' means? Do you mean locking primitives would > restarted by kernel when interrupted by signals? Like kernel calls with > SA_RESTART set? How that would be possible if kernel does not even know > about non-contended locks? > > > d) the name :-) > > > > They don't do: > > > > a) deadlock detection > > b) cleanup on process exit -- the kernel doesn't know who > > had the lock, so it can't help here > > > > A reader/writer version is available, though it's currently implemented > > with two futexes. Spin-for-a-while-before-sleeping versions are planned. > > > > RW locks are defined by POSIX too and can be implemented by mutex + > condvar. I wonder what is wrong with those... At the same time Linux has > POSIX semaphores which can not be shared across processes, making them > quite useless. Fixing that could help postgres quite a bit more I > think... > > -- igor
Doug McNaught wrote: > They're basically the same thing. Currently, pthread_mutexes on Linux > (implemented in glibc) are fairly gross in the contended case, since > there is no clean way to wait for lock release, and they interact > fairly nastily with signal semantics. The futex patches create a new > system call which lets you cleanly wait for a locked futex (an > unlocking task checks for waiting lockers and calls into the kernel > for wakeups if it finds any). Strange that it doesn't wait for the lock. BSD/OS has: The pthread_mutex_lock() function locks the mutex pointed to by mutex. If mutex is already locked, the calling threadwill block until the mutex becomes available. Upon success the pthread_mutex_lock() function re- turns withthe mutex locked and the calling thread as its owner. In fact, they have a pthread_mutex_trylock() version that doesn't block: The pthread_mutex_trylock() function performs a non-blocking mutex lock operation. It behaves exactly like pthread_mutex_lock()except that if any thread (including the calling thread) currently owns the mutex, an immediateerror return is performed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Tue, 12 Mar 2002, Bruce Momjian wrote: > > They're basically the same thing. Currently, pthread_mutexes on Linux > > (implemented in glibc) are fairly gross in the contended case, since > > there is no clean way to wait for lock release, > Strange that it doesn't wait for the lock. [..] It does wait, in that the call will not return before or unless the thread has acquired the lock. However, it waits in an ugly way, via spin-and-yield or some evil signal or pipe hackery via a manager thread. pthread_mutexes are fairly ugly, but they should still be lightweight. Until now, there was no way to do that under Linux. (I don't know how the other free Unixes do it, but I suspect it is not much better.) Matthew.
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Doug McNaught wrote: > > They're basically the same thing. Currently, pthread_mutexes on Linux > > (implemented in glibc) are fairly gross in the contended case, since > > there is no clean way to wait for lock release, and they interact > > fairly nastily with signal semantics. The futex patches create a new > > system call which lets you cleanly wait for a locked futex (an > > unlocking task checks for waiting lockers and calls into the kernel > > for wakeups if it finds any). > > Strange that it doesn't wait for the lock. BSD/OS has: It does wait. If the lock is already locked (atomic test in userspace) the process makes a futex_wait system call, which puts the process on a kernel waitqueue. From what I can see, the new Linux locks are not a replacement for POSIX locks/semaphores, they're simply a fairly clean way of implementing them. They also just went into the development kernel, so any appearance in production systems may take a few months at least. -Doug -- Doug McNaught Wireboard Industries http://www.wireboard.com/ Custom software development, systems and network consulting. Java PostgreSQL Enhydra Python Zope Perl Apache LinuxBSD...
Matthew Kirkwood wrote: > > On Tue, 12 Mar 2002, Bruce Momjian wrote: > > > > They're basically the same thing. Currently, pthread_mutexes on Linux > > > (implemented in glibc) are fairly gross in the contended case, since > > > there is no clean way to wait for lock release, > > > Strange that it doesn't wait for the lock. > [..] > > It does wait, in that the call will not return before or unless > the thread has acquired the lock. However, it waits in an ugly > way, via spin-and-yield or some evil signal or pipe hackery via > a manager thread. > > pthread_mutexes are fairly ugly, but they should still be > lightweight. Until now, there was no way to do that under > Linux. (I don't know how the other free Unixes do it, but I > suspect it is not much better.) If all free Unixes do it in such an ugly way then well, what you get is what you paid for ;) I still would be surprized if all implementations were as bad as Linux one is. Pthread mutexes are very lightweight and fast on Solaris and QNX (I mostly work with those). They can be shared across processes on both. Implementation-wise, QNX has corresponding blocking state so when some thread locks a mutex other contenders get blocked by kernel. They are (one of them) unblocked by kernel when mutex is released. Speaking about ugliness, the only issue I see with pthread mutexes is that they can get orphaned. There is no portable way to deal with that, but again both Solaris and QNX have extended API which allows some thread to aqcuire ownership of an orphaned mutex. I guess that eventually will make its way into POSIX. -- igor
Biggest difference, FUTEX work across address spaces, pthread_mutexes don't ! Hubertus Franke Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability) , OS-PIC (Chair) email: frankeh@us.ibm.com (w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003 Igor Kovalenko <Igor.Kovalenko@motorola.com> on 03/12/2002 04:48:23 PM To: cc: pgsql-hackers@postgresql.org, Hubertus Franke/Watson/IBM@IBMUS, rusty@rustcorp.com.au Subject: Re: [HACKERS] Lightweight locking primitive Matthew Kirkwood wrote: > > Hi, > > It seems that the Linux kernel will very shortly acquire a lightweight > userlevel locking primitive (called futexes), thanks primarily to Rusty > and Hubertus. It looks to be very useful for the sort of locking that > databases of various types need to do. > > They have a bunch of rather nice properties: I am curious how 'futexes' are different/better than POSIX (pthread style) mutexes? > > a) low overhead in the no-contention case - a single locked > instruction on i386 should be same for pthread_mutex_lock() > b) no kernel overhead for non-contended locks - make as > many as you like, the kernel memory cost is only > O(number of locks with waiters) Well it can't have kernel overhead for non-contended locks if a non-contended lock is one opcode, can it? > c) are interruptible / restartable across signals Not sure what 'restartable' means? Do you mean locking primitives would restarted by kernel when interrupted by signals? Like kernel calls with SA_RESTART set? How that would be possible if kernel does not even know about non-contended locks? > d) the name :-) > > They don't do: > > a) deadlock detection > b) cleanup on process exit -- the kernel doesn't know who > had the lock, so it can't help here > > A reader/writer version is available, though it's currently implemented > with two futexes. Spin-for-a-while-before-sleeping versions are planned. > RW locks are defined by POSIX too and can be implemented by mutex + condvar. I wonder what is wrong with those... At the same time Linux has POSIX semaphores which can not be shared across processes, making them quite useless. Fixing that could help postgres quite a bit more I think... -- igor