Discussion: LISTEN/NOTIFY benchmarks?
Hi,

I'm looking for information on the scalability of the LISTEN/NOTIFY mechanism. How well does it scale with respect to:

- hundreds of clients registered for LISTENs

  I guess this translates to hundreds of the corresponding backend processes receiving SIGUSR2 signals. The efficiency of this is probably OS-dependent. Would anyone be in a position to give me signal delivery benchmarks for FreeBSD or another Unix?

- each client registered for thousands of LISTENs

  From a look at backend/commands/async.c, it would seem that each listening backend would get a signal for *every* LISTEN it registered for, resulting in thousands of signals to the same listening backend, instead of only one. Would it help if this was optimized so that a signal was sent only once? Again, info on relevant signal delivery benchmarks would be useful.

I'm not an expert on signals, not even a novice, so I might be totally off base, but it seems like the Async Notification implementation does not scale. If it does not, does anyone have a solution for the problem of signaling each event in a possibly very large set of events to a large number of clients?

Thanks,
--prashanth
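A minimal sketch of the mechanism being asked about, assuming the 7.x behavior where the notifying backend walks the listener registrations and signals each matching backend with kill(). The names here are illustrative; this is not the actual async.c code:

    #include <signal.h>
    #include <string.h>
    #include <sys/types.h>

    typedef struct ListenerRec
    {
        pid_t listener_pid;   /* backend that executed LISTEN */
        char  relname[64];    /* condition name listened for */
    } ListenerRec;

    /*
     * One kill() per matching registration: with N listeners on a
     * condition, a single NOTIFY costs N system calls, and a backend
     * listening on many notified conditions is signaled many times.
     */
    static void
    notify_listeners(const char *condname,
                     const ListenerRec *listeners, int nlisteners)
    {
        int i;

        for (i = 0; i < nlisteners; i++)
        {
            if (strcmp(listeners[i].relname, condname) == 0)
                kill(listeners[i].listener_pid, SIGUSR2);
        }
    }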
prashanth@jibenetworks.com writes:
> I'm not an expert on signals, not even a novice, so I might be totally
> off base, but it seems like the Async Notification implementation does
> not scale.

Very possibly. You didn't even mention the problems that would occur if the pg_listener table didn't get vacuumed often enough.

The pghackers archives contain some discussion about reimplementing listen/notify using a non-table-based infrastructure. But AFAIK no one has picked up that task yet.

			regards, tom lane
prashanth@jibenetworks.com wrote on Tue, 29.04.2003 at 04:14:
> Hi,
>
> I'm looking for information on the scalability of the LISTEN/NOTIFY
> mechanism. How well does it scale with respect to:
>
> - hundreds of clients registered for LISTENs
>
>   I guess this translates to hundreds of the corresponding backend
>   processes receiving SIGUSR2 signals. The efficiency of this is
>   probably OS-dependent. Would anyone be in a position to give me
>   signal delivery benchmarks for FreeBSD or another Unix?
>
> - each client registered for thousands of LISTENs
>
>   From a look at backend/commands/async.c, it would seem that each
>   listening backend would get a signal for *every* LISTEN it
>   registered for, resulting in thousands of signals to the same
>   listening backend, instead of only one.

But as the signals are usually generated asynchronously, you have no way to know if a particular backend has already received a signal.

Or do you mean some mechanism that remembers "signals sent" in some shared structure that the receiving backend can then clear when it actually receives the signal? That could mean lock contention on that shared structure, unless we decide that it is cheaper to just consult it without locking it and accept an occasional delivery of unneeded signals.

> Would it help if this was optimized so that a signal was sent only
> once? Again, info on relevant signal delivery benchmarks would be
> useful.

I still suspect that removing the pg_listener table from the mechanism would give gains faster. Of course we could rework the signal mechanism as well while doing it.

> I'm not an expert on signals, not even a novice, so I might be totally
> off base, but it seems like the Async Notification implementation does
> not scale. If it does not, does anyone have a solution for the
> problem of signaling each event in a possibly very large set of
> events to a large number of clients?

-----------------
Hannu
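A sketch of the scheme described above, assuming a per-backend "signal pending" flag in shared memory that the notifier consults without taking a lock. C11 atomics stand in for whatever a 2003-era implementation would actually use, and all names are made up:

    #include <signal.h>
    #include <stdatomic.h>
    #include <sys/types.h>

    typedef struct BackendSlot
    {
        pid_t       pid;          /* listening backend */
        atomic_flag sig_pending;  /* set => a SIGUSR2 is already on its way */
    } BackendSlot;

    /* Notifier side: skip the kill() if a signal is already outstanding. */
    static void
    maybe_signal(BackendSlot *slot)
    {
        if (!atomic_flag_test_and_set(&slot->sig_pending))
            kill(slot->pid, SIGUSR2);
    }

    /*
     * Listener side, on handling the signal: clear the flag so later
     * notifies will signal us again.  A stale read by a notifier at
     * worst delivers one unneeded signal, which is harmless.
     */
    static void
    signal_handled(BackendSlot *slot)
    {
        atomic_flag_clear(&slot->sig_pending);
    }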
On Mon, Apr 28, 2003 at 10:19:16PM -0400, Tom Lane wrote:
> prashanth@jibenetworks.com writes:
> > I'm not an expert on signals, not even a novice, so I might be totally
> > off base, but it seems like the Async Notification implementation does
> > not scale.
>
> Very possibly. You didn't even mention the problems that would occur if
> the pg_listener table didn't get vacuumed often enough.
>
> The pghackers archives contain some discussion about reimplementing
> listen/notify using a non-table-based infrastructure. But AFAIK no one
> has picked up that task yet.

I found some messages in 03/2002 that also brought up the performance issue. You had suggested the use of shared memory, and made reference to an "SI model". I did not see any alternative non-table-based suggestions. What is the "SI model"?

Thanks,
--prashanth
On Tue, Apr 29, 2003 at 10:10:47AM +0300, Hannu Krosing wrote:
> prashanth@jibenetworks.com wrote on Tue, 29.04.2003 at 04:14:
> > - each client registered for thousands of LISTENs
> >
> >   From a look at backend/commands/async.c, it would seem that each
> >   listening backend would get a signal for *every* LISTEN it
> >   registered for, resulting in thousands of signals to the same
> >   listening backend, instead of only one.
>
> But as the signals are usually generated asynchronously, you have no
> way to know if a particular backend has already received a signal.
>
> Or do you mean some mechanism that remembers "signals sent" in some
> shared structure that the receiving backend can then clear when it
> actually receives the signal?

No, I meant that a listening backend process would be sent multiple signals from a notifying process, *in the inner loop* of backend/commands/async.c:AtCommit_Notify(). If the listening backend had registered tens of thousands of LISTENs, it would be sent an equivalent number of signals during a single run of AtCommit_Notify(). I'm not sure what the cost of this is, since I'm not sure how signal delivery works, but tens of thousands of system calls cannot be very cheap.

--prashanth
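One reading of the "signal only once" idea: the commit-time notify loop could track which PIDs it has already signaled in the current transaction and skip repeats, so each listener gets at most one SIGUSR2 per notifying commit. A sketch under that assumption (the fixed-size array and all names are illustrative, not backend code):

    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>

    #define MAX_SIGNALED 1024

    /* Per-transaction state; reset when the commit finishes. */
    static pid_t signaled[MAX_SIGNALED];
    static int   nsignaled = 0;

    static bool
    already_signaled(pid_t pid)
    {
        int i;

        for (i = 0; i < nsignaled; i++)
            if (signaled[i] == pid)
                return true;
        return false;
    }

    /* Used in place of a bare kill() inside the notify loop. */
    static void
    signal_once(pid_t pid)
    {
        if (already_signaled(pid))
            return;
        if (nsignaled < MAX_SIGNALED)
            signaled[nsignaled++] = pid;
        kill(pid, SIGUSR2);
    }

    /* Call at end of transaction. */
    static void
    reset_signaled(void)
    {
        nsignaled = 0;
    }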
prashanth@jibenetworks.com writes:
> I found some messages in 03/2002 that also brought up the performance
> issue. You had suggested the use of shared memory, and made reference
> to an "SI model". I did not see any alternative non-table-based
> suggestions. What is the "SI model"?

I meant following the example of the existing shared-cache-invalidation signaling mechanism --- see

	src/backend/storage/ipc/sinvaladt.c
	src/backend/storage/ipc/sinval.c
	src/include/storage/sinvaladt.h
	src/include/storage/sinval.h

			regards, tom lane
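In outline, the SI mechanism those files implement is a fixed-size circular message buffer in shared memory with one global write position and a per-backend read position; a backend that falls a full buffer behind must be reset and resync. A rough sketch of that shape (locking and the overrun/reset path are omitted; this mirrors only the structure of sinvaladt.c, not its code):

    #define SI_QUEUE_SIZE 1024
    #define MAX_BACKENDS  256

    typedef struct SiMessage
    {
        int kind;   /* for notify: which condition fired */
    } SiMessage;

    typedef struct SiQueue
    {
        SiMessage buf[SI_QUEUE_SIZE];
        long      max_pos;                /* next write position */
        long      read_pos[MAX_BACKENDS]; /* per-backend read position */
    } SiQueue;

    /* Writer: append a message; caller is assumed to hold the lock. */
    static void
    si_put(SiQueue *q, SiMessage msg)
    {
        /* Real code must detect a reader a full buffer behind
         * ("overrun") and force it to resync; omitted here. */
        q->buf[q->max_pos % SI_QUEUE_SIZE] = msg;
        q->max_pos++;
    }

    /* Reader: fetch the next unread message, if any.  Returns 1 if a
     * message was read, 0 if we are caught up. */
    static int
    si_get(SiQueue *q, int backend_id, SiMessage *out)
    {
        if (q->read_pos[backend_id] == q->max_pos)
            return 0;
        *out = q->buf[q->read_pos[backend_id] % SI_QUEUE_SIZE];
        q->read_pos[backend_id]++;
        return 1;
    }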
prashanth@jibenetworks.com writes:
> If the listening backend had registered tens of thousands of LISTENs,
> it would be sent an equivalent number of signals during a single run
> of AtCommit_Notify().

Not unless the notifier had notified all tens of thousands of condition names in a single transaction.

			regards, tom lane
On Tue, Apr 29, 2003 at 06:21:15PM -0400, Tom Lane wrote:
> prashanth@jibenetworks.com writes:
> > If the listening backend had registered tens of thousands of LISTENs,
> > it would be sent an equivalent number of signals during a single run
> > of AtCommit_Notify().
>
> Not unless the notifier had notified all tens of thousands of condition
> names in a single transaction.

Unfortunately, that is a possibility in our application. We are now working around this non-scalability. Regardless, it would seem redundant to send more than one SIGUSR2 to the recipient backend in that loop.

--prashanth
> I'm not an expert on signals, not even a novice, so I might be
> totally off base, but it seems like the Async Notification
> implementation does not scale. If it does not, does anyone have a
> solution for the problem of signaling each event in a possibly
> very large set of events to a large number of clients?

<brainfart_for_the_archives> Hrm.... I should see about porting kqueue/kevent as a messaging bus for the listen/notify bits to postgresql... that does scale, and it scales well to tens of thousands of connections a second (easily over 60K, likely closer to 1M is the limit).... </brainfart_for_the_archives>

--
Sean Chittenden
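For the archives, a toy of the kqueue shape being suggested. It uses EVFILT_USER, which postdates this thread, and it stays within a single process, so it only illustrates the register/trigger/wait cycle, not an actual cross-backend bus:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    int
    main(void)
    {
        int           kq = kqueue();
        struct kevent ev;
        struct kevent got;

        /* Listener side: register interest in user event #1. */
        EV_SET(&ev, 1, EVFILT_USER, EV_ADD | EV_CLEAR, 0, 0, NULL);
        kevent(kq, &ev, 1, NULL, 0, NULL);

        /* Notifier side: fire the event. */
        EV_SET(&ev, 1, EVFILT_USER, 0, NOTE_TRIGGER, 0, NULL);
        kevent(kq, &ev, 1, NULL, 0, NULL);

        /* Listener side: block until the event is delivered. */
        return (kevent(kq, NULL, 0, &got, 1, NULL) == 1) ? 0 : 1;
    }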
On Tue, 29 Apr 2003, Sean Chittenden wrote:
> > I'm not an expert on signals, not even a novice, so I might be
> > totally off base, but it seems like the Async Notification
> > implementation does not scale. If it does not, does anyone have a
> > solution for the problem of signaling each event in a possibly
> > very large set of events to a large number of clients?
>
> <brainfart_for_the_archives> Hrm.... I should see about porting
> kqueue/kevent as a messaging bus for the listen/notify bits to
> postgresql... that does scale, and it scales well to tens of thousands
> of connections a second (easily over 60K, likely closer to 1M is the
> limit).... </brainfart_for_the_archives>

Except that it is FreeBSD-specific -- being system calls and all -- if I remember correctly. If you're going to move to a system like that, which is a good idea, best move to a portable system.

Thanks,

Gavin
> > > I'm not an expert on signals, not even a novice, so I might be
> > > totally off base, but it seems like the Async Notification
> > > implementation does not scale. If it does not, does anyone have
> > > a solution for the problem of signaling each event in a
> > > possibly very large set of events to a large number of clients?
> >
> > <brainfart_for_the_archives> Hrm.... I should see about porting
> > kqueue/kevent as a messaging bus for the listen/notify bits to
> > postgresql... that does scale, and it scales well to tens of
> > thousands of connections a second (easily over 60K, likely closer
> > to 1M is the limit).... </brainfart_for_the_archives>
>
> Except that it is FreeBSD-specific -- being system calls and all --
> if I remember correctly. If you're going to move to a system like
> that, which is a good idea, best move to a portable system.

You can #ifdef-abstract things so that select() and poll() work if available. Though now that I think about it, a queue that existed completely in userland would be better... an shm implementation that's abstracted would be ideal, but shm is a precious resource and can't scale all that big. A shared mmap() region, however, is much less scarce and can scale much higher. mmap() + semaphore as a gate to a queue would be ideal, IMHO.

I shouldn't be posti^H^H^H^H^Hrambling though, haven't slept in 72hrs. :-/ *stops reading email*

-sc

--
Sean Chittenden
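A minimal sketch of the mmap()-plus-semaphore queue floated above: an anonymous shared mapping (created before fork() so children inherit it) holds a ring of fixed-size messages, one process-shared semaphore counts queued items so consumers can block, and a second serializes access. Overflow handling is omitted and every name is invented; this is neither TelegraphCQ nor PostgreSQL code:

    #include <semaphore.h>
    #include <string.h>
    #include <sys/mman.h>

    #define QCAP  4096
    #define MSGSZ 64

    typedef struct ShmQueue
    {
        sem_t items;             /* counts queued messages; consumers wait */
        sem_t mutex;             /* protects head/tail */
        int   head;
        int   tail;
        char  msgs[QCAP][MSGSZ];
    } ShmQueue;

    /* Create the queue in a shared anonymous mapping before fork(). */
    static ShmQueue *
    queue_create(void)
    {
        ShmQueue *q = mmap(NULL, sizeof(ShmQueue),
                           PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANON, -1, 0);

        if (q == MAP_FAILED)
            return NULL;
        sem_init(&q->items, 1, 0);   /* pshared=1: usable across processes */
        sem_init(&q->mutex, 1, 1);
        q->head = q->tail = 0;
        return q;
    }

    /* Producer: append a message and wake one blocked consumer. */
    static void
    queue_put(ShmQueue *q, const char *msg)
    {
        sem_wait(&q->mutex);
        strncpy(q->msgs[q->tail], msg, MSGSZ - 1);
        q->msgs[q->tail][MSGSZ - 1] = '\0';
        q->tail = (q->tail + 1) % QCAP;
        sem_post(&q->mutex);
        sem_post(&q->items);
    }

    /* Consumer: block until a message is available, then dequeue it. */
    static void
    queue_get(ShmQueue *q, char *out)
    {
        sem_wait(&q->items);
        sem_wait(&q->mutex);
        memcpy(out, q->msgs[q->head], MSGSZ);
        q->head = (q->head + 1) % QCAP;
        sem_post(&q->mutex);
    }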
Sorry for the late response to this, but I've been caught up in merging TCQ into the 7.3.2 code base.

BTW, an announcement for those interested: we'll be doing a demonstration of TelegraphCQ during the ACM SIGMOD Conference in June. This year's SIGMOD is held in San Diego as part of the ACM FCRC (Federated Computing Research Conference) -- visit http://www.sigmod.org for more details. SIGMOD runs from June 8-12, 2003. All pgsql hackers (and others) are cordially invited :-) Do drop us an email if you're planning to show up.

>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes:

Sean> You can #ifdef-abstract things so that select() and poll()
Sean> work if available. Though now that I think about it, a
Sean> queue that existed completely in userland would be
Sean> better... an shm implementation that's abstracted would be
Sean> ideal, but shm is a precious resource and can't scale all
Sean> that big. A shared mmap() region, however, is much less
Sean> scarce and can scale much higher. mmap() + semaphore as a
Sean> gate to a queue would be ideal, IMHO.

As part of our TelegraphCQ work, we've implemented a generic userland queue. We support blocking/non-blocking operation at both enqueue/dequeue time, as well as different forms of latching. The queue can also live in shared memory, for which we use a new Shared Memory MemoryContext. This is implemented using libmm -- a memory management library that came out of the Apache project.

Our current released version is based on the 7.2.1 source base. However, our internal CVS tip is based on 7.3.2 -- we had to make a few changes to the shm allocator -- one more function that's part of a MemoryContext. (We can afford to be slightly more profligate in our use of shared memory as we process all concurrently executing streaming queries in a single monster query plan. New queries are dynamically folded into a running query plan on the fly. Since streams represent append-only data, we play fast and loose with transaction isolation...)

The current version of the code is available at: http://telegraph.cs.berkeley.edu/telegraphcq

If there is interest, we would love to contribute our queue infrastructure to PostgreSQL. In fact, we'd love to contribute any of our stuff that the pgsql folks find interesting/useful. Our motivations are two-fold:

(1) We'd like to give back to the pgsql community.
(2) It's in our interest if things like the Queue/ShMem stuff are part of pgsql, as it means one less merge hassle in future.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
> (2) It's in our interest if things like the Queue/ShMem stuff are
> part of pgsql, as it means one less merge hassle in future.

I'd be quite interested in the work, as it would remove my dependence on jabberd as a distributed event/message bus and I could keep everything inside of PostgreSQL, which is always a good thing. :)

-sc

--
Sean Chittenden
>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes: >> (2) It's in our interest if things like the Queue/ShMem stuff >> is part of pgsql as it means one less of a mergehassle in >> future. Sean> I'd be quite interested in the work as it would remove my Sean> dependence on jabberd as a distributed event/messagebus and Sean> I could keep everything inside of PostgreSQL, which is Sean> always a good thing. :) -sc Sounds great ! Would it make more sense for us to correspond privately and see if you can use our code and then submit a patch ? Or is it better to have a discussion on HACKERS itself and lend itself to further googling. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
> >> (2) It's in our interest if things like the Queue/ShMem stuff
> >> are part of pgsql, as it means one less merge hassle in
> >> future.
>
> Sean> I'd be quite interested in the work, as it would remove my
> Sean> dependence on jabberd as a distributed event/message bus and
> Sean> I could keep everything inside of PostgreSQL, which is
> Sean> always a good thing. :) -sc
>
> Sounds great! Would it make more sense for us to correspond privately
> and see if you can use our code and then submit a patch?
>
> Or is it better to have the discussion on HACKERS itself and lend itself
> to further googling?

Do you have a URL for the patch? If not, send it to me privately. I can take any non-critical issues offline, but I bet others have an interest in this code as well. I'm particularly interested in the API atm, to see how hard it would be to integrate.

-sc

--
Sean Chittenden
>>>>> "Sean" == Sean Chittenden <sean@chittenden.org> writes: Sean> Do you have a URL for the patch? If not, send it to me Sean> privately. I can take any non-critical issues offline but Sean> I bet others have an interest in this code as well. TCQ website: http://telegraph.cs.berkeley.edu/telegraphcq The code we have on the web is a source distribution based on 7.2 - not as a patch. I think I can produce a patch off of 7.3.2 - it's just a bunch of new modules, although we had to add a few functions to the changed semaphore abstractions. Sean> I'm particularly interested in the API atm to see how hard Sean> it would be to integrate. -sc Since the API hasn't changed significantly internally maybe the best bet is for you to download the src distribution on the link above and look at the directories src/backend/rqueue as well src/include/rqueue If things look promising, I can rustle up code that fits the 7.3.x codebase. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh