Обсуждение: pgbench internal contention
On machines with lots of CPU cores, pgbench can start eating up a lot of system time. Investigation reveals that the problem is with random(), which glibc implements like this: long int __random () { int32_t retval; __libc_lock_lock (lock); (void) __random_r (&unsafe_state, &retval); __libc_lock_unlock (lock); return retval; } weak_alias (__random, random) Rather obviously, if you're running enough pgbench threads, you're going to have a pretty ugly point of contention there. On the 32-core machine provided by Nate Boley, with my usual 5-minute SELECT-only test, lazy-vxid and sinval-fastmessages applied, and scale factor 100, "time" shows that pgbench uses almost as much system time as user time: $ time pgbench -n -S -T 300 -c 64 -j 64 transaction type: SELECT only scaling factor: 100 query mode: simple number of clients: 64 number of threads: 64 duration: 300 s number of transactions actually processed: 55319555 tps = 184396.016257 (including connections establishing) tps = 184410.926840 (excluding connections establishing) real 5m0.019s user 21m10.100s sys 17m45.480s I patched it to use random_r() - the patch is attached - and here are the (rather gratifying) results of that test: $ time ./pgbench -n -S -T 300 -c 64 -j 64 transaction type: SELECT only scaling factor: 100 query mode: simple number of clients: 64 number of threads: 64 duration: 300 s number of transactions actually processed: 71851589 tps = 239503.585813 (including connections establishing) tps = 239521.816698 (excluding connections establishing) real 5m0.016s user 20m40.880s sys 9m25.930s Since a client-limited benchmark isn't very interesting, I think this change makes sense. Thoughts? Objections? Coding style improvements? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> writes: > On machines with lots of CPU cores, pgbench can start eating up a lot > of system time. Investigation reveals that the problem is with > random(), Interesting. > I patched it to use random_r() - the patch is attached - and here are > the (rather gratifying) results of that test: > Since a client-limited benchmark isn't very interesting, I think this > change makes sense. Thoughts? Objections? Portability, or rather lack of it. What about using erand48, which we already have a dependency on (and substitute code for)? regards, tom lane
On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On machines with lots of CPU cores, pgbench can start eating up a lot >> of system time. Investigation reveals that the problem is with >> random(), > > Interesting. > >> I patched it to use random_r() - the patch is attached - and here are >> the (rather gratifying) results of that test: >> Since a client-limited benchmark isn't very interesting, I think this >> change makes sense. Thoughts? Objections? > > Portability, or rather lack of it. What about using erand48, which we > already have a dependency on (and substitute code for)? Neither our implementation nor glibc's appears to be thread-safe, and erand48() is deprecated according to my Linux man page: NOTES These functions are declared obsolete by SVID 3, which states that rand(3) should be used instead. glibc provides erand48_r(), and I suppose we could kludge up something similar out of what's already in src/port? This is also not exactly the world's most sophisticated algorithm, but perhaps for pgbench that doesn't matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Portability, or rather lack of it. �What about using erand48, which we >> already have a dependency on (and substitute code for)? > Neither our implementation nor glibc's appears to be thread-safe, I think you're confusing srand48 with erand48. The latter relies on a caller-supplied state value, so if it's not thread-safe the caller has only itself to blame. regards, tom lane
On Sat, Jul 30, 2011 at 2:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Portability, or rather lack of it. What about using erand48, which we >>> already have a dependency on (and substitute code for)? > >> Neither our implementation nor glibc's appears to be thread-safe, > > I think you're confusing srand48 with erand48. The latter relies on a > caller-supplied state value, so if it's not thread-safe the caller has > only itself to blame. I may be confused, but I'm not mixing it up with srand48. On my Fedora 12 VM, the man page says for erand48_r says, in relevant part: SYNOPSIS #include <stdlib.h> int erand48_r(unsigned short xsubi[3], struct drand48_data *buffer, double *result); DESCRIPTION These functions are the reentrant analogs of the functions described in drand48(3). Instead of modifying the global random generator state, they use the supplied data buffer. And the glibc implementation of erand48 is this: double erand48 (xsubi) unsigned short int xsubi[3]; { double result; (void) __erand48_r (xsubi, &__libc_drand48_data, &result); return result; } On second look, I think our version is probably safe in this regard since it appears modify only the state passed in and not anything else. *pokes a little more* If I'm reading the code right, it only modifies __libc_drand48_data on first call, so as long as we called erand48() at least once before spawning the child threads, it would probably work. That seems pretty fragile in the face of the fact that they explicitly state that they're modifying the global random generator state and that you should use erand48_r() if you want reentrant behavior. So I think if we're going to go the erand48() route we probably ought to force pgbench to always use our version rather than any OS-supplied version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 07/29/2011 04:00 PM, Robert Haas wrote: > On machines with lots of CPU cores, pgbench can start eating up a lot > of system time. Investigation reveals that the problem is with > random(), which glibc implements like this: > > long int > __random () > { > int32_t retval; > __libc_lock_lock (lock); > (void) __random_r (&unsafe_state,&retval); > __libc_lock_unlock (lock); > return retval; > } > weak_alias (__random, random) > > Rather obviously, if you're running enough pgbench threads, you're > going to have a pretty ugly point of contention there. On the 32-core > machine provided by Nate Boley, with my usual 5-minute SELECT-only > test, lazy-vxid and sinval-fastmessages applied, and scale factor 100, > "time" shows that pgbench uses almost as much system time as user > time: > > $ time pgbench -n -S -T 300 -c 64 -j 64 > transaction type: SELECT only > scaling factor: 100 > query mode: simple > number of clients: 64 > number of threads: 64 > duration: 300 s > number of transactions actually processed: 55319555 > tps = 184396.016257 (including connections establishing) > tps = 184410.926840 (excluding connections establishing) > > real 5m0.019s > user 21m10.100s > sys 17m45.480s > > I patched it to use random_r() - the patch is attached - and here are > the (rather gratifying) results of that test: > > $ time ./pgbench -n -S -T 300 -c 64 -j 64 > transaction type: SELECT only > scaling factor: 100 > query mode: simple > number of clients: 64 > number of threads: 64 > duration: 300 s > number of transactions actually processed: 71851589 > tps = 239503.585813 (including connections establishing) > tps = 239521.816698 (excluding connections establishing) > > real 5m0.016s > user 20m40.880s > sys 9m25.930s > > Since a client-limited benchmark isn't very interesting, I think this > change makes sense. Thoughts? Objections? Coding style > improvements? > > > > > How much randomness do we really need for test data. What if it where changed to more of a random starting point and thenautoinc'd after that. Or if there were two func's, a rand() and a next(). If your test really needs randomness userand(), otherwise use next(), it would be way faster, and you dont really care what the number is anyway. -Andy
On Jul 30, 2011, at 9:40 AM, Andy Colson <andy@squeakycode.net> wrote: > On 07/29/2011 04:00 PM, Robert Haas wrote: >> On machines with lots of CPU cores, pgbench can start eating up a lot >> of system time. Investigation reveals that the problem is with >> random(), which glibc implements like this: >> >> long int >> __random () >> { >> int32_t retval; >> __libc_lock_lock (lock); >> (void) __random_r (&unsafe_state,&retval); >> __libc_lock_unlock (lock); >> return retval; >> } >> weak_alias (__random, random) >> >> Rather obviously, if you're running enough pgbench threads, you're >> going to have a pretty ugly point of contention there. On the 32-core >> machine provided by Nate Boley, with my usual 5-minute SELECT-only >> test, lazy-vxid and sinval-fastmessages applied, and scale factor 100, >> "time" shows that pgbench uses almost as much system time as user >> time: >> >> $ time pgbench -n -S -T 300 -c 64 -j 64 >> transaction type: SELECT only >> scaling factor: 100 >> query mode: simple >> number of clients: 64 >> number of threads: 64 >> duration: 300 s >> number of transactions actually processed: 55319555 >> tps = 184396.016257 (including connections establishing) >> tps = 184410.926840 (excluding connections establishing) >> >> real 5m0.019s >> user 21m10.100s >> sys 17m45.480s >> >> I patched it to use random_r() - the patch is attached - and here are >> the (rather gratifying) results of that test: >> >> $ time ./pgbench -n -S -T 300 -c 64 -j 64 >> transaction type: SELECT only >> scaling factor: 100 >> query mode: simple >> number of clients: 64 >> number of threads: 64 >> duration: 300 s >> number of transactions actually processed: 71851589 >> tps = 239503.585813 (including connections establishing) >> tps = 239521.816698 (excluding connections establishing) >> >> real 5m0.016s >> user 20m40.880s >> sys 9m25.930s >> >> Since a client-limited benchmark isn't very interesting, I think this >> change makes sense. Thoughts? Objections? Coding style >> improvements? >> >> >> >> >> > How much randomness do we really need for test data. What if it where changed to more of a random starting point and thenautoinc'd after that. Or if there were two func's, a rand() and a next(). If your test really needs randomness userand(), otherwise use next(), it would be way faster, and you dont really care what the number is anyway. Well, I think you need at least pseudo-randomness for pgbench - reading the table in sequential order is not going to performthe same as doing random fetches against it. ...Robert
On Sat, Jul 30, 2011 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote: > If I'm reading the code right, it only modifies __libc_drand48_data on > first call, so as long as we called erand48() at least once before > spawning the child threads, it would probably work. That seems pretty > fragile in the face of the fact that they explicitly state that > they're modifying the global random generator state and that you > should use erand48_r() if you want reentrant behavior. So I think if > we're going to go the erand48() route we probably ought to force > pgbench to always use our version rather than any OS-supplied version. Attached is a try at that approach. Performance benefits are similar to before. Same test case as in my OP on this thread, alternating runs without and with this patch: tps = 199133.418419 (including connections establishing) real 5m0.017s user 23m42.170s sys 18m46.270s tps = 226202.289151 (including connections establishing) real 5m0.018s user 22m7.520s sys 9m54.570s tps = 191144.247489 (including connections establishing) real 5m0.025s user 23m35.200s sys 17m19.070s tps = 226353.187955 (including connections establishing) real 5m0.017s user 21m42.300s sys 10m9.820s tps = 189602.248908 (including connections establishing) real 5m0.044s user 23m24.060s sys 17m1.050s tps = 224521.459164 (including connections establishing) real 5m0.017s user 22m9.620s sys 10m22.590s -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> writes: > If I'm reading the code right, it only modifies __libc_drand48_data on > first call, so as long as we called erand48() at least once before > spawning the child threads, it would probably work. That seems pretty > fragile in the face of the fact that they explicitly state that > they're modifying the global random generator state and that you > should use erand48_r() if you want reentrant behavior. So I think if > we're going to go the erand48() route we probably ought to force > pgbench to always use our version rather than any OS-supplied version. Or add erand48_r() to our version and use that. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jul 30, 2011 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> If I'm reading the code right, it only modifies __libc_drand48_data on >> first call, so as long as we called erand48() at least once before >> spawning the child threads, it would probably work. �That seems pretty >> fragile in the face of the fact that they explicitly state that >> they're modifying the global random generator state and that you >> should use erand48_r() if you want reentrant behavior. �So I think if >> we're going to go the erand48() route we probably ought to force >> pgbench to always use our version rather than any OS-supplied version. > Attached is a try at that approach. I don't find this to be a particularly safe idea: > #ifndef HAVE_ERAND48 > -/* we assume all of these are present or missing together */ > -extern double erand48(unsigned short xseed[3]); > -extern long lrand48(void); > -extern void srand48(long seed); > +#define erand48(x) pg_erand48(x) > +#define lrand48() pg_lrand48() > +#define srand48(x) pg_srand48(x) > #endif See our recent trials with python headers for an example of why this might cause problems on some platforms. For that matter, I believe <stdlib.h> would be within its rights to be defining these names as macros already. If you want erand48_r, best to provide that API, not kluge up some other functions. regards, tom lane
On 07/30/2011 09:08 AM, Robert Haas wrote: > If I'm reading the code right, it only modifies __libc_drand48_data on > first call, so as long as we called erand48() at least once before > spawning the child threads, it would probably work. That seems pretty > fragile in the face of the fact that they explicitly state that > they're modifying the global random generator state and that you > should use erand48_r() if you want reentrant behavior. So I think if > we're going to go the erand48() route we probably ought to force > pgbench to always use our version rather than any OS-supplied version. > By the way: the landmines in this whole area are what sunk the attempt to switch pgbench over to using 64 bits for the accounts table back in February. See the last few messages of http://postgresql.1045698.n5.nabble.com/Re-PERFORM-pgbench-to-the-MAXINT-td3337374.html to revisit. I think you've retraced all of that now, but double checking your plan against things like the AIX specific weirdness I pointed out there may be useful. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Mon, Aug 1, 2011 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Attached is a try at that approach. > > I don't find this to be a particularly safe idea: > >> #ifndef HAVE_ERAND48 >> -/* we assume all of these are present or missing together */ >> -extern double erand48(unsigned short xseed[3]); >> -extern long lrand48(void); >> -extern void srand48(long seed); >> +#define erand48(x) pg_erand48(x) >> +#define lrand48() pg_lrand48() >> +#define srand48(x) pg_srand48(x) >> #endif > > See our recent trials with python headers for an example of why this > might cause problems on some platforms. For that matter, I believe > <stdlib.h> would be within its rights to be defining these names as > macros already. If HAVE_ERAND48 isn't defined and erand48() is nevertheless defined as a macro, it's a bug in the autoconf test for erand48(). The whole point is to do that only when there isn't any erand48(). But we could dodge the issue at any rate by making erand48(), lrand48(), and srand48() functions that call pg_erand48(), pg_lrand48(), and pg_srand48() rather than macros. And I'd rather do that than this, honestly: > If you want erand48_r, best to provide that API, not kluge up some > other functions. ...because erand48() is a GNU extension with a stupid API. I don't see much value in supporting that, on both counts. We're going to end up with the built-in erand48_r() on precisely those systems that use glibc, and our own everywhere else. For the 25 SLOCs it's going cost us, I'd rather use the same code everywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> If you want erand48_r, best to provide that API, not kluge up some >> other functions. > ...because erand48() is a GNU extension with a stupid API. I assume you mean erand48_r, there, because erand48 is pretty standard. > I don't > see much value in supporting that, on both counts. We're going to end > up with the built-in erand48_r() on precisely those systems that use > glibc, and our own everywhere else. For the 25 SLOCs it's going cost > us, I'd rather use the same code everywhere. Maybe. But if that's the approach we want to use, let's just call it pg_erand48 in the code, and dispense with the alias macros as well as all vestiges of configure support. BTW, as far as the original plan of using random_r is concerned, how did you manage to not run into this? http://sourceware.org/bugzilla/show_bug.cgi?id=3662 I just wasted half an hour on that stupidity in an unrelated context... regards, tom lane
On Tue, Aug 2, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> If you want erand48_r, best to provide that API, not kluge up some >>> other functions. > >> ...because erand48() is a GNU extension with a stupid API. > > I assume you mean erand48_r, there, because erand48 is pretty standard. Yes. >> I don't >> see much value in supporting that, on both counts. We're going to end >> up with the built-in erand48_r() on precisely those systems that use >> glibc, and our own everywhere else. For the 25 SLOCs it's going cost >> us, I'd rather use the same code everywhere. > > Maybe. But if that's the approach we want to use, let's just call it > pg_erand48 in the code, and dispense with the alias macros as well as > all vestiges of configure support. Works for me. Just to confirm, that means we'd also change GEQO to use pg_erand48(). > BTW, as far as the original plan of using random_r is concerned, how > did you manage to not run into this? > http://sourceware.org/bugzilla/show_bug.cgi?id=3662 > I just wasted half an hour on that stupidity in an unrelated context... Good grief. It's hard to imagine a more user-hostile attitude than the one taken there. I did run into that precise issue, but managed to stumble on what is apparently the officially sanctioned method of working around it - viz, zeroing the state array. However, that bug report is a pretty compelling argument for the position that expecting any of the GNU blah_r() functions to behave halfway sanely or be properly documented is a pipe dream. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 2, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe. �But if that's the approach we want to use, let's just call it >> pg_erand48 in the code, and dispense with the alias macros as well as >> all vestiges of configure support. > Works for me. Just to confirm, that means we'd also change GEQO to > use pg_erand48(). Right, that's what I was thinking. regards, tom lane
Robert Haas wrote: > Works for me. Just to confirm, that means we'd also change GEQO to > use pg_erand48(). > > > BTW, as far as the original plan of using random_r is concerned, how > > did you manage to not run into this? > > http://sourceware.org/bugzilla/show_bug.cgi?id=3662 > > I just wasted half an hour on that stupidity in an unrelated context... > > Good grief. It's hard to imagine a more user-hostile attitude than > the one taken there. I did run into that precise issue, but managed Yes, it is scary to think people are relying on software written by people which such a poor attitude toward quality. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +