Обсуждение: random() function produces wrong range
The comment in the random() function indicates that its author thought it'd produce output in the range 0..1, which seems like a pretty reasonable definition: /* result 0.0-1.0 */ result = ((double) random()) / RAND_MAX; Unfortunately, at least on my box, it produces no such thing. random() actually yields values in the range 0..2^31-1 --- while RAND_MAX is only 32767, because it applies to the rand() function not random(). So what I actually get is floating-point output in the range 0..65535. regression=# select random(); random ------------------ 35771.3981139561 (1 row) regression=# select random(); random ------------------ 58647.5821405683 (1 row) This is, to say the least, a bizarre definition. I would like to propose changing the code to /* result 0.0-1.0 */ result = ((double) random()) / INT_MAX; (and making the corresponding change in setseed()). But I wonder if anyone out there has applications that depend on the current behavior. As far as I can find, random() isn't mentioned in the documentation currently, so there probably aren't many people using it... regards, tom lane
On Tue, 1 Aug 2000, Tom Lane wrote: > The comment in the random() function indicates that its author thought > it'd produce output in the range 0..1, which seems like a pretty > reasonable definition: > > /* result 0.0-1.0 */ > result = ((double) random()) / RAND_MAX; > > Unfortunately, at least on my box, it produces no such thing. random() > actually yields values in the range 0..2^31-1 --- while RAND_MAX is > only 32767, because it applies to the rand() function not random(). > I would like to propose changing the code to > > /* result 0.0-1.0 */ > result = ((double) random()) / INT_MAX; > > (and making the corresponding change in setseed()). But I wonder if > anyone out there has applications that depend on the current behavior. Actually, on my machines, both man pages for rand() and random() say they return values between 0 and RAND_MAX (whether that's true or not is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't be a problem, but it might be problematic on some of the 64 bit machines.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Actually, on my machines, both man pages for rand() and random() say > they return values between 0 and RAND_MAX (whether that's true or not > is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't > be a problem, but it might be problematic on some of the 64 bit machines. Oh, that's interesting. What platform do you use? If RAND_MAX applies to random() on some machines that'd probably explain why the code is written like it is. But on my box (HPUX) the rand() function is old and crufty and considerably different from random(). regards, tom lane
What build of postgres is this running on. I've tried this on 6x and it doesn't seem to work. ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Stephan Szabo" <sszabo@megazone23.bigpanda.com> Cc: <pgsql-hackers@postgreSQL.org>; <pgsql-general@postgreSQL.org> Sent: Tuesday, August 01, 2000 11:23 AM Subject: Re: [GENERAL] random() function produces wrong range > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > Actually, on my machines, both man pages for rand() and random() say > > they return values between 0 and RAND_MAX (whether that's true or not > > is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't > > be a problem, but it might be problematic on some of the 64 bit machines. > > Oh, that's interesting. What platform do you use? If RAND_MAX applies > to random() on some machines that'd probably explain why the code is > written like it is. But on my box (HPUX) the rand() function is old > and crufty and considerably different from random(). > > regards, tom lane >
"Mike Sears" <msears@vianet.ca> writes: > What build of postgres is this running on. I think random() is new in 7.0. regards, tom lane
On Tue, 1 Aug 2000, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > Actually, on my machines, both man pages for rand() and random() say > > they return values between 0 and RAND_MAX (whether that's true or not > > is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't > > be a problem, but it might be problematic on some of the 64 bit machines. > > Oh, that's interesting. What platform do you use? If RAND_MAX applies > to random() on some machines that'd probably explain why the code is > written like it is. But on my box (HPUX) the rand() function is old > and crufty and considerably different from random(). That's from a pair of linux boxes, although checking on a FreeBSD box a friend has, his boxes man pages show the range as explicitly 0 to 2^31-1 as your box does.
Stephan Szabo wrote: > > On Tue, 1 Aug 2000, Tom Lane wrote: > > > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > > Actually, on my machines, both man pages for rand() and random() say > > > they return values between 0 and RAND_MAX (whether that's true or not > > > is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't > > > be a problem, but it might be problematic on some of the 64 bit machines. > > > > Oh, that's interesting. What platform do you use? If RAND_MAX applies > > to random() on some machines that'd probably explain why the code is > > written like it is. But on my box (HPUX) the rand() function is old > > and crufty and considerably different from random(). > > That's from a pair of linux boxes, although checking on a FreeBSD box a > friend has, his boxes man pages show the range as explicitly 0 to 2^31-1 > as your box does. On my SCO 5.0.4 box, rand() from the man page... The rand function uses a multiplicative congruential random-number generator with period 2^32 that returns successive pseudo-random numbers in the range from 0 to (2^15)-1. The following functions define the semantics of the functions rand and srand. static unsigned long int next = 1; int rand() { next = next * 1103515245 + 12345; return ((unsigned int)(next/65536) % 32768); } void srand(seed) unsigned int seed; { next = seed; } -- Dave Smith Candata Systems Ltd. (416) 493-9020 dave@candata.com
> The comment in the random() function indicates that its author thought > it'd produce output in the range 0..1, which seems like a pretty > reasonable definition: > Unfortunately, at least on my box, it produces no such thing. random() > actually yields values in the range 0..2^31-1 --- while RAND_MAX is > only 32767, because it applies to the rand() function not random(). > So what I actually get is floating-point output in the range 0..65535. > This is, to say the least, a bizarre definition. Or, a bizarre machine. Linux (where I did the testing) produces the expected result. > I would like to propose changing the code to > /* result 0.0-1.0 */ > result = ((double) random()) / INT_MAX; Erk... Actually, I depend on the behavior being as advertised, which is what you would have expected. This is true on Linux boxes from RedHat-5.2 to Mandrake-7.1. Not sure why your machine is different, but if there is a more portable way to define this let's find it. Otherwise, get used to #ifdef HPUX ;) The Linux man pages indicate that the behavior and underlying implementation of random() and rand() are the same (so I just picked one). Would it be better to try using rand() instead? - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > The Linux man pages indicate that the behavior and underlying > implementation of random() and rand() are the same (so I just picked > one). Ah, well, there's your problem. Whoever did this part of the library on Linux took shortcuts. On older-line systems, rand() is a considerably older and crummier generator than random(). It would definitely not be a wise decision to use rand() instead. It appears that on SysV-heritage machines, rand() delivers 15-bit results (which is what I'm getting) whereas on BSD-heritage platforms it produces 31-bit results. But even the BSD machines say The spectral properties of rand() leave a great deal to be desired. drand48(3) and random(3) provide much better, though more elaborate, random-number generators. (quote from SunOS 4.1 man page for rand()). I believe using random() is the right thing. The portability bug here is the assumption that RAND_MAX applies to random() (or is even defined; none of the man pages I've looked at so far mention it). But all the machines say that the output of random() is 31 bits, so INT_MAX should work. regards, tom lane
Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes: > Tom Lane writes: >> Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >>>> The Linux man pages indicate that the behavior and underlying >>>> implementation of random() and rand() are the same (so I just picked >>>> one). >> >> Ah, well, there's your problem. Whoever did this part of the library >> on Linux took shortcuts. > Why? Linux is compliant with the Single Unix Standard v2 from my brief > reading of it. Sorry, bad choice of words on my part. Linux is within its rights to use the same underlying implementation for rand() and random(), but it is a poor guide to the behavior of other systems, which are equally within their rights not to. >> none of the man pages I've looked at so far mention it). But all the >> machines say that the output of random() is 31 bits, so INT_MAX should >> work. > SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll > be non-portable to platforms with 64-bit ints, for example. Maybe. You don't think that a 64-bit-int platform would choose to supply a random() function with a range of 2^63-1? The HPUX and SunOS man pages clearly specify that random()'s result is "long", so I think a case could also be made for LONG_MAX. I suspect we have a good chance at getting burned no matter what we use :-(. But RAND_MAX is definitely the wrong thing. regards, tom lane
>> none of the man pages I've looked at so far mention it). But all the
>> machines say that the output of random() is 31 bits, so INT_MAX should
>> work.
> SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
> be non-portable to platforms with 64-bit ints, for example.
Maybe. You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1? The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.
I suspect we have a good chance at getting burned no matter what we use
:-(. But RAND_MAX is definitely the wrong thing.
Is it possible to test (during configure phase) and then go from there... or does it need to be the same for all platforms?
-
- Thomas Swan
- Graduate Student - Computer Science
- The University of Mississippi
-
- "People can be categorized into two fundamental
- groups, those that divide people into two groups
- and those that don't."
Thomas Swan <tswan@olemiss.edu> writes: >> I suspect we have a good chance at getting burned no matter what we use >> :-(. But RAND_MAX is definitely the wrong thing. > Is it possible to test (during configure phase) and then go from there... > or does it need to be the same for all platforms? I thought about that last night. We could do a configure test. Since it'd be probing random() results there'd be a small probability of failure, but if we wire in an assumption that the max value must be 2^(15 + n*16)-1 for some n, ten or so probes would give us a failure probability on the order of 2^-160, which ought to satisfy anyone. However, in the absence of any hard evidence that there are platforms where the value is different from 2^31-1, it's probably just a waste of configuration cycles at the moment. I suggest we add a config.h constant like /* The local random() function yields values 0 .. MAX_RANDOM_VALUE */ #define MAX_RANDOM_VALUE <2^31-1> and use that in the code. Then, if we ever find a platform where random() does actually produce 64-bit results, it'll be time enough to crank up a real configure test to set the value. Comments? regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- >>>>> "Thomas" == Thomas Swan <tswan@olemiss.edu> writes: Thomas> Is it possible to test (during configure phase) and then Thomas> go from there... or does it need to be the same for all Thomas> platforms? You should be able to test, but it is, of course, a statistical test. Call random() several times and test the maximum value against your thresholds of 2^15 and 2^31. If random() is generating values in the range 1:2^31-1, you would expect half of your values to be greater than 2^15-1; more importantly, if you generate, say, 10 values, you expect only a 1:1024 chance that they are all below 2^15. If those odds aren't good enough, generate more. roland - -- PGP Key ID: 66 BC 3B CD Roland B. Roberts, PhD Unix Software Solutions roberts@panix.com 76-15 113th Street, Apt 3B rbroberts@acm.org Forest Hills, NY 11375 -----BEGIN PGP SIGNATURE----- Version: 2.6.3a Charset: noconv Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface iQCVAwUBOYmNSeoW38lmvDvNAQHFegQAlexEvaG0t+1H0IkPWikbdMUIck1fE0DI rfcGi1M/SQ6K9Hmvi1HB/SVEU4DKGaHDoqlU7ei78OgOzchbsLL5cqAJNIsKv5QJ F4u/A0Fg7yuyRZ3/CNCo0+nTwhyDANktMw78AM5ssHCs75UxC+vHWibHHBmXJzrF 8WD2LyjPSNI= =dR25 -----END PGP SIGNATURE-----
Roland Roberts <roberts@panix.com> writes: > Call random() several times and test the maximum value against your > thresholds of 2^15 and 2^31. If random() is generating values in the > range 1:2^31-1, you would expect half of your values to be greater > than 2^15-1; more importantly, if you generate, say, 10 values, you > expect only a 1:1024 chance that they are all below 2^15. Actually the odds are far better than that. If the range is 2^31-1 then only about 2^-16th of the outputs should be less than 2^15. So ten probes gives you a failure probability of about 2^-160 not 2^-10. Generalizing, you could tell the difference between widths of 31, 47, or 63 bits with the same level of reliability. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Actually the odds are far better than that. If the range is Tom> 2^31-1 then only about 2^-16th of the outputs should be less Tom> than 2^15. So ten probes gives you a failure probability of Tom> about 2^-160 not 2^-10. Oops, 2^16 != 2^32 / 2. So a dynamic test is not only possible but wouldn't cost to much at configure time. roland - -- PGP Key ID: 66 BC 3B CD Roland B. Roberts, PhD Unix Software Solutions roberts@panix.com 76-15 113th Street, Apt 3B rbroberts@acm.org Forest Hills, NY 11375 -----BEGIN PGP SIGNATURE----- Version: 2.6.3a Charset: noconv Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface iQCVAwUBOYmtf+oW38lmvDvNAQGWYwP/eXRtrDPu/xN+W9pCd9y34d4jbrPH7jku nBAuSYtCRyoMgTkjdCtqThzq3vzPLDwfmOZcmWP8W5AmQPJjvcdFwI7y1XgGlaxd aAIlqqf+TTkZwIUh2vnWTuu5JKkiAZI6UuzNSzy79O/frxKE2y97zCuMw02I0kMK iGNSybN3L5w= =36yP -----END PGP SIGNATURE-----
Tom Lane <tgl@sss.phg.pa.us> writes: > Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > > The Linux man pages indicate that the behavior and underlying > > implementation of random() and rand() are the same (so I just picked > > one). > > Ah, well, there's your problem. Whoever did this part of the library > on Linux took shortcuts. On older-line systems, rand() is a > considerably older and crummier generator than random(). It would > definitely not be a wise decision to use rand() instead. > > I believe using random() is the right thing. The portability bug here > is the assumption that RAND_MAX applies to random() (or is even defined; > none of the man pages I've looked at so far mention it). But all the > machines say that the output of random() is 31 bits, so INT_MAX should > work. On an i386 machine, certainly; but not on an Alpha or a Sparc. Probably safer to use (2**31)-1, which is what my (NetBSD) man page says. FWIW, I believe random(3) running on NetBSD/alpha, for example, will return a 31-bit result. Chris -- ---------------------------------------------------- cjones@rightnowtech.com Chris Jones System Administrator, RightNow Technologies "Is this going to be a stand-up programming session, sir, or another bug hunt?"
On Thu, Aug 03, 2000 at 11:45:39AM -0400, Tom Lane wrote: > Actually the odds are far better than that. If the range is 2^31-1 > then only about 2^-16th of the outputs should be less than 2^15. > So ten probes gives you a failure probability of about 2^-160 not > 2^-10. It occurs to me that Perl has to provide a portable rand function, so I looked at how its Configure script works. It's pretty much what you've been discussing. First it checks for a couple of possible random functions (preferring drand48(), then random(), then bitching and using rand()). int main() { register int i; register unsigned long tmp; register unsigned long max = 0L; for (i = 1000; i; i--) { tmp = (unsigned long) $randfunc(); if (tmp > max) max = tmp; } for (i = 0; max; i++) max /= 2; printf("%d\n",i); } Oh well. -- Christopher Masto Senior Network Monkey NetMonger Communications chris@netmonger.net info@netmonger.net http://www.netmonger.net Free yourself, free your machine, free the daemon -- http://www.freebsd.org/