Обсуждение: pgsql: Make pgbench use erand48() rather than random().

Поиск
Список
Период
Сортировка

pgsql: Make pgbench use erand48() rather than random().

От
Robert Haas
Дата:
Make pgbench use erand48() rather than random().

glibc renders random() thread-safe by wrapping a futex lock around it;
testing reveals that this limits the performance of pgbench on machines
with many CPU cores.  Rather than switching to random_r(), which is
only available on GNU systems and crashes unless you use undocumented
alchemy to initialize the random state properly, switch to our built-in
implementation of erand48(), which is both thread-safe and concurrent.

Since the list of reasons not to use the operating system's erand48()
is getting rather long, rename ours to pg_erand48() (and similarly
for our implementations of lrand48() and srand48()) and just always
use those.  We were already doing this on Cygwin anyway, and the
glibc implementation is not quite thread-safe, so pgbench wouldn't
be able to use that either.

Per discussion with Tom Lane.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/4af43ee3f165c8e4b332a7e680a44f4b7ba2d3c1

Modified Files
--------------
configure                                   |   14 +-----------
configure.in                                |    8 +------
contrib/pgbench/pgbench.c                   |   32 ++++++++++++--------------
src/backend/optimizer/geqo/geqo_random.c    |    2 +-
src/backend/optimizer/geqo/geqo_selection.c |    6 ++--
src/include/optimizer/geqo.h                |    2 +-
src/include/pg_config.h.in                  |    3 --
src/include/port.h                          |    9 ++-----
src/port/Makefile                           |    4 +-
src/port/erand48.c                          |   18 +++++++++------
src/port/random.c                           |    2 +-
src/port/srandom.c                          |    2 +-
12 files changed, 40 insertions(+), 62 deletions(-)


Re: pgsql: Make pgbench use erand48() rather than random().

От
Tom Lane
Дата:
Robert Haas <rhaas@postgresql.org> writes:
> Make pgbench use erand48() rather than random().

Hmm.  I find the pgbench part of this a bit questionable, specifically
your decision to remove the code around line 2590 that installed a
variable srandom() seed in child processes.  That would be okay if we
were no longer depending on random() at all, but you are still depending
on it to initialize the "per thread" state near line 2240.  AFAICS, this
will result in each child process generating the same random
sequence(s), which is exactly what we don't want.

IOW, please put this back:

-   /*
-    * Set a different random seed in each child process.  Otherwise they all
-    * inherit the parent's state and generate the same "random" sequence. (In
-    * the threaded case, the different threads will obtain subsets of the
-    * output of a single random() sequence, which should be okay for our
-    * purposes.)
-    */
-   INSTR_TIME_SET_CURRENT(start_time);
-   srandom(((unsigned int) INSTR_TIME_GET_MICROSEC(start_time)) +
-           ((unsigned int) getpid()));

or provide a credible explanation why we don't need it.

(I guess the comment needs some work, though.)

            regards, tom lane

Re: pgsql: Make pgbench use erand48() rather than random().

От
Robert Haas
Дата:
On Wed, Aug 3, 2011 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Make pgbench use erand48() rather than random().
>
> Hmm.  I find the pgbench part of this a bit questionable, specifically
> your decision to remove the code around line 2590 that installed a
> variable srandom() seed in child processes.  That would be okay if we
> were no longer depending on random() at all, but you are still depending
> on it to initialize the "per thread" state near line 2240.  AFAICS, this
> will result in each child process generating the same random
> sequence(s), which is exactly what we don't want.
>
> IOW, please put this back:
>
> -   /*
> -    * Set a different random seed in each child process.  Otherwise they all
> -    * inherit the parent's state and generate the same "random" sequence. (In
> -    * the threaded case, the different threads will obtain subsets of the
> -    * output of a single random() sequence, which should be okay for our
> -    * purposes.)
> -    */
> -   INSTR_TIME_SET_CURRENT(start_time);
> -   srandom(((unsigned int) INSTR_TIME_GET_MICROSEC(start_time)) +
> -           ((unsigned int) getpid()));
>
> or provide a credible explanation why we don't need it.

Unless I'm asleep at the switch, the srandom() calls you're worrying
about execute in the parent thread, which still makes its own call to
srandom().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgsql: Make pgbench use erand48() rather than random().

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 3, 2011 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  I find the pgbench part of this a bit questionable, specifically
>> your decision to remove the code around line 2590 that installed a
>> variable srandom() seed in child processes.

> Unless I'm asleep at the switch, the srandom() calls you're worrying
> about execute in the parent thread, which still makes its own call to
> srandom().

[ looks more closely... ]  OK, you're right: the per-"thread" random
seeds are initialized in the parent process and then propagated to child
processes by fork(); we'll never call random() in the children so
there's no need to make their states diverge.  Never mind.

I do however notice a vestigial reference to MAX_RANDOM_VALUE at line
1063, which we probably should get rid of.  What I think we probably
need instead, and don't have, is a check that "max - min + 1" doesn't
overflow.

            regards, tom lane

Re: pgsql: Make pgbench use erand48() rather than random().

От
Robert Haas
Дата:
On Wed, Aug 3, 2011 at 6:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do however notice a vestigial reference to MAX_RANDOM_VALUE at line
> 1063, which we probably should get rid of.  What I think we probably
> need instead, and don't have, is a check that "max - min + 1" doesn't
> overflow.

Seems reasonable.  Do you want to take care of that, or shall I do it?

Off-hand, the easiest approach seems to be to define SAMESIGN in
pgbench.c using the same definition we employ in various other places,
and then copy the test we use in int4pl().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company