Re: pgbench randomness initialization

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: pgbench randomness initialization
Date: 2016-04-07 09:56:12
Message-ID: alpine.DEB.2.10.1604071147420.11001@sto
Lists: pgsql-hackers


Hello Andres,

> et al I was wondering why it's a good idea for pgbench to do
> INSTR_TIME_SET_CURRENT(start_time);
> srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
> to initialize randomness and then
> for (i = 0; i < nthreads; i++)
> {
>     thread->random_state[0] = random();
>     thread->random_state[1] = random();
>     thread->random_state[2] = random();
> }
> to initialize the individual thread random state which is then used by
> pg_erand48().
>
> To me it seems better to instead initialize srandom() with a known value
> (say, uh, 0). Or even better don't use random() at all, and fill a
> global pg_erand48() with a known state; and use pg_erand48() to
> initialize the thread states.
>
> Obviously that doesn't make pgbench entirely reproducible, but it seems
> a lot better than now. Individual threads would do work in a
> reproducible order.
>
> I see very little reason to have the current behaviour, or at the very
> least not by default.

I think that it depends on what you want, which may vary:

(1) "exactly" reproducible runs, but one run may hit a particular
steady state not representative of what happens in general.

(2) runs which really vary from one to the next, so as
to get an idea of how much the results may vary, i.e. how
stable the performance is.

Currently pgbench focusses on (2), which may or may not be fine depending
on what you are doing. From a personal point of view I think that (2) is
more significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so
I'm fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a
PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which
could be used to get an oxymoronic "deterministic randomness", if desired.
I do not think that it should be the default, though.
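
As a rough illustration of the idea, here is a minimal sketch, not actual
pgbench code. It assumes the proposed PGBENCH_RANDOM_SEED variable, uses
POSIX nrand48() as a stand-in for pg_erand48(), and the names ThreadRng,
choose_seed and seed_threads are hypothetical. The seed comes from the
environment when set, and from the clock otherwise, so (2) stays the default.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* exposes nrand48(), erand48(), setenv() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical per-thread state, mirroring random_state[3] in the quoted code. */
typedef struct
{
    unsigned short random_state[3];
} ThreadRng;

/*
 * Choose the seed. Returns 1 if it came from PGBENCH_RANDOM_SEED (an
 * assumed variable name), 0 if it fell back to the current time in
 * microseconds, which is what makes runs differ by default.
 */
int
choose_seed(uint64_t *seed)
{
    const char *s = getenv("PGBENCH_RANDOM_SEED");

    if (s != NULL && *s != '\0')
    {
        char       *end;
        unsigned long long v;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno == 0 && *end == '\0')
        {
            *seed = (uint64_t) v;
            return 1;
        }
    }

    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        *seed = (uint64_t) tv.tv_sec * 1000000u + (uint64_t) tv.tv_usec;
    }
    return 0;
}

/*
 * Fill one global 48-bit state from the seed, then draw each thread's
 * state from it, so that the same seed yields the same per-thread states.
 */
void
seed_threads(uint64_t seed, ThreadRng *threads, int nthreads)
{
    unsigned short global[3];
    int         i, j;

    assert(threads != NULL && nthreads > 0);

    global[0] = (unsigned short) (seed & 0xFFFF);
    global[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
    global[2] = (unsigned short) ((seed >> 32) & 0xFFFF);

    for (i = 0; i < nthreads; i++)
        for (j = 0; j < 3; j++)
            threads[i].random_state[j] =
                (unsigned short) (nrand48(global) & 0xFFFF);
}
```

With PGBENCH_RANDOM_SEED=42 in the environment, two runs would start every
thread from the same state; with it unset, the clock-based seed keeps the
current run-to-run variation. This only fixes the order of random draws per
thread, it does not make the whole benchmark reproducible.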

--
Fabien.
