Re: Re: [HACKERS] pgbench randomness initialization

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [HACKERS] pgbench randomness initialization
Date: 2018-01-09 13:42:31
Message-ID: alpine.DEB.2.20.1801091027030.6718@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Chapman,

Thanks for the review,

>> The tests assume that stdlib random/srandom behavior is standard thus
>> deterministic between platform.
>
> Is the behavior of srandom() and the system generator really so precisely
> specified that seed 5432 will produce the same values hardcoded in the
> tests on all platforms? [...]

Good question.

I'm hoping that in practice it would be the same, or that their would be
very few cases (eg linux vs windows vs macos...). I was counting on the
the buildfarm to tell me if I'm wrong, and fix it if needed.

> To have the test run pgbench twice with the same seed and compare the
> results sounds safer.

Interesting idea. The test script would be more complicated to do that,
though. I would prefer to bet on "random" determinism (:-) and resort to
such a solution only if it is not the case. Or maybe just put back some
"\d+" to keep it simple.

This is a debatable strategy.

> I did some wordsmithing of the doc, which I hope was not presumptuous of
> me, only as a conversation starter. [...]

Thanks for the doc improvements.

> The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
> will have the same effect as 'time', and indeed, I really think it should
> be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
> or an error.

Ok, done.

> The error, right now, says only "expecting an unsigned integer"; it
> should also mention time and rand.

Ok, done.

> Should 'rand' be something that conveys more about its meaning, 'strong'
> perhaps?

Hmmm. "Random means random":-). I have no opinion about whether it would
be better. ISTM that "strong" would require some explanations. I let is as
"rand" for now.

> The documentation doesn't mention the allowed range of the unsigned
> integer (which I get, because 'unsigned int' is exactly the signature
> for srandom, but somebody might read "unsigned integer" in a more
> generic sense).

Ok. I extended so that it works with octal, decimal and hexadecimal, and
updated the doc accordingly. I did not add range information though.

> I'm not sure what would be a better way to say it.
> The base (only decimal, as now in the code) isn't specified either.

Sure.

> Maybe the documentation should mention that the output now includes the
> random seed being used, so that (even without planning ahead) [...]

It does so only if the seed is explicitely set, otherwise it keeps the
previous behavior of being silent. I added a sentence about that.

> Something more may need to be said in the doc about reproducibility. I think
> the real story is that a run can be reproduced if the number of clients is
> equal to the number of threads.

Yes, good point. I tried to hide the issue under the "as far as random
numbers are concerned". I've tried to improve this point in the doc.

> Although each thread has its own generator state, each client does not
> (if there is more than one per thread), and as soon as real select()
> delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a
> given thread may not be hitting that thread's generator in a
> deterministic order.

Yep. This may evolve, for instance the error handling patch needs to
restart transactions so it adds a state into the client.

--
Fabien.

Attachment Content-Type Size
pgbench-seed-5.patch text/x-diff 7.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-01-09 13:46:24 Re: pgbench - add \if support
Previous Message Vik Fearing 2018-01-09 13:17:42 Re: proposal: alternative psql commands quit and exit