|From:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>|
|To:||Chapman Flack <chap(at)anastigmatix(dot)net>|
|Cc:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Re: [HACKERS] pgbench randomness initialization|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Thanks for the review,
>> The tests assume that stdlib random/srandom behavior is standard thus
>> deterministic between platform.
> Is the behavior of srandom() and the system generator really so precisely
> specified that seed 5432 will produce the same values hardcoded in the
> tests on all platforms? [...]
I'm hoping that in practice it would be the same, or that their would be
very few cases (eg linux vs windows vs macos...). I was counting on the
the buildfarm to tell me if I'm wrong, and fix it if needed.
> To have the test run pgbench twice with the same seed and compare the
> results sounds safer.
Interesting idea. The test script would be more complicated to do that,
though. I would prefer to bet on "random" determinism (:-) and resort to
such a solution only if it is not the case. Or maybe just put back some
"\d+" to keep it simple.
This is a debatable strategy.
> I did some wordsmithing of the doc, which I hope was not presumptuous of
> me, only as a conversation starter. [...]
Thanks for the doc improvements.
> The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
> will have the same effect as 'time', and indeed, I really think it should
> be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
> or an error.
> The error, right now, says only "expecting an unsigned integer"; it
> should also mention time and rand.
> Should 'rand' be something that conveys more about its meaning, 'strong'
Hmmm. "Random means random":-). I have no opinion about whether it would
be better. ISTM that "strong" would require some explanations. I let is as
"rand" for now.
> The documentation doesn't mention the allowed range of the unsigned
> integer (which I get, because 'unsigned int' is exactly the signature
> for srandom, but somebody might read "unsigned integer" in a more
> generic sense).
Ok. I extended so that it works with octal, decimal and hexadecimal, and
updated the doc accordingly. I did not add range information though.
> I'm not sure what would be a better way to say it.
> The base (only decimal, as now in the code) isn't specified either.
> Maybe the documentation should mention that the output now includes the
> random seed being used, so that (even without planning ahead) [...]
It does so only if the seed is explicitely set, otherwise it keeps the
previous behavior of being silent. I added a sentence about that.
> Something more may need to be said in the doc about reproducibility. I think
> the real story is that a run can be reproduced if the number of clients is
> equal to the number of threads.
Yes, good point. I tried to hide the issue under the "as far as random
numbers are concerned". I've tried to improve this point in the doc.
> Although each thread has its own generator state, each client does not
> (if there is more than one per thread), and as soon as real select()
> delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a
> given thread may not be hitting that thread's generator in a
> deterministic order.
Yep. This may evolve, for instance the error handling patch needs to
restart transactions so it adds a state into the client.
|Next Message||Fabien COELHO||2018-01-09 13:46:24||Re: pgbench - add \if support|
|Previous Message||Vik Fearing||2018-01-09 13:17:42||Re: proposal: alternative psql commands quit and exit|