From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Ildar Musin <i(dot)musin(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: General purpose hashing func in pgbench |
Date: | 2018-01-12 15:03:00 |
Message-ID: | alpine.DEB.2.20.1801121555530.13422@lancre |
Lists: | pgsql-hackers |
Hello Ildar,
>> Hmm. I do not think that we should want a shared seed value. The seed
>> should be different for each call so as to avoid undesired
>> correlations. If wanted, correlation could be obtained by using an
>> explicit identical seed.
>
> Probably I'm missing something but I cannot see the point. If we change
> seed on every invocation then we get uniform-like distribution (see
> attached image). And we don't get the same hash value for the same input
> which is the whole point of hash functions. Maybe I didn't understand
> you correctly.
I suggest fixing the seed when the script is parsed, so that every hash call
in a given script uses the same seed throughout one pgbench invocation. The
seed would change when pgbench is re-invoked, so that the results differ from
one run to the next.
Also, if hash(:i) and hash(:j) appear in two distinct scripts, ISTM that
we do not necessarily want the same seed, otherwise i == j would imply
hash(i) == hash(j), which may not be a desirable property for some use
cases.
Maybe it would be desirable for other use cases, though.
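To make the proposal concrete, here is a minimal sketch in Python. It is not the pgbench code: the `Script` class, `parse_scripts`, `run_seed` and the way the seed is mixed into FNV-1a are all assumptions made for illustration only. It shows a seed drawn once per script at parse time, so that a hash is stable within a run but differs between scripts and between invocations.

```python
import random

MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xcbf29ce484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001b3        # standard 64-bit FNV prime


def seeded_hash(value: int, seed: int) -> int:
    """FNV-1a over the 8 little-endian bytes of value.

    Assumption: the seed is simply XORed into the initial state.
    """
    h = (FNV_OFFSET ^ seed) & MASK64
    for byte in (value & MASK64).to_bytes(8, "little"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


class Script:
    """Hypothetical stand-in for a parsed pgbench script."""

    def __init__(self, name: str, rng: random.Random):
        self.name = name
        # Drawn once at "parse" time, then reused by every hash call
        # made from this script during the run.
        self.seed = rng.getrandbits(64)

    def hash(self, value: int) -> int:
        return seeded_hash(value, self.seed)


def parse_scripts(names, run_seed):
    """Parse all scripts of one invocation; run_seed varies per invocation."""
    rng = random.Random(run_seed)
    return [Script(name, rng) for name in names]
```

With this layout, two calls to `a.hash(42)` within one run agree, while `a.hash(42)` and `b.hash(42)` differ when the scripts `a` and `b` drew different seeds. A user who wants the correlation back would pass the same explicit seed to both calls.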
> Anyway I've attached a new version with some tests and docs added.
--
Fabien.