Re: General purpose hashing func in pgbench

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: General purpose hashing func in pgbench
Date: 2018-01-12 11:50:51
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Fabien,

11/01/2018 19:21, Ildar Musin пишет:
> 10/01/2018 21:42, Fabien COELHO пишет:
>> Hmm. I do not think that we should want a shared seed value. The seed
>> should be different for each call so as to avoid undesired
>> correlations. If wanted, correlation could be obtained by using an
>> explicit identical seed.
>> ISTM that the best way to add the seed is to call random() when the
>> second arg is missing in make_func. Also, this means that the executor
>> would always get its two arguments, so it would simplify the code there.
> Ok, I think I understand what you meant. You meant the case like following:
> \set x random(1, 100)
> \set h1 hash(:x)
> \set h2 hash(:x)  -- will have different seed from h1
> so that different instances of hash function within one script would
> have different seeds. Yes, that is a good idea, I can do that.
Added this feature in attached patch. But on a second thought this could
be something that user won't expect. For example, they may want to run
pgbench with two scripts:
- the first one updates row by key that is a hashed random_zipfian value;
- the second one reads row by key generated the same way
(that is actually what YCSB workloads A and B do)

It feels natural to write something like this:
\set rnd random_zipfian(0, 1000000, 0.99)
\set key abs(hash(:rnd)) % 1000
in both scripts and expect that they both would have the same
distribution. But they wouldn't. We could of course describe this
implicit behaviour in documentation, but ISTM that shared seed would be
more clear.


Ildar Musin
Postgres Professional:
Russian Postgres Company

Attachment Content-Type Size
pgbench_hash_v6.patch text/plain 9.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-01-12 12:47:08 Re: WIP Patch: Pgbench Serialization and deadlock errors
Previous Message Alexander Kuzmenkov 2018-01-12 11:34:09 Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()