Re: General purpose hashing func in pgbench

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: General purpose hashing func in pgbench
Date: 2018-01-11 15:53:22
Message-ID: ce7e0934-d32b-9329-cfee-99a048d8c3b3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Fabien,

10/01/2018 21:42, Fabien COELHO пишет:
>>>> Should we probably add some infrastructure for optional arguments?
>>>
>>> You can look at the handling of "CASE" which may or may not have an
>>> "ELSE" clause.
>>>
>>> I'd suggest you use a new negative argument with the special meaning
>>> for hash, and create the seed value when missing when building the
>>> function, so as to simplify the executor code.
>
>> Added a new nargs option -3 for hash functions and moved arguments check
>> to parser. It's starting to look a bit odd and I'm thinking about
>> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.:
>>
>> #define PGBENCH_NARGS_VARIABLE (-1)
>> #define PGBENCH_NARGS_CASE (-2)
>> #define PGBENCH_NARGS_HASH (-3)
>
> Yes, I'm more than fine with improving readability.
>
Added macros.
>>> Instead of 0, I'd consider providing a random default so that the
>>> hashing behavior is not the same from one run to the next. What do you
>>> think?
>>
>> Makes sence since each thread is also initializes its random_state with
>> random numbers before start. So I added global variable 'hash_seed' and
>> initialize it with random() before threads spawned.
>
> Hmm. I do not think that we should want a shared seed value. The seed
> should be different for each call so as to avoid undesired
> correlations. If wanted, correlation could be obtained by using an
> explicit identical seed.
Probably I'm missing something but I cannot see the point. If we change
seed on every invokation then we get uniform-like distribution (see
attached image). And we don't get the same hash value for the same input
which is the whole point of hash functions. Maybe I didn't understand
you correctly.

Anyway I've attached a new version with some tests and docs added.

--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
murmur2_random_seed_per_invokation.png image/png 77.8 KB
pgbench_hash_v5.patch text/plain 8.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arthur Zakirov 2018-01-11 15:53:49 Re: [HACKERS] PoC: custom signal handler for extensions
Previous Message Diego Silva e Silva 2018-01-11 15:52:10 The first function call