Re: General purpose hashing func in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: General purpose hashing func in pgbench
Date: 2018-01-10 18:42:26
Message-ID: alpine.DEB.2.20.1801101903150.15856@lancre
Lists: pgsql-hackers


Hello Ildar,

>> The patch needs a rebase after Teodor's push of a set of pgbench functions.
> Done. Congratulations on your patch finally being committed : )

Over 21 months... I hope that pgbench will have hash functions sooner:-)

>>> Should we probably add some infrastructure for optional arguments?
>>
>> You can look at the handling of "CASE" which may or may not have an
>> "ELSE" clause.
>>
>> I'd suggest you use a new negative argument count with a special
>> meaning for hash, and create the seed value, when it is missing, while
>> building the function, so as to simplify the executor code.

> Added a new nargs option -3 for hash functions and moved the argument
> check to the parser. It's starting to look a bit odd and I'm thinking about
> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.:
>
> #define PGBENCH_NARGS_VARIABLE (-1)
> #define PGBENCH_NARGS_CASE (-2)
> #define PGBENCH_NARGS_HASH (-3)

Yes, I'm more than fine with improving readability.
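
For illustration only, here is a rough sketch of how such named constants
could read in an argument-count check. The function name nargs_ok and the
exact rules for the variable and CASE entries are assumptions made for this
example, not code taken from the patch:

```c
#include <stdbool.h>

/* Special (negative) argument counts, as proposed in the quoted message. */
#define PGBENCH_NARGS_VARIABLE	(-1)
#define PGBENCH_NARGS_CASE		(-2)
#define PGBENCH_NARGS_HASH		(-3)

/*
 * Hypothetical helper: does "actual" satisfy the declared count "expected"?
 * Assumed rules: variable needs at least one argument, CASE needs an odd
 * count of at least three, hash takes a value plus an optional seed.
 */
static bool
nargs_ok(int expected, int actual)
{
	switch (expected)
	{
		case PGBENCH_NARGS_VARIABLE:
			return actual >= 1;
		case PGBENCH_NARGS_CASE:
			return actual >= 3 && actual % 2 == 1;
		case PGBENCH_NARGS_HASH:
			return actual == 1 || actual == 2;
		default:
			/* non-negative counts must match exactly */
			return expected == actual;
	}
}
```

With names instead of bare -1, -2, -3, each case label says which kind of
function it is handling.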

>> Instead of 0, I'd consider providing a random default so that the
>> hashing behavior is not the same from one run to the next. What do you
>> think?
>
> Makes sense, since each thread also initializes its random_state with
> random numbers before starting. So I added a global variable 'hash_seed'
> and initialize it with random() before the threads are spawned.

Hmm. I do not think that we want a shared seed value. The seed should be
different for each call so as to avoid undesired correlations. If
correlation is wanted, it can be obtained by passing an explicit, identical
seed.

ISTM that the best way to add the seed is to call random() when the second
arg is missing in make_func. Also, this means that the executor would
always get its two arguments, so it would simplify the code there.
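
A minimal sketch of the idea, assuming a much simplified stand-in for the
parsed expression. HashCall, make_hash_call and execute_hash are hypothetical
names, and the mixing loop is only an FNV-1a style placeholder, not the hash
under discussion:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a parsed hash() call. */
typedef struct HashCall
{
	int64_t		value;		/* value to hash */
	int64_t		seed;		/* always set once the call is built */
} HashCall;

/*
 * Build the call at parse time. When the script gave no seed
 * (has_seed == 0), draw one with random() so that each call gets its
 * own seed; an explicit seed is kept unchanged.
 */
static HashCall
make_hash_call(int64_t value, int has_seed, int64_t seed)
{
	HashCall	call;

	call.value = value;
	call.seed = has_seed ? seed : (int64_t) random();
	return call;
}

/*
 * Executor side: both arguments are always present, so there is no
 * "missing seed" branch here. Placeholder FNV-1a style mixing.
 */
static uint64_t
execute_hash(const HashCall *call)
{
	uint64_t	h = UINT64_C(0xcbf29ce484222325) ^ (uint64_t) call->seed;
	uint64_t	v = (uint64_t) call->value;
	int			i;

	for (i = 0; i < 8; i++)
	{
		h ^= v & 0xff;
		h *= UINT64_C(0x100000001b3);
		v >>= 8;
	}
	return h;
}
```

Two calls built with the same explicit seed hash a given value identically,
which is how a script would ask for correlation on purpose.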

--
Fabien.
