Re: pgbench-ycsb

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: a(dot)bykov(at)postgrespro(dot)ru
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench-ycsb
Date: 2018-07-22 20:42:14
Message-ID: alpine.DEB.2.21.1807221615000.13768@lancre
Lists: pgsql-hackers


>>> Just to clarify - if I understand Anthony correctly, this proposal is
>>> not about implementing exactly YCSB as it is, but more about using
>>> zipfian distribution for an id in the regular pgbench table structure
>>> in conjunction with read/write balance to simulate something similar
>>> to it.
>>
>> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
>> point is not to implement YCSB, then do not call it YCSB:-)
>>
>> Maybe there could be other simpler builtins to use non uniform
>> distributions: {zipf,exp,...}-{simple,select} and default values
>> (exp_param, zipf_param?) for the random distribution parameters.
>>
>> \set id random_zipfian(1, 100000*:scale, :zipf_param)
>> \set val random(-5000, 5000)
>> UPDATE pgbench_whatever ...;
>>
>> Then
>>
>> pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>>
>>> And probably instead of implementing the exact YCSB workload inside
>>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the
>>> options into the framework itself (I was in the middle of it a few years
>>> ago, but then was distracted by some interesting benchmarking
>>> results).
>>
>> Sure.
>
> Hello,
> thank you for your interest. I'm still improving this idea and the patch,
> and I'm very happy about the discussion we are having. It really helps.
>
> The idea was to implement the workloads as close to YCSB as possible
> using pgbench.

Basically I'm against having something called YCSB if it is not YCSB;-)

> So, the schema it should be applied to is the default schema generated by
> pgbench -i (pgbench_accounts).

This is a contradiction, because the pgbench_accounts table is in no way,
even remotely, conformant to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, always an option as committers may be against extending
pgbench in this direction anyway. Personally I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at
least if this is "reasonable" (i.e. it does not result in 2000 lines of
new code). ISTM that it can be done, given that the YCSB schema is very
simple, hence I suggested "pgbench -i --schema ycsb" to trigger a
non-default initialization.

(3) if you are interested in demonstrating non-uniform distributions on
pgbench_accounts, I'm also fine with it, just do so, but do *NOT* call it
YCSB.

Also it seems that the YCSB bench uses some hashing to mix keys and avoid
having 1 as the most frequent, 2 as the second, and so on. There is a hash
function in pgbench which can be used for this. The solution is not perfect,
as some values cannot be reached, but it is the approach YCSB uses. Otherwise
I'm planning to submit a pseudo-random permutation function to ease this
some day, provided that the size of the table stays constant.
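
To illustrate why hashing scatters the hot keys but leaves some values
unreachable, here is a minimal sketch in Python. It assumes a 64-bit FNV-1a
hash purely for illustration; it is not claimed to reproduce the output of
pgbench's hash function or YCSB's exact key scrambling, and the helper names
(scatter, unreachable_keys) are hypothetical. The point is only that
"hash modulo table size" is not a permutation, so distinct ranks can collide
and some keys are never drawn.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv1a_64(value: int) -> int:
    """64-bit FNV-1a over the 8 little-endian bytes of a non-negative integer."""
    h = FNV_OFFSET_BASIS
    for byte in value.to_bytes(8, "little"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def scatter(rank: int, size: int) -> int:
    """Map a 1-based popularity rank to a key in [1, size] (hypothetical helper)."""
    return fnv1a_64(rank) % size + 1


def unreachable_keys(size: int) -> int:
    """Count keys in [1, size] that no rank in [1, size] maps to."""
    reached = {scatter(rank, size) for rank in range(1, size + 1)}
    return size - len(reached)


if __name__ == "__main__":
    size = 1000
    # The most popular ranks no longer land on keys 1, 2, 3.
    print("hottest ranks map to keys:", [scatter(r, size) for r in (1, 2, 3)])
    # Collisions between ranks leave part of the key space untouched.
    print("keys never reached:", unreachable_keys(size), "of", size)
```

A true permutation of [1, size] would report zero unreachable keys, which is
why a permutation function is attractive as long as the table size is fixed.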

--
Fabien.
