Re: gaussian distribution pgbench

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-17 08:40:35
Lists: pgsql-hackers

Hi Heikki-san,

(2014/03/17 14:39), KONDO Mitsumasa wrote:
> (2014/03/15 15:53), Fabien COELHO wrote:
>> Hello Heikki,
>>> A couple of comments:
>>> * There should be an explicit "\setrandom ... uniform" option too, even though
>>> you get that implicitly if you don't specify the distribution
Fixed. "\setrandom val min max uniform" is now accepted without an error message.

>>> * What exactly does the "threshold" mean? The docs informally explain that "the
>>> larger the threshold, the more frequent values close to the middle of the
>>> interval are drawn", but that's pretty vague.
>> There are explanations and computations as comments in the code. If it is about
>> the documentation, I'm not sure that a very precise mathematical definition will
>> help a lot of people, and might rather hinder understanding, so the doc focuses
>> on an intuitive explanation instead.
I added more detailed information to the documentation. Is it OK? Please check it.
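One way to state the threshold precisely, assuming the threshold t is the number of standard deviations between the middle of [min, max] and either bound: the gaussian is centred at mu with standard deviation sigma as below, so a larger t gives a narrower bell relative to the interval. The share of draws that land in the central 1/t of the interval (that is, within one sigma of the middle) is then:

```latex
\mu = \frac{\min + \max}{2}, \qquad \sigma = \frac{\max - \min}{2t}

P\left(|X - \mu| \le \sigma\right)
  = \frac{\Phi(1) - \Phi(-1)}{\Phi(t) - \Phi(-t)}
  = \frac{2\Phi(1) - 1}{2\Phi(t) - 1}
```

Here Phi is the standard normal CDF and X is the gaussian truncated to [min, max]. For t = 2.0 this is about 0.683 / 0.954, roughly 71.5%, and it approaches about 68% as t grows.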

>>> * Does min and max really make sense for gaussian and exponential
>>> distributions? For gaussian, I would expect mean and standard deviation as the
>>> parameters, not min/max/threshold.
>> Yes... and no:-) The aim is to draw an integer primary key from a table, so it
>> must be in a specified range. This is approximated by drawing a double value with
>> the expected distribution (gaussian or exponential) and projecting it carefully onto
>> integers. If it is out of range, there is a loop and another value is drawn. The
>> minimal threshold constraint (2.0) ensures that the probability of looping is low.
It makes sense. Please see the picture I attached yesterday.

>>> * How about setting the variable as a float instead of integer? Would seem more
>>> natural to me. At least as an option.
>> Which variable? The values set by setrandom are mostly used for primary keys. We
>> really want integers in a range.
> Oh, I see. He was talking about the documentation.
The documentation was wrong: the threshold parameter must be a double.
I have fixed the documentation.
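Because the threshold is a double, it can be parsed with `strtod` and validated instead of being read as an integer. The sketch below assumes a hypothetical helper `parse_threshold` and a caller-supplied minimum, for example 2.0 for the gaussian case. It is not the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a threshold argument as a double.
 * Returns false for empty input, trailing junk, out-of-range values,
 * NaN, or values below min_allowed; otherwise stores the value in *out.
 */
static bool parse_threshold(const char *arg, double min_allowed, double *out)
{
    char *end;
    double v;

    assert(arg != NULL && out != NULL);
    errno = 0;
    v = strtod(arg, &end);
    if (end == arg || *end != '\0' || errno == ERANGE)
        return false;
    if (!(v >= min_allowed))    /* written this way so NaN is rejected too */
        return false;
    *out = v;
    return true;
}
```

For example, "2.5" and "3" are accepted, while "1.5", "abc" and "2.0x" are rejected when the minimum is 2.0.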

By the way, you seem to want to remove the --gaussian=NUM and --exponential=NUM
command-line options. Could you tell me the objective reason? As I understand it,
pgbench is a benchmark tool for PostgreSQL whose default benchmark is TPC-B-like.
That is stated in the documentation, and my patch does not change the default
benchmark. So I don't think we need to remove these options; they simply add
variety to the available benchmark options. Perhaps there is some misunderstanding
about my patch...

Mitsumasa KONDO
NTT Open Source Software Center

Attachment: gaussian_and_exponential_pgbench_v12.patch (text/x-diff, 24.7 KB)
