Re: gaussian distribution pgbench

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-17 05:39:44
Message-ID: 53268AA0.2040909@lab.ntt.co.jp
Lists: pgsql-hackers

(2014/03/15 15:53), Fabien COELHO wrote:
>
> Hello Heikki,
>
>> A couple of comments:
>>
>> * There should be an explicit "\setrandom ... uniform" option too, even though
>> you get that implicitly if you don't specify the distribution
>
> Indeed. I agree. I suggested it, but it got lost.
>
>> * What exactly does the "threshold" mean? The docs informally explain that "the
>> larger the threshold, the more frequent values close to the middle of the
>> interval are drawn", but that's pretty vague.
>
> There are explanations and computations as comments in the code. If it is about
> the documentation, I'm not sure that a very precise mathematical definition will
> help a lot of people, and might rather hinder understanding, so the doc focuses
> on an intuitive explanation instead.
>
>> * Does min and max really make sense for gaussian and exponential
>> distributions? For gaussian, I would expect mean and standard deviation as the
>> parameters, not min/max/threshold.
>
> Yes... and no:-) The aim is to draw an integer primary key from a table, so it
> must be in a specified range. This is approximated by drawing a double value with
> the expected distribution (gaussian or exponential) and project it carefully onto
> integers. If it is out of range, there is a loop and another value is drawn. The
> minimal threshold constraint (2.0) ensures that the probability of looping is low.
>
>> * How about setting the variable as a float instead of integer? Would seem more
>> natural to me. At least as an option.
>
> Which variable? The values set by setrandom are mostly used for primary keys. We
> really want integers in a range.
Oh, I see. He was talking about the documentation.

+ Moreover, set gaussian or exponential with threshold integer value,
+ we can get gaussian or exponential random in integer value between
+ <replaceable>min</> and <replaceable>max</> bounds inclusive.

Corrected:
+ Moreover, set gaussian or exponential with threshold double value,
+ we can get gaussian or exponential random in integer value between
+ <replaceable>min</> and <replaceable>max</> bounds inclusive.

I am also going to revise the documentation so that it is easier for users to understand.
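
To make the meaning of the double-valued threshold more concrete, here is a
minimal sketch of the drawing procedure Fabien describes above: draw a gaussian
double, redraw if it falls outside the threshold, then project it onto the
integers between min and max. It is written in Python for brevity and is not
the patch's C code. The names gaussian_int and retry_probability, and the
linear mapping from [-threshold, threshold) onto the integer range, are
assumptions made for illustration only.

```python
import math
import random


def retry_probability(threshold):
    """Probability that one standard-normal draw lands outside
    [-threshold, threshold] and therefore has to be redrawn."""
    return 1.0 - math.erf(threshold / math.sqrt(2.0))


def gaussian_int(lo, hi, threshold, rng=random):
    """Draw an integer in [lo, hi] (inclusive) with a truncated-gaussian shape.

    Hypothetical helper: larger thresholds concentrate the draws around the
    middle of the interval.
    """
    if threshold < 2.0:
        # Mirrors the minimal threshold constraint mentioned in the thread.
        raise ValueError("threshold must be at least 2.0")
    while True:
        z = rng.gauss(0.0, 1.0)  # standard normal double
        # Assumed mapping: [-threshold, threshold) -> [0, 1)
        u = (z + threshold) / (2.0 * threshold)
        if 0.0 <= u < 1.0:
            # Project onto the integer range; both bounds are reachable.
            return lo + int((hi - lo + 1) * u)
        # Out of range: loop and draw another value.


if __name__ == "__main__":
    rng = random.Random(0)
    print([gaussian_int(1, 100, 2.0, rng) for _ in range(10)])
    print(retry_probability(2.0))
```

With threshold 2.0, about 4.6% of the draws fall outside the range and are
redrawn, which is why a minimum of 2.0 keeps the loop cheap. The threshold is a
double, while the value that comes out is always an integer between min and max.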

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
