Re: gaussian distribution pgbench

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-17 09:43:33
Message-ID: 5326C3C5.7000904@lab.ntt.co.jp
Lists: pgsql-hackers

(2014/03/17 17:46), Heikki Linnakangas wrote:
> On 03/15/2014 08:53 AM, Fabien COELHO wrote:
>>> * Do min and max really make sense for gaussian and exponential
>>> distributions? For gaussian, I would expect mean and standard deviation as
>>> the parameters, not min/max/threshold.
>> Yes... and no:-) The aim is to draw an integer primary key from a table,
>> so it must be in a specified range.
>
> Well, I don't agree with that aim. It's useful for choosing a primary key, as in
> the pgbench TPC-B workload, but a gaussian distributed random number could be
> used for many other things too. For example:
>
> \setrandom foo ... gaussian
>
> select * from cheese where weight > :foo
>
> And :foo should be a float, not an integer. That's what I was trying to say
> earlier, when I said that the variable should be a float. If you need an integer,
> just cast or round it in the query.
>
> I realize that the current \setrandom sets the variable to an integer, so
> gaussian/exponential would be different. But so what? An option to generate
> uniformly distributed floats would be handy too, though.
Well, that seems like a new feature. If you want to implement it as a double,
add '\setrandomd' as a double random generator in pgbench. I would agree with that.

>> This is approximated by drawing a
>> double value with the expected distribution (gaussian or exponential) and
>> projecting it carefully onto integers. If it is out of range, there is a loop
>> and another value is drawn. The minimal threshold constraint (2.0) ensures
>> that the probability of looping is low.
>
> Well, that's one way to constrain it to the given range, but there are many
> other ways to do it. Like, clamp it to the min/max if it's out of range.
That is too heavy a method. Client-side calculation must be light.

> I don't
> think we need to choose any particular method, you can handle that in the test
> script.
I think our implementation is the best way to realize it.
It is fast, and it is robust because the probability of looping is low.

If you have a better idea, please tell us.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
