Re: gaussian distribution pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-15 06:53:47
Message-ID: alpine.DEB.2.10.1403150738110.13791@sto
Lists: pgsql-hackers


Hello Heikki,

> A couple of comments:
>
> * There should be an explicit "\setrandom ... uniform" option too, even
> though you get that implicitly if you don't specify the distribution

Indeed. I agree. I suggested it, but it got lost.

> * What exactly does the "threshold" mean? The docs informally explain that
> "the larger the threshold, the more frequent values close to the middle of the
> interval are drawn", but that's pretty vague.

There are explanations and computations as comments in the code. If it is
about the documentation, I'm not sure that a very precise mathematical
definition would help a lot of people; it might rather hinder
understanding, so the doc focuses on an intuitive explanation instead.

> * Does min and max really make sense for gaussian and exponential
> distributions? For gaussian, I would expect mean and standard deviation as
> the parameters, not min/max/threshold.

Yes... and no:-) The aim is to draw an integer primary key from a table,
so it must be in a specified range. This is approximated by drawing a
double value with the expected distribution (gaussian or exponential) and
projecting it carefully onto integers. If the value is out of range, the
code loops and draws another value. The minimal threshold constraint (2.0)
ensures that the probability of looping is low.
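
As a rough illustration of the loop described above, here is a minimal
sketch in Python for the gaussian case. It is not the code from the patch:
the names gaussian_int and rejection_probability are made up for this
example, and the mapping onto integers is only one plausible way to do the
projection. It assumes the threshold is counted in standard deviations of
a standard normal draw.

```python
import math
import random


def rejection_probability(threshold):
    """Probability that one standard normal draw falls outside
    [-threshold, threshold], i.e. that the loop must draw again."""
    return math.erfc(threshold / math.sqrt(2.0))


def gaussian_int(lo, hi, threshold, rng=random):
    """Draw an integer in [lo, hi], concentrated around the middle.

    Hypothetical helper: draws a standard normal double, retries while it
    is out of range, then projects the accepted value onto the integers.
    """
    if threshold < 2.0:
        raise ValueError("threshold must be at least 2.0")
    while True:
        z = rng.gauss(0.0, 1.0)
        if -threshold <= z < threshold:
            break  # in range: keep this draw
    # Rescale from [-threshold, threshold) to [0, 1).
    frac = (z + threshold) / (2.0 * threshold)
    # Project onto lo..hi; the min() guards against floating-point rounding.
    return min(hi, lo + int((hi - lo + 1) * frac))


if __name__ == "__main__":
    rng = random.Random(42)
    print([gaussian_int(1, 100, 2.0, rng) for _ in range(10)])
    print("P(retry) at threshold 2.0:", rejection_probability(2.0))
```

With a threshold of 2.0, a single draw is rejected with probability of
about 4.6%, and the probability shrinks quickly as the threshold grows,
which is why the lower bound keeps the loop cheap.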

> * How about setting the variable as a float instead of integer? Would seem
> more natural to me. At least as an option.

Which variable? The values set by setrandom are mostly used for primary
keys. We really want integers in a range.

--
Fabien.
