Re: gaussian distribution pgbench

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: gaussian distribution pgbench
Date:	2014-07-04 10:05:56
Message-ID:	20140704100556.GO25909@awork2.anarazel.de
Lists:	pgsql-hackers

On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
>
> >Yea. I certainly disagree with the patch in its current state because it
> >copies the same 15 lines several times with a two word difference.
> >Independent of whether we want those options, I don't think that's going
> >to fly.
>
> I liked a simple static string for the different variants, which means
> replication. Factorizing out the (large) common part will mean malloc &
> sprintf. Well, why not.

It sucks from a maintenance POV. And I don't see the overhead of malloc
being relevant here...

> >>OTOH, we've almost reached consensus on supporting gaussian
> >>and exponential options in \setrandom. So I think that you should
> >>separate those two features into two patches, and we should apply
> >>the \setrandom one first. Then we can discuss whether the other patch
> >>should be applied or not.
>
> >Sounds like a good plan.
>
> Sigh. I'll do that as it seems to be a blocker...

I think we also need documentation about the actual mathematical
behaviour of the randomness generators.

> + <para>
> + With the gaussian option, the larger the <replaceable>threshold</>,
> + the more frequently values close to the middle of the interval are drawn,
> + and the less frequently values close to the <replaceable>min</> and
> + <replaceable>max</> bounds.
> + In other words, the larger the <replaceable>threshold</>,
> + the narrower the access range around the middle.
> + The smaller the threshold, the smoother the access pattern
> + distribution. The minimum threshold is 2.0 for performance.
> + </para>

The only way to actually understand the distribution here is to create a
table, insert random values, and then look at the result. That's not a
good thing.
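
To make the point concrete, here is a minimal sketch of the kind of behaviour the documentation would have to spell out. It assumes that the threshold is the number of standard deviations at which the bell curve is cut off, and that the truncated range is then stretched linearly over min..max. That reading is an assumption for illustration and is not taken from the patch; the function name and its parameters are hypothetical.

```python
import math
import random


def truncated_gaussian_int(lo, hi, threshold, rng=random):
    """Draw an integer in [lo, hi] from a Gaussian cut off at +/- threshold.

    Assumption (not from the patch): threshold is measured in standard
    deviations, and [-threshold, threshold) is mapped linearly onto lo..hi.
    """
    if threshold < 2.0:
        raise ValueError("threshold must be at least 2.0")
    while True:
        # Box-Muller transform: one standard normal deviate from two uniforms.
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
        u2 = rng.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)
        # Rejection step: retry when the deviate falls outside the cut-off.
        if -threshold <= z < threshold:
            break
    # Rescale to [0, 1), then to the integer range lo..hi.
    frac = (z + threshold) / (2.0 * threshold)
    # min() guards against frac rounding up to exactly 1.0.
    return min(hi, lo + int((hi - lo + 1) * frac))


def middle_share(lo, hi, threshold, n, rng):
    """Fraction of n draws that land in the middle fifth of [lo, hi]."""
    width = hi - lo + 1
    low_edge = lo + 2 * width // 5
    high_edge = lo + 3 * width // 5
    hits = sum(
        1
        for _ in range(n)
        if low_edge <= truncated_gaussian_int(lo, hi, threshold, rng) < high_edge
    )
    return hits / n
```

Under these assumptions, middle_share(1, 100, 5.0, ...) comes out well above middle_share(1, 100, 2.0, ...), which matches the "larger threshold, narrower access range" wording in the quoted paragraph. A formula of this kind in the docs would let readers work that out without running an experiment.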

> The caveat that I have is that without these options there is:
>
> (1) no feedback about the actual distributions in the final summary, which
> depend on the threshold value, and
>
> (2) no included means to test the feature, so the first patch is less
> meaningful if the feature cannot be used simply and requires a custom script.

I personally agree that we likely want that as an additional
feature. Even if just because it makes the results easier to compare.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: gaussian distribution pgbench at 2014-07-04 09:59:23 from Fabien COELHO

Responses

Re: gaussian distribution pgbench at 2014-07-13 06:27:19 from Mitsumasa KONDO
Re: gaussian distribution pgbench -- part 1/2 at 2014-07-17 04:09:00 from Fabien COELHO
