Re: gaussian distribution pgbench

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-18 08:54:45
Message-ID: 532809D5.4010705@lab.ntt.co.jp
Lists: pgsql-hackers

(2014/03/17 22:37), Tom Lane wrote:
> KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> (2014/03/17 18:02), Heikki Linnakangas wrote:
>>> On 03/17/2014 10:40 AM, KONDO Mitsumasa wrote:
>>> There is an infinite number of variants of the TPC-B test that we could include
>>> in pgbench. If we start adding every one of them, we're quickly going to have
>>> hundreds of options to choose the workload. I'd like to keep pgbench simple.
>>> These two new test variants, gaussian and exponential, are not that special that
>>> they'd deserve to be included in the program itself.
>
>> Well, I add only two options, and they are major distributions that are seen in
>> real database systems more often than the uniform distribution. I'm afraid you
>> are too worried; it will not grow to hundreds of options, and pgbench is still simple.
>
> FWIW, I concur with Heikki on this. Adding new versions of \setrandom is
> useful functionality. Embedding them in the "standard" test is not,
> because that just makes it (even) less standard. And pgbench has too darn
> many switches already.
Hmm, I have cooled down and looked at the pgbench options again. I can understand
his argument: there are many switches already, and the list will keep growing
unless we stop adding new ones. However, I think the people who added options in
the past believed those options would be useful for improving PostgreSQL
performance. Now those options stand in the way of new ones such as my feature,
which can produce a benchmark distribution closer to a real system. I think this
is very unfortunate, and it tends to hold back progress on PostgreSQL performance
as a whole, not only pgbench. And if we remove the command line option, I think
new features will tend to be rejected in the future. That is not good either.

By the way, if we remove the command line option, it becomes difficult to
understand the gaussian distribution, because the threshold parameter is very
sensitive. The option is also a very useful feature in its own right: analyzing
and visualizing pgbench_history with SQL is difficult and laborious.

What do you think about this problem? It has not been discussed yet.

> [mitsu-ko(at)pg-rex31 pgbench]$ ./pgbench --gaussian=2
> ~
> access probability of top 20%, 10% and 5% records: 0.32566 0.16608 0.08345
> ~
> [mitsu-ko(at)pg-rex31 pgbench]$ ./pgbench --gaussian=4
> ~
> access probability of top 20%, 10% and 5% records: 0.57633 0.31086 0.15853
> ~
> [mitsu-ko(at)pg-rex31 pgbench]$ ./pgbench --gaussian=10
> ~
> access probability of top 20%, 10% and 5% records: 0.95450 0.68269 0.38292
> ~
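
As a rough cross-check of the figures above, here is a minimal Python sketch. It
is not taken from the patch. It assumes that the threshold truncates a standard
normal distribution to [-threshold, +threshold] and that "top N%" means the
central N% of that range; the function name access_probability is hypothetical.
Under that assumption the probability is
erf(fraction * threshold / sqrt(2)) / erf(threshold / sqrt(2)), which appears to
agree with the reported values to five decimal places.

```python
import math


def access_probability(threshold: float, fraction: float) -> float:
    """Probability mass in the central `fraction` of a standard normal
    truncated to [-threshold, +threshold] (assumed model, see lead-in)."""
    inner = math.erf(fraction * threshold / math.sqrt(2.0))
    total = math.erf(threshold / math.sqrt(2.0))  # normalizes the truncation
    return inner / total


if __name__ == "__main__":
    for threshold in (2.0, 4.0, 10.0):
        probs = [access_probability(threshold, f) for f in (0.20, 0.10, 0.05)]
        print("threshold=%g: " % threshold + " ".join("%.5f" % p for p in probs))
```

This also shows why the parameter is sensitive: going from a threshold of 2 to 10
moves the share of accesses hitting the central 20% of records from about a third
to about 95%.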

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
