Re: pgbench randomness initialization

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: pgbench randomness initialization
Date: 2016-04-07 11:01:44
Message-ID: alpine.DEB.2.10.1604071242420.11001@sto
Lists: pgsql-hackers


Hello Andres,

> If you run the test for longer... Or explicitly iterate over IVs. At the
> very least we need to make pgbench output the IV used, to have some
> chance of repeating tests.

Note that I'm not against providing a way to repeat tests "exactly", and I
have suggested two means: an environment variable and/or a command-line
option.
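
To make the two means concrete, here is a minimal sketch of how a seed could be chosen and reported. It is written in Python for brevity and is not pgbench code: the function names and the BENCH_SEED variable are hypothetical, and the precedence (option first, then environment, then a fresh random seed) is an assumption made for illustration.

```python
import os
import random


def choose_seed(option_value=None, environ=None, env_name="BENCH_SEED"):
    """Return (seed, origin).

    Hypothetical precedence: explicit option, then environment variable,
    then a fresh seed so that the default stays "different on every run".
    """
    environ = os.environ if environ is None else environ
    if option_value is not None:
        return int(option_value), "option"
    if env_name in environ:
        return int(environ[env_name]), "environment"
    # Default: draw a new 32-bit seed from the OS entropy source.
    return random.SystemRandom().getrandbits(32), "random"


def report_seed(seed, origin):
    # Always report the seed, so that any run can be repeated afterwards.
    return f"random seed: {seed} (from {origin})"


if __name__ == "__main__":
    seed, origin = choose_seed()
    print(report_seed(seed, origin))
```

With something like this the default remains a changing seed, and reporting the seed in the output is what gives "some chance of repeating tests".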

> [...] That comparison pretty much invalidates any point you're making,
> it's that bad.

At least it is simple, if simplistic.

Here is another one: I knew a financial institution which needed to
evaluate the VaR (value at risk) of exotic financial products every night.
They relied on Monte Carlo simulation for that. Alas, it was not converging
quickly enough and the results were unstable, so they took your advice:
they froze the seed. Day after day the results were mostly the same, the
VaR was stable from one morning to the next, management was happy, the
risks were under control... That was in the mid 2000s :-)

>> However, from a statistical perspective this is just heresy: you may do a
>> change which improves one given run at the expense of all possible others
>> and you would not know it: Say for instance that there are two different
>> behaviors depending on something, then you will check against one of them
>> only.
>
> Meh. That assumes that we're doing a huge number of pgbench runs;

A number of runs, not necessarily a "huge" one. Or averaging a lot of
intermediate values and having a hard look at the distribution, not just
the final tps number.
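
As a toy illustration of the "two behaviors" case, the sketch below fakes a benchmark whose result depends on which of two modes the random sequence happens to hit. Everything in it is an assumption made up for the example (the toy_run function, the two tps levels, the noise model); it is not a model of pgbench itself. It only shows what a summary over several seeds exposes that a frozen seed cannot.

```python
import random
import statistics


def toy_run(seed, transactions=2000):
    """Hypothetical stand-in for one benchmark run; returns a fake tps figure."""
    rng = random.Random(seed)
    # Assumption: the random sequence decides which of two behaviors is hit.
    base = 1000.0 if rng.random() < 0.5 else 700.0
    # Small per-transaction noise, averaged over the whole run.
    noise = statistics.fmean(rng.gauss(0.0, 50.0) for _ in range(transactions))
    return base + noise


def summarize(seeds):
    """Run once per seed and describe the distribution, not just one number."""
    results = sorted(toy_run(s) for s in seeds)
    return {
        "runs": len(results),
        "min": results[0],
        "median": statistics.median(results),
        "max": results[-1],
        "stdev": statistics.stdev(results) if len(results) > 1 else 0.0,
    }


frozen = summarize([12345] * 10)  # the same seed, ten times
varied = summarize(range(10))     # ten different seeds

if __name__ == "__main__":
    print("frozen:", frozen)
    print("varied:", varied)
```

With the frozen seed every run returns the same figure, so the spread is zero and only one of the two behaviors is ever checked. With ten different seeds, a handful of runs is already enough for the min/max gap to show both.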

> but usually people do maybe a handful. Tops. If you're trying to defend
> against scenarios like that you need to design your tests so that you'll
> encounter such problems by running longer.

People usually do a lot of things; that does not mean that they are "right".

>> So I have no mathematical doubt that changing the seed is the right
>> default setting, thus I think that the current behavior is fine.
>> However I'm okay if someone wants to control the randomness for some
>> reason (maybe having "less sure" results, but quickly), so it could be
>> allowed somehow.
>
> There might be some statistics arguments,

Yep, there are.

> but I think they're pretty ignoring reality.

Hmmm. If reality wants to ignore mathematics, it usually loses, so this
will not be with my blessing. Note that as a committer you do not need me
to freeze the seed. I'm just providing an opinion backed by mathematical
proofs.

--
Fabien.
