Re: pgbench - add pseudo-random permutation function

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-11 16:31:44
Message-ID: CAEZATCWa1P7hVwJCxCDEDOFvVERO2J51XBz_6gbsozR_ton1fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 11 Mar 2021 at 00:58, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> Maybe Dean Rasheed can help because of his math background --- CC'ing him.
>

Reading the thread I can see how such a function might be useful to
scatter non-uniformly random values.

The implementation looks plausible too, though it adds quite a large
amount of new code. The main thing that concerns me is justifying the
code. With this kind of thing, it's all too easy to overlook corner
cases and end up with trivial sequences in certain special cases. I'd
feel better about that if we were implementing a standard algorithm
with known pedigree.

Thinking about the use case for this, it seems that it's basically
designed to turn a set of non-uniform random numbers (produced by
random_exponential() et al.) into another set of non-uniform random
numbers, where the non-uniformity is scattered so that the more/less
common values aren't all clumped together.

I'm wondering if that's something that can't be done more simply by
passing the non-uniform random numbers through the uniform random
number generator to scatter them uniformly across some range -- e.g.,
given an integer n, return the n'th value from the sequence produced
by random(), starting from some initial seed -- i.e., implement
nth_random(lb, ub, seed, n). That would actually be pretty
straightforward to implement using O(log(n)) time to execute (see the
attached python example), though it wouldn't generate a permutation,
so it'd need a bit of thought to see if it met the requirements.

Regards,
Dean

Attachment Content-Type Size
erand48.py text/x-python 926 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-11 17:01:56 Re: Huge memory consumption on partitioned table with FKs
Previous Message Robert Haas 2021-03-11 16:31:32 Re: New IndexAM API controlling index vacuum strategies