Re: pgbench - add pseudo-random permutation function

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-05 12:07:25
Message-ID: alpine.DEB.2.22.394.2104051009040.2033293@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Dean,

>> I think that permute should only use integer operations. I'd suggest to
>> use one of the integer variants instead of going through a double
>> computation and casting back to int. The internal state is based on
>> integers, I do not see the added value of going through floats, possibly
>> enduring floating point issues (undeflow, rounding, normalization,
>> whatever) on the way, whereas from start to finish we just need ints.
>
> This is the already-established coding pattern used in getrand() to
> pick a random number uniformly in some range that's not necessarily a
> power of 2.

Indeed. I'm not particularly happy with that one either.

> Floating point underflow and normalisation issues are not possible
> because erand48() takes a 48-bit integer N and uses ldexp() to store
> N/2^48 in a double, which is an exact operation since IEEE doubles
> have something like 56-bit mantissas.

Double mantissa size is 52 bits.

> This is then turned back into an integer in the required range by
> multiplying by the desired maximum value, so there's never any risk of
> underflow or normalisation issues.

ISTM that there are significant issues when multiplying with an integer,
because the integer is cast to a double before multiplying, so if the int
is over 52 bits then it is coldly truncated and some values are just lost
in the process and will never be drawn. Probably not too many of them, but
some of them anyway.

> I guess that there may be rounding variations once the required
> maximum value exceeds something like 2^56 (although the comment in
> getrand() is much more conservative than that), so it's possible that
> a pgbench script calling random() with (ub-lb) larger than that might
> give different results on different platforms.

Dunno. This may be the same issue I'm pointing out above.

> For the non-uniform random functions, that effect might well kick in
> sooner. I'm not aware of any field complaints about that though,
> possibly because real-world data sizes are significantly smaller than
> that.
>
> In practice, permute() is likely to take its input from one of the
> non-uniform random functions, so it won't be permute() that first
> introduces rounding issues.

Sure. I'd like permute to be immune to that.

>> See attached v27 proposal.
>
> This update has a number of flaws. For example, this:

Indeed:-)

> +static uint64
> +randu64(RandomState * state)
> +{
> + uint64 r1 = pg_jrand48((*state).xseed),
> + r2 = pg_jrand48((*state).xseed);
> + return r1 << 51 | r2 << 13 | r1 >> 13;
> +}
>
> It still uses a 48-bit RandomState, so it doesn't improve on getrand()
> in that respect.

Sure. I'm pretty unhappy with that one, but I was not trying to address
that. My idea that randu64 would be replace with something better at some
point. My intention was "64-bits pseudo-random", my implementation does
not work, ok:-)

> It replaces a single erand48() call with 2 jrand48() calls, which
> comes at a cost in terms of performance. (In fact, changing the number
> of rounds in the previous version of permute() from 4 to 6 has a
> smaller performance impact than this -- more about that below.)

Sure, same remark as above, I was not trying to address that pointB.

> jrand48() returns a signed 32-bit integer, which has a 50% chance of
> being negative, so when that is cast to a uint64, there is a 50%
> chance that the 32 most significant bits will be 1.

Argh.

> When the various parts are OR'ed together, that will then mask out any
> randomness from the other parts. For example, 50% of the time, the
> jrand48() value used for r1 will be negative, and so 32 bits in the
> middle of the final result will all be set.

Argh. I hesitated to use xor. I should not have:-)

> So overall, the results will be highly non-uniform, with less
> randomness and poorer performance than erand48().

Indeed, bad choice. I wanted to used the unsigned version but it is not
implemented, and swichted to the signed version without thinking of some
of the implications.

> In addition, it returns a result in the range [0,2^64), which is not
> really what's wanted. For example:
>
> + /* Random offset */
> + r = randu64(&random_state2);
> + v = (v + r) % size;
>
> The previous code used r = getrand(0, size-1), which gave a uniformly
> random offset. However, the new code risks introducing additional
> non-uniformity when size is not a power of 2.

ISTM that the overall non uniformity is worse with the float approach as
opposed to the int approach.

Conceptually, the same kind of bias is expected whether you get through
floats or through ints, because the underlying power-of-two maths is the
same, so what makes the difference in reducing non-uniformity is using
more bits. Basically, when enough bits are used the same number of values
should appear n vs n+1 times.

When not enough bits are provided, things get ugly: for instance, with
size = 2^53, even if the floats were fully the 52-bit float pseudo-random
mantissa (they are really 48 with erand48) would result in only even
numbers to be produced, whereas with ints all numbers are produced. With
erand48, when size is above 48 bits ISTM that last bits are always zeros
with the double approach. I'm not counting lost values because of size
truncation when converting it to double.

> Finally, worst of all, this random offset is no longer bijective, due
> to 64-bit integer wrap-around. For example, suppose that size=100 and
> r=(2^64-10), then the following 2 values both map to the same result:
>
> v = 20 -> (v + r) % size
> = (20 + (2^64 - 10)) % 100
> = (2^64 + 10) % 100
> = (10) % 100
> = 10
>
> v = 4 -> (v + r) % size
> = (4 + (2^64 - 10)) % 100
> = (2^64 - 6) % 100
> = (18446744073709551610) % 100
> = 10
>
> So not only are the results no longer uniformly random, they might not
> even form a permutation.

Indeed, this one is pretty fun! Probably the right formula for this
approach is "(v + r % size) % size", which is kind of a mouthful.

I fully agree that my v27 implementation is butched on many dimensions,
some of them intentional and temporary (use jrand48 twice) and some of
them accidental (not considering int overflows, being optimistic on signed
to unsigned casts…).

I still disagree though that going through floating point is the right
thing to do, because of some of the issues I outlined above (eg truncation
and rounding for above 48/52-bits sizes). Basically I think that an
algorithm dealing with integers should not have to resort to floating
point computations unless it is actually required. This is not the case
for permute, were v26 is using doubles as glorified 48-bit integers, that
could be extended to 52-bit integers, but no more. The only benefit I see
is using implicitly the internal 104-bit rounding by truncation on
multiply, but I do not think that implicitely reducing the practical int
values to 52 bits is worth it, and that the same quality (bias) can be
achieved for 63 bits integers by keeping them as ints are writing the
right formula, which I fully failed to demonstrate in v27.

> I did some more testing of the previous version (v26), this time
> looking at much larger sizes, all the way up to the maximum, which is
> 2^63-1 since it comes from a signed int64. In general, the results
> were very good, though I did notice some slight non-uniformity in the
> way adjacent inputs were separated from another when the size was just
> under a power of 2. I think that's the hardest case for this
> algorithm, because there's very little overlap between the 2 halves.

Yes, less values are steered twice per round. However, as for adjacent
values for large sizes, I'm wondering whether this may have more to do
with the 48 bit limitations, so that lower bits are not really xored for
instance. Not sure.

> Increasing the number of rounds from 4 to 6 ironed out that
> non-uniformity (and as mentioned above, is still cheaper than using
> randu64() with 4 rounds), so I think we should go with that.

There is a quality-cost tradeoff. With the previous version I convinced
myself that 4 rounds were a good compromise (not perfect, but ok for
keeping the cost low on practical sizes).

With this version, I'll admit that I do not have an opinion.

> You may wish to submit a separate patch to replace pgbench's use of
> *rand48() with something else, and that would be discussed on its own
> merits, but I don't see why that should hold up adding permute().

I'll see.

Attached a v28 which I hope fixes the many above issues and stays with
ints. The randu64 is still some kind of a joke, I artificially reduced the
cost by calling jrand48 once and extending it to 64 bits, so it could give
an idea of the cost endured if a 64-bit prng was used.

Now you are the committer, you can do as you please, I'm just stating my
(mathematical) opinions about using floating point computations for that.
I think that apart from this point of principle/philosophy the permute
performance and implementation are reasonable, and better than my initial
version because it avoids int128 computations and the large prime number
business.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-28.patch text/x-diff 16.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sait Talha Nisanci 2021-04-05 12:07:29 Re: [EXTERNAL] Re: Crash in record_type_typmod_compare
Previous Message torikoshia 2021-04-05 12:03:12 Re: Get memory contexts of an arbitrary backend process