Re: Issue with the PRNG used by Postgres

From: Parag Paul <parag(dot)paul(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Issue with the PRNG used by Postgres
Date: 2024-04-10 15:26:39
Message-ID: CAA=PXp074Aab6xjyMfQoYhLUFFsDzT+0X=-JKbxyryXa9SH0eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi Tom,
First of all thanks for you response. I did not misread it. The 0.5 is
added to the result of the multiplication which then uses C integer
casting, which does not round off, but just drops the decimal portion.

status->cur_delay += (int) (status->cur_delay *
pg_prng_double(&pg_global_prng_state) +
0.5);

So, if RNG generated 0.0000001 and cur_delay =1000.
Result will be
1000 + int(1000*0.000001 + 5) = (int)(1000 + (0.1+.5)) = (int)1000.6 = 1000
<-- back to the same value
and there is no change after that starts happening, if the RNG is in the
low hamming index state. If avalanche happens soon, then it will correct it
self, but in the mean time, we have a stuck_spin_lock PANIC that could
bring down a production server.
-Parag

On Wed, Apr 10, 2024 at 8:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I'm not convinced that we should try to improve the RNG, but surely we
> > need to put parentheses around pg_prng_double(&pg_global_prng_state) +
> > 0.5. IIUC, the current logic is making us multiply the spin delay by a
> > value between 0 and 1 when what was intended was that it should be
> > multiplied by a value between 0.5 and 1.5.
>
> No, I think you are misreading it, because the assignment is += not =.
> The present coding is
>
> /* increase delay by a random fraction between 1X and 2X */
> status->cur_delay += (int) (status->cur_delay *
> pg_prng_double(&pg_global_prng_state)
> + 0.5);
>
> which looks fine to me. The +0.5 is so that the conversion to integer
> rounds rather than truncating.
>
> In any case, I concur with Andres: if this behavior is anywhere near
> critical then the right fix is to not be using spinlocks.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-10 15:33:37 Re: Speed up clean meson builds by ~25%
Previous Message Kartyshov Ivan 2024-04-10 15:12:00 Re: [HACKERS] make async slave to wait for lsn to be replayed