Re: DSM segment handle generation in background workers

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DSM segment handle generation in background workers
Date: 2018-11-14 02:24:35
Message-ID: 20181114022435.GC1096408@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 12, 2018 at 09:39:01AM -0500, Tom Lane wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > On Mon, Nov 12, 2018 at 9:34 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> >> Compared to the old code, the new code requires more wall time to visit every
> >> possible seed value. New code xor's the PID bits into the fastest-changing
> >> timestamp bits, so only about twenty bits can vary within any given one-second
> >> period. (That assumes a PID space of twenty or fewer bits; fifteen bits is
> >> the Linux default.) Is that aspect of the change justified?
>
> > Hmm, right. How about applying pg_bswap32() to one of the terms, as
> > an easy approximation of reversing the bits?

That seems adequate, yes. But why not just restore the old algorithm in the
new function? (I'm not opposed to revising the algorithm, but I think a
revision should have explicit goals. The ease of pg_bswap32() is not itself a
worthy goal, because the old BackendRun() algorithm was also quite easy.)

> I doubt that's a good idea; to a first approximation, it would mean that
> half the seed depends only on the PID and the other half only on the
> timestamp. Maybe we could improve matters a little by left-shifting the
> PID four bits or so, but I think we still want it to mix with some
> rapidly-changing time bits.
>
> I'm not really sure that we need to do anything though. Basically,
> what we've got here is a tradeoff between how many bits change over
> a given timespan and how unpredictable those bits are. I don't see
> that one of those is necessarily more important than the other.

What counts is the ease of predicting a complete seed. HEAD's algorithm has
~13 trivially-predictable bits, and the algorithm that stood in BackendRun()
from 98c5065 until 197e4af had no such bits. You're right that the other 19
bits are harder to predict than any given 19 bits under the old algorithm, but
the complete seed remains more predictable than it was before 197e4af.

Goal I recommend: if connections arrive in a Poisson distribution of unknown
lambda, maximize the number of distinct seeds.

nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Denny Herssein 2018-11-14 03:20:12 Ltree Extension Enhancements
Previous Message Michael Paquier 2018-11-14 01:45:57 Re: wal_dump output on CREATE DATABASE