Quick Links

Re: Change GUC hashtable to use simplehash?

From:	John Naylor <johncnaylorls(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Change GUC hashtable to use simplehash?
Date:	2023-11-23 10:41:08
Message-ID:	CANWCAZYDNjTh=8FxgZKmS8HEE=wcmLMoJx+Vvg3+xgAaGY=2oQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Nov 23, 2023 at 5:34 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> It's worth noting that the limited range of the input values means that
> there's a lot of bias toward some bits being set ('a' to 'z' all start with
> 0b011).

We can take advantage of the limited range with a single additional
instruction: After "ch |= 0x20", do "ch -= ('a' - 1)". That'll shrink
letters and underscores to the range [1,31], which fits in 5 bits.
(Other characters are much less common in a guc name). That increases
randomness and allows 12 chars to be xor'd in before the first bits
rotate around.

> If, which I mildly doubt, we can't afford to call murmurhash32 for every
> character, we could just call it for 32/5 input characters together. Or we
> could just load up to 8 characters into an 64bit integer, can call
> murmurhash64.

I'll play around with this idea, as well.

> The fact that string_hash() is slow due to the strlen(), which causes us to
> process the input twice and which is optimized to also handle very long
> strings which typically string_hash() doesn't encounter, seems problematic far
> beyond this case. We use string_hash() in a *lot* of places, and that strlen()
> does regularly show up in profiles. We should fix that.

> I think we ought to adjust our APIs around this:

> 1) The accumulator state of the hash functions should be exposed, so one can
> accumulate values into the hash state, without reducing the internal state
> to a single 32/64 bit variable.

If so, it might make sense to vendor a small, suitably licensed hash
function that already has these APIs.

While on the subject, it'd be good to have a clear separation between
in-memory and on-disk usage, so we can make breaking changes in the
former.

In response to

Re: Change GUC hashtable to use simplehash? at 2023-11-22 22:34:32 from Andres Freund

Responses

Re: Change GUC hashtable to use simplehash? at 2023-11-29 13:31:21 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2023-11-23 10:45:07	Re: [Proposal] global sequence implemented by snowflake ID
Previous Message	Hayato Kuroda (Fujitsu)	2023-11-23 10:18:59	[Proposal] global sequence implemented by snowflake ID