Re: Change GUC hashtable to use simplehash?

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2023-12-06 00:39:00
Message-ID: CANWCAZZYAZUH_3GYUtETmLOcd-5oD+kVC3cX889F7tzAQwtOsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 5, 2023 at 1:57 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2023-12-04 at 12:12 +0700, John Naylor wrote:

> There's already a patch to use simplehash, and the API is a bit
> cleaner, and there's a minor performance improvement. It seems fairly
> non-controversial -- should I just proceed with that patch?

I won't object if you want to commit that piece now, but I hesitate to
call it a performance improvement on its own.

- The runtime measurements I saw reported were well within the noise level.
- The memory usage starts out better, but with more entries is worse.

> > From my point of view, it would at least be useful for C-strings,
> > where we don't have the length available up front.
>
> That's good news.
>
> By the way, is there any reason that we would need hash_bytes(s,
> strlen(s)) == cstring_hash(s)?

"git grep cstring_hash" found nothing, so not sure what you're asking.

> Each approach has its own optimization techniques. In (a), we can use
> the |= 0x20 trick, and for equality do a memcmp() check first.

I will assume you are referring to semantics, but on the odd chance
readers take this to mean the actual C library call, that wouldn't be
an optimization, that'd be a pessimization.

> As a tangential point, we may eventually want to provide a more
> internationalized definition of "case insensitive" for GUC names. That
> would be slightly easier with (b) than with (a), but we can cross that
> bridge if and when we come to it.

The risk/reward ratio seems pretty bad.

> It seems you are moving toward (a) whereas my patches moved toward (b).
> I am fine with either approach but I wanted to clarify which approach
> we are using.

I will make my case:

> In the abstract, I kind of like approach (b) because we don't need to
> be as special/clever with the hash functions.

In the abstract, I consider (b) to be a layering violation. As a
consequence, the cleverness in (b) is not confined to one or two
places, but is smeared over a whole bunch of places. I find it hard to
follow.

Concretely, it also adds another pointer to the element struct. That's
not good for a linear open-addressing array, which simplehash has.

Further, remember the equality function is important as well. In v3,
it was "strcmp(a,b)==0", which is a holdover from the dynahash API.
One of the advantages of the simplehash API is that we can 1) use an
equality function, which should be slightly cheaper than a full
comparison function, and 2) we have the option to inline it. (It
doesn't make sense in turn, to jump to a shared lib page and invoke an
indirect function call.) Once we've done that, it's already "special",
so it's not a stretch to make it do what we want to begin with. If a
nicer API is important, why not use it?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-12-06 00:41:06 Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Previous Message Jeff Davis 2023-12-05 23:55:11 Re: Faster "SET search_path"