Re: allowing broader use of simplehash

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing broader use of simplehash
Date: 2019-12-12 19:51:40
Message-ID: 20191212195140.xmfdweada7nxj6uq@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-12-11 10:50:16 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2019 at 4:59 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 3) For lots of one-off uses of hashtables that aren't performance
> > critical, we want a *simple* API. That IMO would mean that key/value
> > end up being separately allocated pointers, and that just a
> > comparator is provided when creating the hashtable.
>
> I think the simplicity of the API is a key point. Some things that are
> bothersome about dynahash:
>
> - It knows about memory contexts and insists on having its own.

Which is a waste, in a good number of cases.

> - You can't just use a hash table in shared memory; you have to
> "attach" to it first and have an object in backend-private memory.

I'm not quite sure there's all that good an alternative to this,
tbh. For efficiency it's useful to have backend-local state, I
think. And I don't really see how to have that without needing to attach.
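One way to see why a backend-local handle is hard to avoid is the minimal sketch below. All names (`SharedTableHeader`, `LocalTable`, `shared_table_attach`, ...) are hypothetical and do not correspond to dynahash, dshash, or any other PostgreSQL code. The sketch assumes the shared segment may be mapped at a different address in each process. The segment therefore stores only offsets, and "attaching" builds a process-local struct that caches absolute pointers and derived constants once rather than recomputing them on every lookup. The segment is simulated with malloc() in a single process, and locking is omitted entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical layout of a table living in a shared segment. Only offsets are
 * stored there, because each process may map the segment at another address.
 */
typedef struct SharedTableHeader {
    uint32_t nbuckets;       /* power of two, fixed at creation */
    size_t   buckets_off;    /* offset of the slot array from the segment base */
} SharedTableHeader;

typedef struct SharedSlot {
    uint32_t used, key, value;
} SharedSlot;

/* Backend-local handle built by "attaching"; never stored in the segment. */
typedef struct LocalTable {
    SharedTableHeader *header;
    SharedSlot        *buckets;  /* absolute pointer, valid only in this process */
    uint32_t           mask;     /* cached nbuckets - 1 */
} LocalTable;

static void shared_table_init(char *base, uint32_t nbuckets) {
    SharedTableHeader *hdr = (SharedTableHeader *)base;

    hdr->nbuckets = nbuckets;
    hdr->buckets_off = sizeof(SharedTableHeader);
    memset(base + hdr->buckets_off, 0, nbuckets * sizeof(SharedSlot));
}

/* Attach: translate offsets into local pointers once, not on every lookup. */
static LocalTable shared_table_attach(char *base) {
    LocalTable t;

    t.header = (SharedTableHeader *)base;
    t.buckets = (SharedSlot *)(base + t.header->buckets_off);
    t.mask = t.header->nbuckets - 1;
    return t;
}

/* Linear probing; returns 0 only if the table is full. */
static int shared_table_insert(LocalTable *t, uint32_t key, uint32_t value) {
    uint32_t i = (key * 2654435761u) & t->mask;

    for (uint32_t n = 0; n <= t->mask; n++, i = (i + 1) & t->mask) {
        SharedSlot *s = &t->buckets[i];
        if (!s->used || s->key == key) {
            s->used = 1;
            s->key = key;
            s->value = value;
            return 1;
        }
    }
    return 0;
}

static int shared_table_lookup(const LocalTable *t, uint32_t key, uint32_t *value) {
    uint32_t i = (key * 2654435761u) & t->mask;

    for (uint32_t n = 0; n <= t->mask; n++, i = (i + 1) & t->mask) {
        const SharedSlot *s = &t->buckets[i];
        if (!s->used)
            return 0;
        if (s->key == key) {
            *value = s->value;
            return 1;
        }
    }
    return 0;
}
```

Copying the segment bytes into a second buffer and attaching there stands in for a second backend that maps the same segment at another address. Lookups through the second handle still find the entries, because nothing inside the segment depends on where it is mapped.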

> - The usual way of getting a shared hash table is ShmemInitHash(), but
> that means that the hash table has its own named chunk and that it's
> in the main shared memory segment. If you want to put it inside
> another chunk or put it in DSM or whatever, it doesn't work.

I don't think it's quite realistic to use the same implementation for
both DSM and "normal" shared memory, although the code could partially
be shared and just specialized for each case. That's not an excuse,
however, to have drastically different interfaces for the two.

> - It knows about LWLocks and if it's a shared table it needs its own
> tranche of them.
> - hash_search() is hard to wrap your head around.
>

> One thing I dislike about simplehash is that the #define-based
> interface is somewhat hard to use. It's not that it's a bad design.

I agree. It's the best I could come up with, taking the limitations of C
into account, when focusing on speed and type safety. I really think this
type of hack is a stopgap measure, and we ought to upgrade to a subset
of C++.
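For readers who haven't used it: simplehash is instantiated by #defining parameters such as SH_PREFIX, SH_ELEMENT_TYPE, SH_KEY_TYPE, SH_KEY, SH_HASH_KEY and SH_EQUAL and then including lib/simplehash.h, which expands into a type-specific table. The toy below is a hypothetical and much smaller illustration of the same macro-template idea. It is written as a single generator macro so that it compiles on its own, and it is not simplehash's actual code or interface. Because the hash and equality expressions are pasted in textually, each instantiation gets its own struct and functions with no function pointers or void * casts. That is where the speed and the type safety come from.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical toy generator, NOT simplehash's real interface. Each expansion
 * defines PREFIX_set plus functions specialized for KEYTYPE; HASH and EQUAL are
 * pasted in textually, so the compiler sees direct, inlinable code.
 * nslots passed to PREFIX_init must be a power of two.
 */
#define DEFINE_TOY_SET(PREFIX, KEYTYPE, HASH, EQUAL)                        \
    typedef struct PREFIX##_set {                                           \
        uint32_t mask;                                                      \
        bool *used;                                                         \
        KEYTYPE *keys;                                                      \
    } PREFIX##_set;                                                         \
                                                                            \
    static bool PREFIX##_init(PREFIX##_set *s, uint32_t nslots) {           \
        s->mask = nslots - 1;                                               \
        s->used = calloc(nslots, sizeof(bool));                             \
        s->keys = calloc(nslots, sizeof(KEYTYPE));                          \
        return s->used != NULL && s->keys != NULL;                          \
    }                                                                       \
                                                                            \
    static void PREFIX##_free(PREFIX##_set *s) {                            \
        free(s->used);                                                      \
        free(s->keys);                                                      \
    }                                                                       \
                                                                            \
    /* Linear probing; true only if key was newly added (false if full). */ \
    static bool PREFIX##_add(PREFIX##_set *s, KEYTYPE key) {                \
        uint32_t i = (HASH(key)) & s->mask;                                 \
        for (uint32_t n = 0; n <= s->mask; n++, i = (i + 1) & s->mask) {    \
            if (!s->used[i]) {                                              \
                s->used[i] = true;                                          \
                s->keys[i] = key;                                           \
                return true;                                                \
            }                                                               \
            if (EQUAL(s->keys[i], key))                                     \
                return false;                                               \
        }                                                                   \
        return false;                                                       \
    }                                                                       \
                                                                            \
    static bool PREFIX##_contains(const PREFIX##_set *s, KEYTYPE key) {     \
        uint32_t i = (HASH(key)) & s->mask;                                 \
        for (uint32_t n = 0; n <= s->mask; n++, i = (i + 1) & s->mask) {    \
            if (!s->used[i])                                                \
                return false;                                               \
            if (EQUAL(s->keys[i], key))                                     \
                return true;                                                \
        }                                                                   \
        return false;                                                       \
    }

/* Instantiation 1: 32-bit integer keys, e.g. Oids. */
#define U32_HASH(k) ((uint32_t)(k) * 2654435761u)
#define U32_EQUAL(a, b) ((a) == (b))
DEFINE_TOY_SET(u32, uint32_t, U32_HASH, U32_EQUAL)

/* Instantiation 2: C strings, hashed and compared by content. */
static uint32_t toy_string_hash(const char *s) {
    uint32_t hv = 2166136261u;    /* 32-bit FNV-1a */
    for (; *s != '\0'; s++)
        hv = (hv ^ (unsigned char)*s) * 16777619u;
    return hv;
}
#define STR_EQUAL(a, b) (strcmp((a), (b)) == 0)
DEFINE_TOY_SET(str, const char *, toy_string_hash, STR_EQUAL)
```

A call like `u32_add(&oids, "pg_class")` draws a compiler diagnostic, because the generated function takes a `uint32_t`. The flip side is the usability problem raised in the thread: every instantiation has to supply the right set of parameters, and mistakes surface as errors inside macro expansions.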

> It's just you have to sit down and think for a while to figure out
> which things you need to #define in order to get it to do what you
> want. I'm not sure that's something that can or needs to be fixed, but
> it's something to consider. Even dynahash, as annoying as it is, is in
> some ways easier to get up and running.

I have been wondering about providing a single wrapper, in a central
place, that uses simplehash to store {key*, value*} pairs and has a
creation interface that just accepts a comparator. Plus a few wrapper
creation functions for specific types (e.g. string, oid, int64). While
we'd not want to use that for really performance-critical paths, for 80%
of the cases it'd be sufficient.
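A minimal sketch of what such a wrapper might look like, under explicit assumptions. Every name (`generic_hash`, `gh_create`, `gh_insert`, `gh_lookup`, `gh_create_string`, `gh_create_int64`) is hypothetical. The table is a small standalone open-addressing implementation rather than a layer over lib/simplehash.h, so that it compiles on its own. Since a hash table also needs a hash function, the sketch assumes the creation call takes a hash callback alongside the comparator; the type-specific creators fill in both, so typical callers never write either. Keys and values are caller-owned pointers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback types, supplied once when the table is created. */
typedef uint32_t (*gh_hash_fn)(const void *key);
typedef bool (*gh_equal_fn)(const void *a, const void *b);

/* One slot; both pointers are caller-owned, and a NULL key marks an empty slot. */
typedef struct gh_entry { const void *key; void *value; } gh_entry;

typedef struct generic_hash {
    gh_entry   *slots;
    uint32_t    size;        /* always a power of two */
    uint32_t    members;
    gh_hash_fn  hash;
    gh_equal_fn equal;
} generic_hash;

static generic_hash *gh_create(gh_hash_fn hash, gh_equal_fn equal) {
    generic_hash *h = malloc(sizeof(*h));
    gh_entry *slots = calloc(8, sizeof(gh_entry));

    if (h == NULL || slots == NULL) {
        free(h);
        free(slots);
        return NULL;
    }
    *h = (generic_hash){slots, 8, 0, hash, equal};
    return h;
}

static void gh_destroy(generic_hash *h) {
    if (h != NULL)
        free(h->slots);
    free(h);
}

/* Linear probing: the slot holding key, or the empty slot where it belongs. */
static gh_entry *gh_find_slot(const generic_hash *h, gh_entry *slots,
                              uint32_t size, const void *key) {
    uint32_t i = h->hash(key) & (size - 1);

    while (slots[i].key != NULL && !h->equal(slots[i].key, key))
        i = (i + 1) & (size - 1);
    return &slots[i];
}

/* Insert or update. Doubling at 3/4 load keeps an empty slot, so probes end. */
static bool gh_insert(generic_hash *h, const void *key, void *value) {
    if ((uint64_t)(h->members + 1) * 4 > (uint64_t)h->size * 3) {
        uint32_t newsize = h->size * 2;
        gh_entry *newslots = calloc(newsize, sizeof(gh_entry));

        if (newslots == NULL)
            return false;
        for (uint32_t i = 0; i < h->size; i++)
            if (h->slots[i].key != NULL)
                *gh_find_slot(h, newslots, newsize, h->slots[i].key) = h->slots[i];
        free(h->slots);
        h->slots = newslots;
        h->size = newsize;
    }
    gh_entry *e = gh_find_slot(h, h->slots, h->size, key);
    if (e->key == NULL) {    /* new entry: remember the caller's key pointer */
        e->key = key;
        h->members++;
    }
    e->value = value;
    return true;
}

/* Returns the stored value pointer, or NULL if key is absent. */
static void *gh_lookup(const generic_hash *h, const void *key) {
    gh_entry *e = gh_find_slot(h, h->slots, h->size, key);
    return e->key != NULL ? e->value : NULL;
}

/* Type-specific creators, so most callers never write a hash or comparator. */
static uint32_t gh_string_hash(const void *key) {
    uint32_t hv = 2166136261u;    /* 32-bit FNV-1a */
    for (const unsigned char *p = key; *p != '\0'; p++)
        hv = (hv ^ *p) * 16777619u;
    return hv;
}
static bool gh_string_equal(const void *a, const void *b) { return strcmp(a, b) == 0; }

static uint32_t gh_int64_hash(const void *key) {
    uint64_t k = (uint64_t)*(const int64_t *)key;
    return (uint32_t)(k ^ (k >> 32)) * 2654435761u;   /* fold, then multiply */
}
static bool gh_int64_equal(const void *a, const void *b) {
    return *(const int64_t *)a == *(const int64_t *)b;
}

static generic_hash *gh_create_string(void) { return gh_create(gh_string_hash, gh_string_equal); }
static generic_hash *gh_create_int64(void) { return gh_create(gh_int64_hash, gh_int64_equal); }
```

Compared with a macro-specialized table, this pays for an indirect call on every hash and comparison and a pointer chase for every key and value. That is why it would suit the non-critical majority of uses rather than hot paths. An oid creator would follow the same pattern as the int64 one.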

Greetings,

Andres Freund
