Quick Links

Re: Hash tables in dynamic shared memory

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hash tables in dynamic shared memory
Date:	2016-10-04 22:22:09
Message-ID:	20161004222209.tx3aiee2h4ai4lfv@alap3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

> It's impossible to write a general purpose hash table that will be
> suitable for every use case, considering all the combinations of
> design trade offs and subsets of functionality someone might want.

Very much agreed.

> There is other work being done in this space: I'm aware of Andres's
> current thread[1] on Robin Hood-style closed hashing tables with
> macro-ised size, hash and comparator operations à la C++ container
> templates, and I'm following along with interest.

> He vigorously
> rejects chaining hash tables as the natural enemies of memory
> hardware.

I'd not go quite that far. There are good arguments for open addressing,
and there's good arguments for open chaining. The latter is a lot more
convincing if you want concurrent access with partition based locking,
for example...

> I guess Andres's simplehash should already work in DSA memory nicely
> anyway since it doesn't have any internal pointers IIUC (?).

The data access doesn't have pointers, but the "metadata" does have a
pointer to the data. But that's a very solvable problem. But
incremental resizing and granular locking aren't going to happen for it,
so it'd really only be suitable for presenting a read-only (or possibly
read-mostly) view.

> Potential use cases for DHT include caches, in-memory database objects
> and working state for parallel execution.

Is there a more concrete example, i.e. a user we'd convert to this at
the same time as introducing this hashtable?

> There are a couple of things I'm not happy about in this version of
> DHT: the space overhead per item is too high (hash + wasted padding +
> next pointer), and the iterator system is overly complicated. I have
> another version in development which gets rid of the hash and padding
> and sometimes gets rid of the next pointer to achieve zero overhead,
> and I'm working on a simpler iteration algorithm that keeps the
> current guarantees but doesn't need to reorder bucket chains. More on
> that, with some notes on performance and a testing module, soon.

FWIW, my experimentation showed that getting rid of the hash itself is
quite often *NOT* a worthwhile tradeof. Comparing keys and recomputing
hashes is often quite expensive (e.g. for GROUP BY it's where the
majority of time is spent after the simplehash patch).

Greetings,

Andres Freund

In response to

Hash tables in dynamic shared memory at 2016-10-04 21:40:45 from Thomas Munro

Responses

Re: Hash tables in dynamic shared memory at 2016-10-04 23:11:52 from Thomas Munro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2016-10-04 22:39:14	Idea: Tell compiler palloc() et al return aligned values
Previous Message	Gilles Darold	2016-10-04 22:12:31	Re: proposal: psql \setfileref