| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
| Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Hash tables in dynamic shared memory |
| Date: | 2016-10-04 22:22:09 |
| Message-ID: | 20161004222209.tx3aiee2h4ai4lfv@alap3.anarazel.de |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
> It's impossible to write a general purpose hash table that will be
> suitable for every use case, considering all the combinations of
> design trade offs and subsets of functionality someone might want.
Very much agreed.
> There is other work being done in this space: I'm aware of Andres's
> current thread[1] on Robin Hood-style closed hashing tables with
> macro-ised size, hash and comparator operations à la C++ container
> templates, and I'm following along with interest.
> He vigorously
> rejects chaining hash tables as the natural enemies of memory
> hardware.
I'd not go quite that far. There are good arguments for open addressing,
and there's good arguments for open chaining. The latter is a lot more
convincing if you want concurrent access with partition based locking,
for example...
> I guess Andres's simplehash should already work in DSA memory nicely
> anyway since it doesn't have any internal pointers IIUC (?).
The data access doesn't have pointers, but the "metadata" does have a
pointer to the data. But that's a very solvable problem. But
incremental resizing and granular locking aren't going to happen for it,
so it'd really only be suitable for presenting a read-only (or possibly
read-mostly) view.
> Potential use cases for DHT include caches, in-memory database objects
> and working state for parallel execution.
Is there a more concrete example, i.e. a user we'd convert to this at
the same time as introducing this hashtable?
> There are a couple of things I'm not happy about in this version of
> DHT: the space overhead per item is too high (hash + wasted padding +
> next pointer), and the iterator system is overly complicated. I have
> another version in development which gets rid of the hash and padding
> and sometimes gets rid of the next pointer to achieve zero overhead,
> and I'm working on a simpler iteration algorithm that keeps the
> current guarantees but doesn't need to reorder bucket chains. More on
> that, with some notes on performance and a testing module, soon.
FWIW, my experimentation showed that getting rid of the hash itself is
quite often *NOT* a worthwhile tradeof. Comparing keys and recomputing
hashes is often quite expensive (e.g. for GROUP BY it's where the
majority of time is spent after the simplehash patch).
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2016-10-04 22:39:14 | Idea: Tell compiler palloc() et al return aligned values |
| Previous Message | Gilles Darold | 2016-10-04 22:12:31 | Re: proposal: psql \setfileref |