Re: Use simplehash.h instead of dynahash in SMgr

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-06-22 01:55:10
Message-ID: CAApHDvq+bv13-M5JZN5Swfmy5gVWwCXZw-6hP=R6Hi1RitK0hA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 22 Jun 2021 at 03:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I kind of wonder if we really need four different hash table
> implementations (this being the third "generic" one, plus hash join
> has its own, and I may have forgotten others). Should we instead
> think about revising simplehash to gain the benefits of this patch?

hmm, yeah. I definitely agree with trying to have as much reusable
code as we can when we can. It certainly reduces maintenance and bugs
tend to be found more quickly too. It's a very worthy cause.

I did happen to think of this when I was copying swathes of code out
of simplehash.h. However, I decided that the two implementations are
sufficiently different that if I tried to merge them both into one .h
file, we'd have some unreadable and unmaintainable mess. I just don't
think their DNA is compatible enough for the two to be mated
successfully. For example, simplehash uses open addressing and
generichash does not. This means that things like iterating over the
table works completely differently. Lookups in generichash need to
perform an extra step to fetch the actual data from the segment
arrays. I think it would certainly be possible to merge the two, but
I just don't think it would be easy code to work on if we did that.

The good thing is that that the API between the two is very similar
and it's quite easy to swap one for the other. I did make changes
around memory allocation due to me being too cheap to zero memory when
I didn't need to and simplehash not having any means of allocating
memory without zeroing it.

I also think that there's just no one-size-fits-all hash table type.
simplehash will not perform well when the size of the stored element
is very large. There's simply too much memcpying to move data around
during insert/delete. simplehash will also have terrible iteration
performance in sparsely populated tables. However, simplehash will be
pretty much unbeatable for lookups where the element type is very
small, e.g single Datum, or an int. The CPU cache efficiency there
will be pretty much unbeatable.

I tried to document the advantages of each in the file header
comments. I should probably also add something to simplehash.h's
comments to mention generichash.h

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2021-06-22 02:07:37 Re: Fix for segfault in logical replication on master
Previous Message David Rowley 2021-06-22 01:38:33 Re: Use simplehash.h instead of dynahash in SMgr