Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: James Coleman <jtc331(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-04-29 14:26:12
Message-ID: CAAaqYe-Rg4EMKGWw-r3PHsLhCVWDBWsK6nwHjugnWBk5631DUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2020-04-28 18:22:20 -0400, James Coleman wrote:
> > I cc'd Andres given his commit introduced simplehash, so I figured
> > he'd probably have a few pointers on when each one might be useful.
> > [...]
> > Do you have any thoughts on what the trade-offs/use-cases etc. are for
> > dynahash versus simple hash?
> >
> > From reading the commit message in b30d3ea824c it seems like simple
> > hash is faster and optimized for CPU cache benefits. The comments at
> > the top of simplehash.h also discourage it's use in non
> > performance/space sensitive uses, but there isn't anything I can see
> > that explicitly tries to discuss when dynahash is useful, etc.
>
> Benefits of dynahash (chained hashtable):
> - supports partitioning, useful for shared memory accessed under locks
> - better performance for large entries, as they don't need to be moved
> around in case of hash conflicts
> - stable pointers to hash table entries
>
> Benefits of simplehash (open addressing hash table):
> - no indirect function calls, known structure sizes, due to "templated"
> code generation (these show up substantially in profiles for dynahash)
> - considerably faster for small entries due to previous point, and due
> open addressing hash tables having better cache behaviour than chained
> hashtables
> - once set-up the interface is type safe and easier to use
> - no overhead of a separate memory context etc
>
>
> > Given the performance notes in that commit message, I thinking
> > switching to simple hash is worth it.
>
> Seems plausible to me.
>
>
> > But I also wonder if there might be some value in a README or comments
> > addition that would be a guide to what the various hash
> > implementations are useful for. If there's interest, I could try to
> > type something short up so that we have something to make the code
> > base a bit more discoverable.
>
> That'd make sense to me.

Cool, I'll work on that as I have time then.

One question: what is the reasoning behind having SH_STORE_HASH? The
only things I could imagine would be a case where you have external
pointers to some set of values or need to be able to use the hash for
other reasons besides the hash table (and so can avoid calculating it
twice), but maybe I'm missing something.

Thanks,
James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonah H. Harris 2020-04-29 14:33:46 Re: Proposing WITH ITERATIVE
Previous Message Jürgen Purtz 2020-04-29 14:13:46 Re: Additional Chapter for Tutorial