Re: Use compiler intrinsics for bit ops in hash

From: David Fetter <david(at)fetter(dot)org>
To: Jesse Zhang <sbjesse(at)gmail(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Use compiler intrinsics for bit ops in hash
Date: 2020-01-31 15:59:18
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> On Tue, Jan 14, 2020 at 2:09 PM David Fetter <david(at)fetter(dot)org> wrote:
> > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > improvement. My biggest cringe might be in pg_bitutils:
> > >
> > > 1. Is ceil_log2_64 dead code?
> >
> > Let's call it nascent code. I suspect there are places it could go, if
> > I look for them. Also, it seemed silly to have one without the other.
> >
> While not absolutely required, I'd like us to find at least one
> place and start using it. (Clang also nags at me when we have
> unused functions).

Done in the expanded patches attached.

> > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote:
> > > 4. It seems like you *really* would like an operation like LZCNT in x86
> > > (first appearing in Haswell) that is well defined on zero input. ISTM
> > > the alternatives are:
> > >
> > > a) Special case 1. That seems straightforward, but the branching cost
> > > on a seemingly unlikely condition seems to be a lot of performance
> > > loss
> > >
> > > b) Use architecture specific intrinsic (and possibly with CPUID
> > > shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
> > > intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
> > > instructions that are well defined on zero in most ISA's other than
> > > x86, so maybe we can get away with special-casing x86?
> i. We can detect LZCNT instruction by checking one of the
> "extended feature" (EAX=80000001) bits using CPUID. Unlike the
> "basic features" (EAX=1), extended feature flags have been more
> vendor-specific, but fortunately it seems that the feature bit
> for LZCNT is the same [1][2].
> ii. We'll most likely still need to provide a fallback
> implementation for processors that don't have LZCNT (either
> because they are from a different vendor, or an older Intel/AMD
> processor). I wonder if simply checking for 1 is "good enough".
> Maybe a micro benchmark is in order?

I'm not sure how I'd run one on the architectures we support. What
I've done here is generalize our implementation to be basically like
LZCNT and TZCNT at the cost of a brief branch that might go away at

> 2. (My least favorite) use inline asm (a la our popcount
> implementation).

Yeah, I'd like to fix that, but I kept the scope of this one
relatively narrow.

David Fetter <david(at)fetter(dot)org>
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres:

Attachment Content-Type Size
v4-0001-de-long-ify.patch text/x-diff 57.2 KB
v4-0002-Spread-bitutils-into-hashing.patch text/x-diff 11.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-01-31 16:23:08 Re: standby apply lag on inactive servers
Previous Message Tom Lane 2020-01-31 15:13:19 Re: Marking some contrib modules as trusted extensions