Re: Using POPCNT and other advanced bit manipulation instructions

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: Using POPCNT and other advanced bit manipulation instructions
Date: 2019-01-08 12:08:15
Message-ID: CA+q6zcWaOS50A16_-RsisnFaA3RDkW0=BUoZJgVgL=27+j6tFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Fri, Jan 4, 2019 at 1:38 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > I've checked for Clang 6, it turns out that indeed it generates popcnt without
> > any macro, but only in one place for bloom_prop_bits_set. After looking at this
> > function it seems that it would be benefitial to actually use popcnt there too.
>
> Yeah, that's the pattern that's mentioned in
> https://lemire.me/blog/2016/05/23/the-surprising-cleverness-of-modern-compilers/
> It would need to be changed to call the popcount function. This
> existing makes me a bit more worried that some extension could be
> using a similar pattern and end up being compiled with -mpopcnt due to
> pg_config having that CFLAG. That's all fine until the binary makes
> it's way over to a machine without that instruction.

It surprises me, that it's not that obvious how to disable this feature for
clang. I guess one should be able to turn it off by invoking opt manually:

clang -S -mpopcnt -emit-llvm *.c
opt -S -mattr=+popcnt <all the options without -loop-idiom> *.ll
llc -mattr=+popcnt *.optimized.ll
clang -mpopcnt *optimized.s

But for some reason this doesn't work for me (popcnt is not appearing in
the first place).

> > > I am able to measure performance gains from the patch. In a 3.4GB
> > > table containing a single column with just 10 statistics targets, I
> > > got the following times after running ANALYZE on the table.
> >
> > I've tested it too a bit, and got similar results when the patched version is
> > slightly faster. But then I wonder if popcnt is the best solution here, since
> > after some short research I found a paper [1], where authors claim that:
> >
> > Maybe surprisingly, we show that a vectorized approach using SIMD
> > instructions can be twice as fast as using the dedicated instructions on
> > recent Intel processors.
> >
> >
> > [1]: https://arxiv.org/pdf/1611.07612.pdf
>
> I can't imagine that using the number_of_ones[] array processing
> 8-bits at a time would be slower than POPCNT though.

Yeah, probably you're right. If I understand correctly even with the lookup
table in the cache the access would be a bit slower than a POPCNT instruction.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2019-01-08 12:20:29 Re: commitfest: When are you assigned patches to review?
Previous Message Michael Banck 2019-01-08 12:03:25 Re: Offline enabling/disabling of data checksums