Re: Using POPCNT and other advanced bit manipulation instructions

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: Using POPCNT and other advanced bit manipulation instructions
Date: 2019-01-04 12:38:31
Message-ID: CAKJS1f951NWLpt9T1JdMmfRvo3PCFMLVVuXi1EqV_p5ZULNMJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for looking at this.

On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> I've checked for Clang 6, it turns out that indeed it generates popcnt without
> any macro, but only in one place for bloom_prop_bits_set. After looking at this
> function it seems that it would be benefitial to actually use popcnt there too.

Yeah, that's the pattern that's mentioned in
https://lemire.me/blog/2016/05/23/the-surprising-cleverness-of-modern-compilers/
It would need to be changed to call the popcount function. This
existing makes me a bit more worried that some extension could be
using a similar pattern and end up being compiled with -mpopcnt due to
pg_config having that CFLAG. That's all fine until the binary makes
it's way over to a machine without that instruction.

> > I am able to measure performance gains from the patch. In a 3.4GB
> > table containing a single column with just 10 statistics targets, I
> > got the following times after running ANALYZE on the table.
>
> I've tested it too a bit, and got similar results when the patched version is
> slightly faster. But then I wonder if popcnt is the best solution here, since
> after some short research I found a paper [1], where authors claim that:
>
> Maybe surprisingly, we show that a vectorized approach using SIMD
> instructions can be twice as fast as using the dedicated instructions on
> recent Intel processors.
>
>
> [1]: https://arxiv.org/pdf/1611.07612.pdf

I can't imagine that using the number_of_ones[] array processing
8-bits at a time would be slower than POPCNT though.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-01-04 12:48:57 Re: Statement-level Triggers For Uniqueness Checks
Previous Message Alvaro Herrera 2019-01-04 12:34:44 Re: pg_dump multi VALUES INSERT