Re: Using POPCNT and other advanced bit manipulation instructions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Using POPCNT and other advanced bit manipulation instructions
Date: 2019-02-14 04:44:08
Message-ID: 14024.1550119448@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Thu, Feb 14, 2019 at 4:38 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'd be inclined to rip out all of the run-time-detection logic here;
>> I doubt any of it is buying anything that's worth the price of an
>> indirect call.

> No view on that but apparently there were Intel Atom and AMD C chips
> sold in the early part of this decade that lack POPCNT so I suspect
> the distros can't ship software that requires it with no fallback.

Ah, I was not looking at the business with the optional -mpopcnt
compiler flag. I agree that we probably should not assume that
code compiled with that will run anywhere. But it's silly to build
all this infrastructure and then throw away the opportunity to
optimize for anything but late-model Intel.

A survey of the buildfarm results so far says that __builtin_clz
and __builtin_ctz exist just about everywhere, and even
__builtin_popcount is available on some non-Intel architectures.
It is reasonable to assume that those builtins are faster than
the C equivalents wherever they exist, even on old-school Intel
hardware.

The way this should have been done is to have a separate file
that's compiled with -mpopcnt if the compiler has that (and
has the builtins), and for the mainline file to have "slow"
versions that use the less-optimized builtins if available,
and only fall back to raw C code if not HAVE__BUILTIN_WHATEVER.
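
A minimal sketch of the "slow" mainline path described above, assuming a
configure-provided HAVE__BUILTIN_POPCOUNT macro as in the snippet quoted
later in this message. The function name popcount32_slow is hypothetical and
is not taken from the patch; the separate file built with -mpopcnt is not
shown.

```c
#include <stdint.h>

/*
 * Hypothetical "slow" popcount for the mainline file: use the compiler
 * builtin when configure found it (no -mpopcnt required), and fall back
 * to plain C only when the builtin is missing.
 */
static inline int
popcount32_slow(uint32_t word)
{
#ifdef HAVE__BUILTIN_POPCOUNT
    /* The compiler picks the best code it can emit without -mpopcnt. */
    return __builtin_popcount(word);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    int         result = 0;

    while (word != 0)
    {
        word &= word - 1;
        result++;
    }
    return result;
#endif
}
```

Either branch returns the same count, so callers do not need to know which
one was compiled in.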

Also, in

#if defined(HAVE__GET_CPUID) && defined(HAVE__BUILTIN_POPCOUNT)

static bool
pg_popcount_available(void)
{
    unsigned int exx[4] = { 0, 0, 0, 0 };

#if defined(HAVE__GET_CPUID)
    __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#elif defined(HAVE__CPUID)
    __cpuid(exx, 1);
#else
#error cpuid instruction not available
#endif

    return (exx[2] & (1 << 23)) != 0; /* POPCNT */
}
#endif

it's obvious to the naked eye that the __cpuid() and #error
branches are unreachable because of the outer #if. I don't
think that was the design intention.
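
One possible way to make both cpuid branches reachable is sketched below.
This is an assumption about the intent, not the fix that was actually
committed: the function name popcnt_available_sketch is hypothetical, the
header choices (<cpuid.h> for __get_cpuid, <intrin.h> for __cpuid) are
assumed from the usual GCC/Clang and MSVC conventions, and returning false
instead of hitting #error when no cpuid interface exists is a design choice
made here for illustration.

```c
#include <stdbool.h>

#if defined(HAVE__GET_CPUID)
#include <cpuid.h>              /* GCC/Clang: __get_cpuid() */
#elif defined(HAVE__CPUID)
#include <intrin.h>             /* MSVC: __cpuid() */
#endif

/*
 * Hypothetical variant of the check: the guard accepts either cpuid
 * interface, so the inner #elif branch is no longer dead code.
 */
static bool
popcnt_available_sketch(void)
{
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
    unsigned int exx[4] = {0, 0, 0, 0};

#if defined(HAVE__GET_CPUID)
    __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#else
    __cpuid((int *) exx, 1);
#endif

    return (exx[2] & (1 << 23)) != 0;   /* CPUID.1:ECX bit 23 = POPCNT */
#else
    /* No way to query the CPU: report POPCNT as unavailable. */
    return false;
#endif
}
```

With neither macro defined the function simply reports false, so a caller
would stay on the non-POPCNT path.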

regards, tom lane
