Re: Use POPCNT on MSVC

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use POPCNT on MSVC
Date: 2021-08-04 03:36:00
Message-ID: CAApHDvrONNcYxGV6C0O3ZmaL0BvXBWY+rBOCBuYcQVUOURwhkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 3 Aug 2021 at 22:43, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.

Yeah, the names no longer fit so well after stuffing the MSVC
intrinsic into the asm function. The reason I did it that way is down
to what I read in the docs. Namely:

"If you run code that uses these intrinsics on hardware that doesn't
support the popcnt instruction, the results are unpredictable."

So, it has to go somewhere where we're sure the CPU supports POPCNT
and that seemed like the correct place.

From what I understand of GCC and __builtin_popcountl(), the code
it'll output will depend on the -mpopcnt flag. So having
__builtin_popcountl() does not mean the popcnt instruction will be
used. The Microsoft documentation indicates that's not the case for
__popcnt().

> 2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.

The problem there is that we define HAVE_X86_64_POPCNTQ based on the
outcome of configure so it does not get set for MSVC. Maybe it's
worth coming up with some other constant that can be defined or we
could just do:

#if defined(_MSC_VER) && defined(_WIN64)
#define HAVE_X86_64_POPCNTQ
#endif

> I know the present state of affairs works a bit different than what you proposed a couple years ago and that something had to change for portability reasons I didn't quite understand, but I think if we change things here we should at least try to have things fit together a bit more nicely.

I think the reason for the asm is that __builtin_popcountl does not
mean popcnt will be used. Maybe we could have done something like
compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
but the problem there is that means that the compiler might end up
using that instruction in some other function in that file that we
don't want it to. It looks like my patch in [1] did pass the -mpopcnt
flag which Tom fixed.

> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even though they have no fast implementation to make it worth their while. It would be better for them if the "slow" implementation was called static inline or at least a direct function call, but that's a separate thread.)

hmm yeah. I see there's a few more usages of pg_popcount() than when I
looked last. It would be fairly easy to get rid of the
pg_popcount64/pg_popcount32 function call in that. A separate patch
though.

David

[1] https://www.postgresql.org/message-id/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-08-04 03:41:11 Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Previous Message Amit Kapila 2021-08-04 03:17:30 Re: Failed transaction statistics to measure the logical replication progress