From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
Subject: | call popcount32/64 directly on non-x86 platforms |
Date: | 2021-08-11 17:11:22 |
Message-ID: | CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=ghLQ@mail.gmail.com |
Lists: | pgsql-hackers |
Currently, all platforms must indirect through a function pointer to call
popcount on a word-sized input, even though we don't arrange for a fast
implementation on non-x86 to make it worthwhile.
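To illustrate the difference, here is a minimal sketch of the two call styles. It is not the PostgreSQL code: the names `popcount32_slow`, `popcount32_indirect`, `popcount32`, and the macro `HYPOTHETICAL_HAVE_FAST_POPCOUNT` are made up for this example, and the bit-at-a-time loop only stands in for whatever the real slow path does.

```c
#include <stdint.h>

/* Stand-in for a portable "slow" popcount: counts one bit per iteration. */
int
popcount32_slow(uint32_t word)
{
	int			result = 0;

	while (word != 0)
	{
		result += (int) (word & 1);
		word >>= 1;
	}
	return result;
}

/*
 * Indirect style: every call goes through a pointer. This only pays off if
 * the pointer can be switched to a faster implementation at runtime.
 */
int			(*popcount32_indirect) (uint32_t word) = popcount32_slow;

/*
 * Direct style: where no fast implementation is arranged (the macro below
 * is hypothetical), call the slow function by name so the compiler can
 * inline it or at least avoid the indirect branch.
 */
#ifdef HYPOTHETICAL_HAVE_FAST_POPCOUNT
#define popcount32(w) popcount32_indirect(w)
#else
#define popcount32(w) popcount32_slow(w)
#endif
```

Both styles return the same counts; the only difference is whether the call site can be resolved at compile time.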
0001 moves some declarations around so that "slow" popcount functions are
called directly on non-x86 platforms.
0002 was an idea to simplify and unify the coding for the slow functions.
Also attached is a test module for building microbenchmarks.
On a Power8 machine using gcc 4.8, and running

    time ./inst/bin/psql -c 'select drive_popcount(100000, 1024)'

I get

    master: 647ms
    0001: 183ms
    0002: 228ms

So 0001 is a clear winner on that platform. 0002 is still much faster than
master, but slower than 0001. That may be because, as it turns out, on
master gcc does emit a popcnt instruction from the intrinsic:
0000000000000000 <pg_popcount32_slow>:
0: f4 02 63 7c popcntw r3,r3
4: b4 07 63 7c extsw r3,r3
8: 20 00 80 4e blr
...
The gcc docs mention a flag for this, but I'm not sure why it seems not to
need it:
Maybe that's because the machine I used was ppc64le, but I'm not sure a ppc
binary built like this is portable to other hardware. For that reason,
maybe 0002 is a good idea.
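
For readers unfamiliar with what a pure-C replacement for the intrinsic could look like, here is a sketch using the well-known parallel bit-summing (SWAR) technique. This is an assumption for illustration only and not necessarily what 0002 does; the names `popcount32_swar` and `popcount64_swar` are hypothetical.

```c
#include <stdint.h>

/* Count set bits in a 32-bit word using only shifts, masks, and a multiply. */
int
popcount32_swar(uint32_t x)
{
	/* Sum adjacent bits into 2-bit fields. */
	x = x - ((x >> 1) & 0x55555555u);
	/* Sum adjacent 2-bit fields into 4-bit fields. */
	x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
	/* Sum adjacent 4-bit fields into per-byte counts. */
	x = (x + (x >> 4)) & 0x0F0F0F0Fu;
	/* Add the four byte counts; the total lands in the top byte. */
	return (int) ((x * 0x01010101u) >> 24);
}

/* Same technique widened to 64 bits. */
int
popcount64_swar(uint64_t x)
{
	x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
	x = (x & UINT64_C(0x3333333333333333)) +
		((x >> 2) & UINT64_C(0x3333333333333333));
	x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}
```

Code like this does not depend on the compiler's target flags, so the resulting binary runs on any hardware of the same architecture, at the cost of not using a native popcount instruction where one exists.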
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
popcount-test-module.patch | application/x-patch | 3.1 KB |
v1-0001-Use-direct-function-calls-for-pg_popcount-32-64-o.patch | application/x-patch | 7.1 KB |
v1-0002-Replace-intrinsics-in-pg_popcount-_slow-with-pure.patch | application/x-patch | 3.1 KB |