From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | andres(at)anarazel(dot)de, alvherre(at)2ndquadrant(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, andrew(at)tao11(dot)riddles(dot)org(dot)uk, david(dot)rowley(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Using POPCNT and other advanced bit manipulation instructions |
Date: | 2019-02-15 01:46:01 |
Message-ID: | 20190215.104601.261252390.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <822(dot)1550180738(at)sss(dot)pgh(dot)pa(dot)us>
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions. It goes from something like
> >>
> >> bsrq %rax, %rax
> >> xorq $63, %rax
>
> > I'm confused how this is a general count leading zero operation? Did you
> > use constants or something that allowed ot infer a range in the test? If
> > so the compiler probably did some optimizations allowing it to do the
> > above.
>
> No. If you compile
>
> int myclz(unsigned long long x)
> {
> return __builtin_clzll(x);
> }
>
> at -O2, on just about any x86_64 gcc, you will get
>
> myclz:
> .LFB1:
> .cfi_startproc
> bsrq %rdi, %rax
> xorq $63, %rax
> ret
> .cfi_endproc
>
I understand that the behavior of __builtin_c[tl]z(0) is
undefined from the reason, they convert to bs[rf]. So if we use
these builtins, additional check is required.
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> Built-in Function: int __builtin_clz (unsigned int x)
> Returns the number of leading 0-bits in x, starting at the most
> significant bit position. If x is 0, the result is undefined.
>
> Built-in Function: int __builtin_ctz (unsigned int x)
> Returns the number of trailing 0-bits in x, starting at the
> least significant bit position. If x is 0, the result is
> undefined.
And even worse lzcntx is accidentially "fallback"s to bsrx on
unsupported CPUs, which leads to bogus results.
__builtin_clzll(8) = 3, which should be 60.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2019-02-15 02:08:41 | Re: pg11.1: dsa_area could not attach to segment |
Previous Message | Andreas Karlsson | 2019-02-15 01:41:13 | Re: proposal: pg_restore --convert-to-text |