Re: use CLZ instruction in AllocSetFreeIndex()

From: David Fetter <david(at)fetter(dot)org>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: use CLZ instruction in AllocSetFreeIndex()
Date: 2019-12-28 02:16:48
Message-ID: 20191228021648.GK32763@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 27, 2019 at 07:02:02PM -0500, John Naylor wrote:
> On Fri, Dec 27, 2019 at 11:05 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> > > On Fri, Dec 27, 2019 at 9:54 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >> ... but couldn't the
> > >> right shift be elided in favor of changing the constant we
> > >> subtract clz's result from? Shifting off those bits separately
> > >> made sense in the old implementation, but assuming that CLZ is
> > >> more or less constant-time, it doesn't with CLZ.
> >
> > > That makes sense -- I'll look into doing that.
> >
> > Actually, we could apply that insight to both code paths.
> > In the existing path, that requires assuming
> > ALLOCSET_NUM_FREELISTS+ALLOC_MINBITS <= 17, but that's OK.
> > (Nowadays I'd probably add a StaticAssert about that.)
>
> I tried that in the attached files and got these results:
>
> current 6.14s
> clz 4.52s
> clz no right shift 3.15s
> lookup table 5.56s
> lookup table no right shift 7.34s
>
> Here, "lookup table" refers to using the pg_leftmost_one_pos[] array
> and incrementing the result. Removing the shift operation from the CLZ
> case is clearly an improvement, and the main body goes from
>
> movabsq $34359738367, %rax
> addq %rax, %rdi
> shrq $3, %rdi
> bsrl %edi, %eax
> xorl $-32, %eax
> addl $33, %eax
>
> to
>
> decl %edi
> bsrl %edi, %eax
> xorl $-32, %eax
> addl $30, %eax
>
> The lookup table case is less clear. Removing the shift results in
> assembly that looks more like the C code and is slower for me. The
> standard lookup table code uses some magic constants and does its own
> constant folding by shifting 11 (8 + 3). In the absence of testing on
> platforms that will actually exercise this path, it seems the
> open-coded path should keep the shift for now. Thoughts?

It's probably worth doing the things you've found unambiguous gains
for as a patch, putting it on the next commitfest, and seeing what the
commitfest.cputube.org machinery has to say about it.

Maybe it'd be worth trying out a patch that enables CLZ for Windows,
but that seems like a separate issue.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-12-28 02:46:09 Re: Server crash with Master-Slave configuration.
Previous Message Maciek Sakrejda 2019-12-28 01:11:38 Re: Duplicate Workers entries in some EXPLAIN plans