Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex
Date: 2009-07-20 18:00:00
Message-ID: 4A64B0A0.80107@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
>> Rather than testing single bits in a loop, change AllocSetFreeIndex to
>> use the __builtin_clz() function to calculate the chunk index.
>
>> This requires a new check for __builtin_clz in the configure script.
>
>> Results in a ~2% performance increase on sysbench on PowerPC.
>
> I did some performance testing on this by extracting the
> AllocSetFreeIndex function into a standalone test program that just
> executed it a lot of times in a loop. And there's a problem: on
> x86_64 it is not much of a win. The code sequence that gcc generates
> for __builtin_clz is basically
>
> bsrl %eax, %eax
> xorl $31, %eax
>
> and it turns out that Intel hasn't seen fit to put a lot of effort into
> the BSR instruction. It's constant time, all right, but on most of
> their CPUs that constant time is like 8 or 16 times slower than an ADD;
> cf http://www.intel.com/Assets/PDF/manual/248966.pdf
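To make the change under discussion concrete, here is a minimal sketch of the two approaches. This is not the code from Jeremy's patch. It assumes a free-list scheme like aset.c's, where chunk sizes are powers of two starting at 1 << ALLOC_MINBITS with ALLOC_MINBITS = 3 (8 bytes). The names free_index_loop, free_index_clz and SKETCH_ALLOC_MINBITS are hypothetical. It needs GCC or Clang for __builtin_clzll, whose result is undefined for a zero argument, hence the explicit guard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed value: the smallest chunk is 1 << 3 = 8 bytes. */
#define SKETCH_ALLOC_MINBITS 3

/*
 * Shift-and-test version: count how many right shifts it takes to clear
 * (size - 1) >> MINBITS.  Index 0 covers sizes 1..8, index 1 covers
 * 9..16, index 2 covers 17..32, and so on.
 */
static int
free_index_loop(size_t size)
{
    int         idx = 0;

    if (size > 0)
    {
        size = (size - 1) >> SKETCH_ALLOC_MINBITS;
        while (size != 0)
        {
            idx++;
            size >>= 1;
        }
    }
    return idx;
}

/*
 * Builtin version: the loop above counts the significant bits of
 * v = (size - 1) >> MINBITS, which is (bit width) - clz(v) for v != 0.
 */
static int
free_index_clz(size_t size)
{
    unsigned long long v;

    /* Sizes 0..8 map to index 0; this also keeps v != 0 below. */
    if (size <= ((size_t) 1 << SKETCH_ALLOC_MINBITS))
        return 0;
    v = (unsigned long long) ((size - 1) >> SKETCH_ALLOC_MINBITS);
    return (int) (sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(v);
}
```

On the code sequence Tom quotes: BSR yields the index b of the highest set bit, and for 0 <= b <= 31 the value 31 - b equals b ^ 31 (31 is all ones in five bits), so "bsrl; xorl $31" is simply clz computed as 31 - b.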

Hmm, interesting. I don't have the exact numbers any more, but that
patch (or a previous version of it) definitely showed a noticeable
improvement when I tested with sysbench on a current-generation Intel
Nehalem...
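For anyone wanting to repeat a standalone measurement in the spirit of the one Tom describes, a sketch of such a harness follows. Everything here is an assumption of the sketch, not Tom's actual test program: the names index_loop, index_clz, run_bench and BENCH_MINBITS, the request-size pattern (1..8192 bytes), and timing with ISO C clock(). It requires GCC or Clang for __builtin_clzll. Calling through a function pointer keeps the comparison simple but adds call overhead and prevents inlining, so absolute numbers will not match inlined use inside an allocator; as this thread shows, results also differ between a tight microbenchmark and a whole-system sysbench run.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

#define BENCH_MINBITS 3         /* assumed smallest chunk: 8 bytes */

/* Shift-and-test: count shifts needed to clear (size - 1) >> MINBITS. */
static int
index_loop(size_t size)
{
    int         idx = 0;

    if (size > 0)
        for (size = (size - 1) >> BENCH_MINBITS; size != 0; size >>= 1)
            idx++;
    return idx;
}

/* Count-leading-zeros: same result via the bit width of the shifted value. */
static int
index_clz(size_t size)
{
    if (size <= ((size_t) 1 << BENCH_MINBITS))
        return 0;               /* __builtin_clzll(0) is undefined */
    return (int) (sizeof(unsigned long long) * CHAR_BIT) -
        __builtin_clzll((unsigned long long) ((size - 1) >> BENCH_MINBITS));
}

typedef struct
{
    unsigned long long checksum;    /* sum of indexes; keeps calls live */
    double      seconds;            /* CPU time measured with clock() */
} bench_result;

/* Call fn on a repeating range of request sizes, iterations times. */
static bench_result
run_bench(int (*fn) (size_t), long iterations)
{
    bench_result r = {0, 0.0};
    clock_t     start = clock();

    for (long i = 0; i < iterations; i++)
        r.checksum += (unsigned long long) fn((size_t) (i & 8191) + 1);
    r.seconds = (double) (clock() - start) / CLOCKS_PER_SEC;
    return r;
}
```

Comparing run_bench(index_loop, n).seconds with run_bench(index_clz, n).seconds for a large n gives a rough per-CPU comparison; matching checksums confirm both variants computed the same indexes.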

Stefan
