Off-Topic: Float point division academia? (was: Re: Spinlock backoff algorithm)

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magne Mæhre <Magne(dot)Mahre(at)sun(dot)com>
Subject: Off-Topic: Float point division academia? (was: Re: Spinlock backoff algorithm)
Date: 2007-11-16 13:59:37
Message-ID: 473DA249.1040303@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Wed, Nov 14, 2007 at 06:57:18PM -0800, Josh Berkus wrote:
>
>> The question is, for our most common platforms (like AMD and Intel) is the FPU
>> notably slower/more contended than integer division? I'd the impression that
>> it was, but my knowledge of chip architectures is liable to be out of date.
>>
>> Can we have a hardware geek speak up?
>>
> I did some searching and the best figures I can find for integer
> division is 15-40 cycles whereas for floating point the best I found was 5
> cycles. The FP units on modern processer as very highly optimsed, but
> the figures quoted are for pipelined division, which is not what we're
> doing here.
>
I agree with Tom that the although the user might be experiencing odd
issues as PostgreSQL was not designed for 32 light-weight cores like the
Niagara, that this particular issue is unlikely to be the primary issue.

In response to the academic discussion above, did you also look up the
time to convert from INT -> FLOAT, and then FLOAT -> INT?

Also, if divided by a constant, there are automatic optimization tricks
performed, such as:

$ cat a.c
int f(int i)
{
return i / 7;
}
$ gcc -O3 -S a.c
$ cat a.s
...
.globl f
.type f, @function
f:
.LFB2:
movl %edi, %eax
movl $-1840700269, %edx
imull %edx
leal (%rdx,%rdi), %eax
sarl $31, %edi
sarl $2, %eax
subl %edi, %eax
ret

No divide. No need to convert from INT -> FLOAT -> INT.

Here is non-conclusive evidence both that integer division by a constant
may still be faster on a fairly modern processor (AMD X2 3800+), and
that, as Tom believes, this issue is a waste of time:

$ cat a.c
#define MAX_RANDOM_VALUE (0x7FFFFFFF)

int old_f(int i)
{
return ((double) i / (double) MAX_RANDOM_VALUE) + 0.5;
}

int new_f(int i)
{
return (i + MAX_RANDOM_VALUE - 1) / MAX_RANDOM_VALUE;
}
$ cat old.c
int old_f(int);
int new_f(int);

int main()
{
for (int i = 0; i < 100000000; i++)
old_f(50);
return 0;
}
$ cat new.c
int old_f(int);
int new_f(int);

int main()
{
for (int i = 0; i < 100000000; i++)
new_f(50);
return 0;
}
$ gcc -o old -O3 -std=c99 old.c a.c
$ gcc -o new -O3 -std=c99 new.c a.c
$ time ./old
./old 1.58s user 0.00s system 99% cpu 1.590 total
$ time ./old
./old 1.31s user 0.00s system 99% cpu 1.314 total
$ time ./old
./old 1.14s user 0.00s system 99% cpu 1.142 total
$ time ./old
./old 1.46s user 0.00s system 99% cpu 1.461 total
$ time ./new
./new 0.68s user 0.00s system 99% cpu 0.682 total
$ time ./new
./new 0.70s user 0.01s system 99% cpu 0.703 total
$ time ./new
./new 0.70s user 0.00s system 99% cpu 0.705 total
$ time ./new
./new 0.70s user 0.00s system 99% cpu 0.703 total

I say non-conclusive, because:
1) my CPU is probably doing branch prediction and may possibly be doing
out-of-order execution. Since it has more integer units that floating
point units, this would work in the favour of integers. The contention
case described by the original poster does not allow for multiple
integer units to be in use at the same time.
2) as Tom has already said, 100 million divides in either 0.7 seconds or
1.5 seconds, is still orders of magnitude more than the contended case
would ever be called.

For the future, please consider doing integer division when working with
integers and the rvalue is a constant. For this case, it doesn't seem
like it is worth it to risk breaking the code.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2007-11-16 15:37:58 pg_ctl -t N register ??
Previous Message Zdenek Kotala 2007-11-16 13:46:13 AllocSetStats uses fprintf instead of elog