Re: Spinlocks, yet again: analysis and proposed patches

From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Stephen Frost" <sfrost(at)snowman(dot)net>
Subject: Re: Spinlocks, yet again: analysis and proposed patches
Date: 2005-09-14 07:41:46
Message-ID: 017b01c5b8ff$c3280bc0$0f01a8c0@zaphod
Lists: pgsql-hackers

Tom Lane wrote:
> "Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
>> To have other data, I have retested the patches on a single-cpu Intel P4
>> 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Comparing to the 2.4 GHz
>> dual-Xeon results it's clear that this is in reality only one cpu. While
>> the runtime for N=1 is better than the other system, for N=4 it's already
>> worse. The situation with the patches is quite different, though.
>> Unfortunately.
>
>> CVS tip from 2005-09-12:
>> 1: 36s 2: 77s (cpu ~85%) 4: 159s (cpu ~98%)
>
>> only slock-no-cmpb:
>> 1: 36s 2: 81s (cpu ~79%) 4: 177s (cpu ~94%)
>> (doesn't help this time)
>
> Hm. This is the first configuration we've seen in which slock-no-cmpb
> was a loss. Could you double-check that result?

The first tests were compiled with
CFLAGS='-O2 -mcpu=pentium4 -march=pentium4'. I redid the tests with just
CFLAGS='-O2' yesterday. The difference was only about a second, and the
result with the patch was the same. The results for N=4 and N=8 show the
positive effect more clearly.

configure: CFLAGS='-O2' --enable-casserts
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s 2: 78s 4: 159s 8: 324s

only slock-no-cmpb:
1: 37s 2: 82s (5%) 4: 178s (12%) 8: 362s (12%)
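
For anyone who wants to check the percentages against their own runs, here is a minimal sketch of the arithmetic. The runtimes are copied from the two result lines above; the function name slowdown_percent and the dictionary layout are only illustrative and not part of the test scripts.

```python
# Wall-clock runtimes in seconds, keyed by N (number of concurrent clients).
# Values are taken from the CVS tip and slock-no-cmpb results quoted above.
cvs_tip = {1: 37, 2: 78, 4: 159, 8: 324}
no_cmpb = {1: 37, 2: 82, 4: 178, 8: 362}


def slowdown_percent(baseline, patched):
    """Return the runtime increase of patched over baseline, in percent, per N."""
    return {
        n: 100.0 * (patched[n] - baseline[n]) / baseline[n]
        for n in baseline
    }


if __name__ == "__main__":
    for n, pct in slowdown_percent(cvs_tip, no_cmpb).items():
        # A positive value means the patched build was slower than CVS tip.
        print(f"N={n}: {pct:+.1f}%")
```

Rounded to whole percent, this gives the 5%, 12% and 12% shown above for N=2, 4 and 8.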

configure: --enable-casserts

(Btw. I have always done "make clean ; make ; make install" between tests)

Best Regards,
Michael Paesold

> I can't see any reasonable way to do runtime switching of the cmpb test
> --- whatever logic we put in to control it would cost as much or more
> than the cmpb anyway :-(. I think that has to be a compile-time choice.
> From my perspective it'd be acceptable to remove the cmpb only for
> x86_64, since only there does it seem to be a really significant win.
> On the other hand it seems that removing the cmpb is a net win on most
> x86 setups too, so maybe we should just do it and accept that there are
> some cases where it's not perfect.

How many test cases do we have so far?
The summary of the effects of removing the cmpb instruction seems to be:

8-way Opteron: better
Dual/HT Xeon w/o EM64T: better
Dual/HT EM64T: better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T: worse (stronger for N>cpus)

Have I missed other reports that tested the slock-no-cmpb.patch alone?
Two of the systems with positive effects are x86_64; one is an older
high-end Intel x86 chip. The negative effect is on a low-cost Pentium 4 with
only hyperthreading. According to the mentioned thread's title, this was an
optimization for hyperthreading, not regular multi-cpus.

We could use more data, especially from newer and high-end systems. Could
some of you test the slock-no-cmpb.patch? You'll need an otherwise idle
system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the
archive it's no longer clear what should go into which file. See the
instructions in the first message above.

The patch applies to CVS tip with
patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold

Attachment            Content-Type               Size
slock-no-cmpb.patch   application/octet-stream   869 bytes
test_setup.sql        application/octet-stream   1.1 KB
test_run_small.sql    application/octet-stream   205 bytes
startn.sh             application/octet-stream   145 bytes
