Re: Spinlocks, yet again: analysis and proposed patches

From: Mark Wong <markw(at)osdl(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Kreen <marko(at)l-t(dot)ee>, pgsql-hackers(at)postgresql(dot)org, Michael Paesold <mpaesold(at)gmx(dot)at>
Subject: Re: Spinlocks, yet again: analysis and proposed patches
Date: 2005-11-03 16:03:48
Message-ID: 200511031605.jA3G57W6031430@smtp.osdl.org
Lists: pgsql-hackers

On Tue, 01 Nov 2005 07:32:32 +0000
Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Mon, 2005-10-31 at 16:10 -0800, Mark Wong wrote:
> > On Thu, 20 Oct 2005 23:03:47 +0100
> > Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > > On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> > > > >
> > > > > This isn't exactly elegant coding, but it provides a useful improvement
> > > > > on an 8-way SMP box when run on 8.0 base. OK, let's be brutal: this looks
> > > > > pretty darn stupid. But it does follow the CPU optimization handbook
> > > > > advice and I did see a noticeable improvement in performance and a
> > > > > reduction in context switching.
> > >
> > > > > I'm not in a position to try this again now on 8.1beta, but I'd welcome
> > > > > a performance test result from anybody that is. I'll supply a patch
> > > > > against 8.1beta for anyone wanting to test this.
> > > >
> > > > Ok, I've produced a few results on a 4-way (8-core) POWER 5 system, which
> > > > I've just set up and probably needs a bit of tuning. I don't see much
> > > > difference but I'm wondering if the cacheline sizes are dramatically
> > > > different from Intel/AMD processors. I still need to take a closer look
> > > > to make sure I haven't grossly mistuned anything, but I'll let everyone
> > > > take a look:
> > >
> > > Well, the Power 5 architecture probably has the lowest overall memory
> > > delay you can get currently so in some ways that would negate the
> > > effects of the patch. (Cacheline is still 128 bytes, AFAICS). But it's
> > > clear the patch isn't significantly better (like it was with 8.0 when we
> > > tried this on the 8-way Itanium in Feb).
> > >
> > > > cvs 20051013
> > > > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/19/
> > > > 2501 notpm
> > > >
> > > > cvs 20051013 w/ lw.patch
> > > > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/20/
> > > > 2519 notpm
> > >
> > > Could you re-run with wal_buffers = 32 ? (Without patch) Thanks
> >
> > Ok, sorry for the delay. I've bumped wal_buffers up to 2048 and
> > redone the disk layout. Here's where I'm at now:
> >
> > cvs 20051013
> > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/40/
> > 3257 notpm
> >
> > cvs 20051013 w/ lw.patch
> > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/42/
> > 3285 notpm
> >
> > Still not much of a difference with the patch. A quick glance over the
> > iostat data suggests I'm still not i/o bound, but the i/o wait is rather
> > high according to vmstat. Will try to see if there's anything else
> > obvious to get the load up higher.
>
> OK, that's fine. I'm glad there's some gain, but not much yet. I think we
> should leave out doing any more tests on lw.patch for now.
>
> Concerned about the awful checkpointing. Can you bump wal_buffers to
> 8192 just to make sure? That's way too high, but just to prove it.
>
> We need to reduce the number of blocks to be written at checkpoint.
>
> bgwriter_all_maxpages 5 -> 15
> bgwriter_all_percent 0.333
> bgwriter_delay 200
> bgwriter_lru_maxpages 5 -> 7
> bgwriter_lru_percent 1
>
> shared_buffers set lower to 100000
> (which should cause some amusement on-list)
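The sketch below gives a rough sense of what the suggested bgwriter changes allow. It computes the buffers scanned per round and the upper bound on pages written per second. It assumes the 8.1-era semantics:

- a round runs every bgwriter_delay ms;
- bgwriter_all_percent and bgwriter_lru_percent are the percentage of the buffer pool examined per round;
- bgwriter_all_maxpages and bgwriter_lru_maxpages cap the buffers written per round.

It also assumes the default 8 kB block size. The function name bgwriter_limits is hypothetical.

```python
# Hypothetical back-of-envelope model of the 8.1-era background writer limits.
# Assumptions: one round every bgwriter_delay ms; in each round the "all" scan
# and the "lru" scan each write at most their *_maxpages buffers; 8 kB blocks.

BLOCK_SIZE_KB = 8  # assumption: default BLCKSZ of 8192 bytes


def bgwriter_limits(shared_buffers, delay_ms, all_percent, all_maxpages,
                    lru_percent, lru_maxpages):
    """Return buffers scanned per round and the max write rate the settings allow."""
    rounds_per_sec = 1000.0 / delay_ms
    # Buffers examined per round by each scan (a percentage of the whole pool).
    all_scanned = shared_buffers * all_percent / 100.0
    lru_scanned = shared_buffers * lru_percent / 100.0
    # Upper bound on pages written per second; ignores how many are actually dirty.
    max_pages_per_sec = rounds_per_sec * (all_maxpages + lru_maxpages)
    return {
        "all_scanned_per_round": all_scanned,
        "lru_scanned_per_round": lru_scanned,
        "max_pages_per_sec": max_pages_per_sec,
        "max_kb_per_sec": max_pages_per_sec * BLOCK_SIZE_KB,
    }


if __name__ == "__main__":
    # Settings before and after the suggested change, with shared_buffers = 100000.
    for label, all_max, lru_max in [("before", 5, 5), ("after", 15, 7)]:
        print(label, bgwriter_limits(100000, 200, 0.333, all_max, 1, lru_max))
```

Under these assumptions, raising the maxpages settings from 5/5 to 15/7 lifts the ceiling from 50 to 110 pages per second (about 400 kB/s to 880 kB/s). This is only a ceiling. The writer can only write the dirty buffers it actually finds in each scan.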

Okay, here goes, all with the same source base w/ the lw.patch:

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/44/
only increased wal_buffers to 8192 from 2048
3242 notpm

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/43/
only increased bgwriter_all_maxpages to 15, and bgwriter_lru_maxpages to 7
3019 notpm (but more interesting graph)

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/45/
Same as the previously listed run with shared_buffers lowered to 10000
2503 notpm
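To put the runs above on a common scale, the sketch below computes the percentage change in notpm. It assumes run 42 (lw.patch, wal_buffers = 2048, 3285 notpm) is the baseline that runs 43 to 45 each modified; the message does not state this explicitly. It also shows the earlier with/without lw.patch pairs quoted in the thread. The names pct_change, RUNS and PATCH_PAIRS are hypothetical.

```python
# Relative change in throughput (notpm) for the runs reported above.
# Assumption: run 42 (lw.patch, wal_buffers = 2048, 3285 notpm) is the baseline.

BASELINE_NOTPM = 3285

RUNS = {
    44: 3242,  # wal_buffers 2048 -> 8192
    43: 3019,  # bgwriter_all_maxpages 15, bgwriter_lru_maxpages 7
    45: 2503,  # as run 43, plus shared_buffers lowered (value as reported)
}

# Earlier (without lw.patch, with lw.patch) pairs quoted in the thread.
PATCH_PAIRS = [(2501, 2519), (3257, 3285)]


def pct_change(new, base):
    """Percentage change of new relative to base."""
    return (new - base) * 100.0 / base


if __name__ == "__main__":
    for run, notpm in RUNS.items():
        print(f"run {run}: {notpm} notpm, {pct_change(notpm, BASELINE_NOTPM):+.1f}%")
    for without, with_patch in PATCH_PAIRS:
        print(f"lw.patch: {without} -> {with_patch}, {pct_change(with_patch, without):+.1f}%")
```

On that baseline, runs 44, 43 and 45 come out at roughly -1.3%, -8.1% and -23.8%. Both lw.patch comparisons differ by under 1%.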

Mark
