Re: Proposal of tunable fix for scalability of 8.4

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)sun(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-18 07:48:38
Message-ID: 1237362518.3953.181.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Sat, 2009-03-14 at 12:09 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > WALInsertLock is also quite high on Jignesh's list. That I've seen
> > become the bottleneck on other tests too.
>
> Yeah, that's been seen to be an issue before. I had the germ of an idea
> about how to fix that:
>
> ... with no lock, determine size of WAL record ...
> obtain WALInsertLock
> identify WAL start address of my record, advance insert pointer
> past record end
> *release* WALInsertLock
> without lock, copy record into the space just reserved
>
> The idea here is to allow parallelization of the copying of data into
> the buffers. The hold time on WALInsertLock would be very short. Maybe
> it could even become a spinlock, though I'm not sure, because the
> "advance insert pointer" bit is more complicated than it looks (you have
> to allow for the extra overhead when crossing a WAL page boundary).
>
> Now the fly in the ointment is that there would need to be some way to
> ensure that we didn't write data out to disk until it was valid; in
> particular how do we implement a request to flush WAL up to a particular
> LSN value, when maybe some of the records before that haven't been fully
> transferred into the buffers yet? The best idea I've thought of so far
> is shared/exclusive locks on the individual WAL buffer pages, with the
> rather unusual behavior that writers of the page would take shared lock
> and only the reader (he who has to dump to disk) would take exclusive
> lock. But maybe there's a better way. Currently I don't believe that
> dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
> and it would be nice to preserve that property.

Yeh, that's just what we'd discussed previously:
http://markmail.org/message/gectqy3yzvjs2hru#query:Reworking%20WAL%
20locking+page:1+mid:gectqy3yzvjs2hru+state:results

Are you thinking of doing this for 8.4? :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2009-03-18 07:53:53 Re: Proposal of tunable fix for scalability of 8.4
Previous Message Tom Lane 2009-03-18 03:24:59 Re: Extremely slow intarray index creation and inserts.