Re: Proposal of tunable fix for scalability of 8.4

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)sun(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-14 16:09:49
Message-ID: 5077.1237046989@sss.pgh.pa.us
Lists: pgsql-performance

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> WALInsertLock is also quite high on Jignesh's list. That I've seen
> become the bottleneck on other tests too.

Yeah, that's been seen to be an issue before. I had the germ of an idea
about how to fix that:

... with no lock, determine size of WAL record ...
obtain WALInsertLock
identify WAL start address of my record, advance insert pointer
past record end
*release* WALInsertLock
without lock, copy record into the space just reserved
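A minimal sketch of the reserve-then-copy steps above. It assumes a hypothetical, much simplified model: one flat in-memory WAL buffer with no page headers and no wraparound, and a pthread mutex standing in for WALInsertLock. The names `wal_buf`, `insert_pos`, `insert_lock` and `wal_insert` are illustrative only, not PostgreSQL's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified model: one flat WAL buffer, no page headers,
 * no wraparound. insert_lock plays the role of WALInsertLock. */
#define WAL_BUF_SIZE 8192

static char wal_buf[WAL_BUF_SIZE];
static uint64_t insert_pos = 0;   /* next free byte; protected by insert_lock */
static pthread_mutex_t insert_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the start offset of the record, or UINT64_MAX if the buffer is full. */
static uint64_t wal_insert(const void *rec, size_t len)
{
    uint64_t start;

    /* Step 1: the record size (len) is already known without any lock. */

    /* Steps 2-4: very short critical section that only reserves space. */
    pthread_mutex_lock(&insert_lock);
    if (insert_pos + len > WAL_BUF_SIZE) {
        pthread_mutex_unlock(&insert_lock);
        return UINT64_MAX;
    }
    start = insert_pos;
    insert_pos += len;            /* advance insert pointer past record end */
    pthread_mutex_unlock(&insert_lock);

    /* Step 5: copy with no lock held; concurrent inserters copy into
     * their own disjoint reserved ranges in parallel. */
    memcpy(wal_buf + start, rec, len);
    return start;
}
```

Because each caller owns the byte range it reserved, the memcpy calls never overlap, which is what makes the copy safe to run outside the lock.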

The idea here is to allow parallelization of the copying of data into
the buffers. The hold time on WALInsertLock would be very short. Maybe
it could even become a spinlock, though I'm not sure, because the
"advance insert pointer" bit is more complicated than it looks (you have
to allow for the extra overhead when crossing a WAL page boundary).
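To illustrate why "advance insert pointer" is more than a single addition, here is a sketch of the pointer arithmetic under assumed values: a fixed 8192-byte page and a hypothetical 24-byte header at the start of every page. Real WAL also has longer headers at segment starts and alignment padding, which this ignores; `advance_insert_ptr` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 8192u                 /* assumed WAL page size */
#define PAGE_HDR  24u                   /* hypothetical per-page header size */
#define USABLE    (PAGE_SIZE - PAGE_HDR)

/* Advance insert pointer 'pos' past a record of 'len' bytes, skipping the
 * header of every new page the record spills onto. Precondition: 'pos'
 * points at usable space, i.e. pos % PAGE_SIZE >= PAGE_HDR. */
static uint64_t advance_insert_ptr(uint64_t pos, uint64_t len)
{
    uint64_t room = PAGE_SIZE - pos % PAGE_SIZE;   /* bytes left on this page */

    /* Using >= means a record that exactly fills a page also reserves the
     * next page's header, so the returned pointer keeps the precondition. */
    while (len >= room) {
        len -= room;
        pos += room + PAGE_HDR;   /* jump to next page, skip its header */
        room = USABLE;
    }
    return pos + len;
}
```

The loop is the extra work that has to happen while the lock is held; whether that stays cheap enough for a spinlock is exactly the open question.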

Now the fly in the ointment is that there would need to be some way to
ensure that we didn't write data out to disk until it was valid; in
particular how do we implement a request to flush WAL up to a particular
LSN value, when maybe some of the records before that haven't been fully
transferred into the buffers yet? The best idea I've thought of so far
is shared/exclusive locks on the individual WAL buffer pages, with the
rather unusual behavior that writers of the page would take shared lock
and only the reader (he who has to dump to disk) would take exclusive
lock. But maybe there's a better way. As far as I know, dumping a WAL
buffer to disk (under WALWriteLock) does not currently block insertion
of new WAL data, and it would be nice to preserve that property.

regards, tom lane
