
Re: Proposal of tunable fix for scalability of 8.4

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-18 11:20:16
Lists: pgsql-performance
Matthew Wakeling wrote:
> On Sat, 14 Mar 2009, Heikki Linnakangas wrote:
>> It's going to require some hard thinking to bust that bottleneck. I've
>> sometimes thought about maintaining a pre-calculated array of 
>> in-progress XIDs in shared memory. GetSnapshotData would simply 
>> memcpy() that to private memory, instead of collecting the xids from 
>> ProcArray.
> That shifts the contention from reading the data to altering it. But
> altering would probably happen far less often, so it should be a net benefit.

It's true that it would shift work from reading (GetSnapshotData) to 
modifying (xact end) the ProcArray, which could actually be much worse: 
when modifying, you hold an ExclusiveLock, but readers only hold a 
SharedLock. I don't think it's that bad in reality since at transaction 
end you would only need to remove your own xid from an array. That 
should be very fast, especially if you know exactly where in the array 
your own xid is.
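To make the idea concrete, here is a minimal sketch of such a pre-built
in-progress XID array. All names (XidArray, xidarr_*) are hypothetical, not
PostgreSQL's, and locking is elided; the point is only that snapshotting
becomes a memcpy() and that removing your own xid at a known slot is O(1).

```c
#include <string.h>

#define MAX_XIDS 64

typedef unsigned int TransactionId;

typedef struct
{
    TransactionId xids[MAX_XIDS];   /* dense array of running XIDs */
    int           nxids;            /* current count */
} XidArray;

/* Add our xid at the end; remember the slot so removal is O(1). */
static int
xidarr_add(XidArray *a, TransactionId xid)
{
    int slot = a->nxids++;
    a->xids[slot] = xid;
    return slot;
}

/*
 * Remove our own xid by moving the last entry into our slot, so no
 * scan is needed.  (A real implementation would also have to tell the
 * backend owning the moved xid its new slot.)
 */
static void
xidarr_remove(XidArray *a, int slot)
{
    a->xids[slot] = a->xids[--a->nxids];
}

/* GetSnapshotData reduces to a memcpy of the dense array. */
static int
xidarr_snapshot(const XidArray *a, TransactionId *dest)
{
    memcpy(dest, a->xids, a->nxids * sizeof(TransactionId));
    return a->nxids;
}
```

The swap-with-last trick is what makes "knowing exactly where in the array
your own xid is" pay off at transaction end.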

> On Sat, 14 Mar 2009, Tom Lane wrote:
>> Now the fly in the ointment is that there would need to be some way to
>> ensure that we didn't write data out to disk until it was valid; in
>> particular how do we implement a request to flush WAL up to a particular
>> LSN value, when maybe some of the records before that haven't been fully
>> transferred into the buffers yet?  The best idea I've thought of so far
>> is shared/exclusive locks on the individual WAL buffer pages, with the
>> rather unusual behavior that writers of the page would take shared lock
>> and only the reader (he who has to dump to disk) would take exclusive
>> lock.  But maybe there's a better way.  Currently I don't believe that
>> dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
>> and it would be nice to preserve that property.
> The writers would need to take a shared lock on the page before 
> releasing the lock that marshals access to the "how long is the log" 
> data. Other than that, your idea would work.
> An alternative would be to maintain a concurrent linked list of WAL 
> writes in progress. An entry would be added to the tail every time a new 
> writer is generated, marking the end of the log. When a writer finishes, 
> it can remove the entry from the list very cheaply and with very little 
> contention. The reader (who dumps the WAL to disc) need only look at the 
> head of the list to find out how far the log is completed, because the 
> list is guaranteed to be in order of position in the log.
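A single-threaded sketch of that list, with hypothetical names and the
locking elided (appending would happen under the same lock that hands out
log positions): each inserter links an entry at the tail, unlinking on
completion touches only the neighbours, and the head is the oldest
unfinished write, so everything before it is known complete.

```c
#include <stddef.h>

typedef unsigned long XLogRecPtr;

typedef struct WalWrite
{
    XLogRecPtr       start;         /* where this in-progress write begins */
    struct WalWrite *prev, *next;
} WalWrite;

typedef struct
{
    WalWrite  *head, *tail;         /* oldest / newest in-progress write */
    XLogRecPtr insert_pos;          /* current end of reserved WAL space */
} WalList;

/* Reserve log space and register the write at the tail. */
static void
wal_begin(WalList *l, WalWrite *w, XLogRecPtr len)
{
    w->start = l->insert_pos;
    l->insert_pos += len;
    w->next = NULL;
    w->prev = l->tail;
    if (l->tail)
        l->tail->next = w;
    else
        l->head = w;
    l->tail = w;
}

/* Unlink on completion: cheap, touches only the two neighbours. */
static void
wal_end(WalList *l, WalWrite *w)
{
    if (w->prev)
        w->prev->next = w->next;
    else
        l->head = w->next;
    if (w->next)
        w->next->prev = w->prev;
    else
        l->tail = w->prev;
}

/* Everything before the oldest unfinished write is safe to flush. */
static XLogRecPtr
wal_completed_upto(const WalList *l)
{
    return l->head ? l->head->start : l->insert_pos;
}
```

Because entries are appended in log order, the list stays sorted by
position without any extra work.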

A linked list or an array of in-progress writes was my first thought as 
well. But the real problem is: how does the reader wait until all WAL up 
to position X has been written? It could poll, but that's inefficient.
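One way around the polling would be to have the reader sleep until a
finishing writer advances the completed-up-to position. The sketch below
illustrates that with a POSIX condition variable; this is just an
illustration, not how PostgreSQL actually signals between backends, and
all the names are hypothetical.

```c
#include <pthread.h>

typedef unsigned long XLogRecPtr;

static pthread_mutex_t wal_mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wal_cond = PTHREAD_COND_INITIALIZER;
static XLogRecPtr      completed_upto = 0;

/*
 * Writer: after unlinking itself from the in-progress list, publish
 * the new completed position and wake any waiting flusher.
 */
void
wal_advance(XLogRecPtr newpos)
{
    pthread_mutex_lock(&wal_mtx);
    if (newpos > completed_upto)
    {
        completed_upto = newpos;
        pthread_cond_broadcast(&wal_cond);
    }
    pthread_mutex_unlock(&wal_mtx);
}

/* Reader: block until all WAL up to `target` has been copied in. */
void
wal_wait_for(XLogRecPtr target)
{
    pthread_mutex_lock(&wal_mtx);
    while (completed_upto < target)
        pthread_cond_wait(&wal_cond, &wal_mtx);
    pthread_mutex_unlock(&wal_mtx);
}
```

The wakeup only fires when the head of the list actually advances, so a
writer finishing in the middle of the list costs the reader nothing.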

   Heikki Linnakangas
