Is this still a TODO?
Simon Riggs wrote:
> On Mon, 2007-07-30 at 20:20 +0100, Simon Riggs wrote:
> > Jignesh Shah's scalability testing on Solaris has revealed further
> > tuning opportunities surrounding the start and end of a transaction.
> > That tuning should be especially important, since async commit is
> > likely to allow much higher transaction rates than were previously
> > possible. There is strong contention on the ProcArrayLock in Exclusive
> > mode, with the top path being CommitTransaction(). This becomes clear
> > as the number of connections increases, but it seems likely that the
> > contention can be caused in a range of other circumstances. My
> > thoughts on the causes of this contention are that the following three
> > tasks contend with each other in the following way:
> > CommitTransaction(): takes ProcArrayLock Exclusive,
> >     but only needs access to one ProcArray element
> >   waits for
> > GetSnapshotData(): takes ProcArrayLock Shared
> >     ReadNewTransactionId(): takes XidGenLock Shared
> >   which waits for
> > GetNextTransactionId(): takes XidGenLock Exclusive
> >     ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock
> >         Exclusive (two possible places where I/O is required)
> >     ExtendSubtrans(): takes SubtransControlLock Exclusive
> >         (one possible place where I/O is required)
> >     avoids a lock on ProcArrayLock: atomically updates one ProcArray
> >         element
> >
> > or more simply:
> >
> > CommitTransaction()      -- i.e. once per transaction
> >   waits for
> > GetSnapshotData()        -- i.e. once per SQL statement
> >   which waits for
> > GetNextTransactionId()   -- i.e. once per transaction
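> > To make the shape of that chain concrete, here is a minimal sketch of
> > the acquisitions involved, using the existing LWLock API. The lock and
> > field names are real, but everything around them is elided, so read
> > this as an illustration rather than the actual code paths:
> >
> >     /* CommitTransaction(), once per transaction: locks the whole
> >      * array exclusively just to clear this backend's own entry */
> >     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> >     MyProc->xid = InvalidTransactionId;
> >     LWLockRelease(ProcArrayLock);
> >
> >     /* GetSnapshotData(), once per SQL statement: a Shared request
> >      * that queues behind any waiting Exclusive request above */
> >     LWLockAcquire(ProcArrayLock, LW_SHARED);
> >     /* ... scan every ProcArray entry to build the snapshot ... */
> >     LWLockRelease(ProcArrayLock);
> >
> >     /* GetNextTransactionId(), once per write transaction: holds
> >      * XidGenLock exclusively, possibly across clog/subtrans I/O */
> >     LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
> >     /* ... ExtendCLOG() / ExtendSUBTRANS(), which may do I/O ... */
> >     LWLockRelease(XidGenLock);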
> > This gives some goals for scalability improvements and some proposals.
> > (1) and (2) are proposals for 8.3 tuning; the others are directions
> > for further research.
> > Goal: Reduce total time that GetSnapshotData() waits for
> > GetNextTransactionId()
> The latest patch for lazy xid allocation reduces the number of times
> GetNextTransactionId() is called by eliminating the call entirely for
> read-only transactions. That will reduce the number of waits and so,
> for most real-world cases, will increase the scalability of Postgres.
> Write-mostly workloads will be slightly less scalable, so we should
> expect our TPC-C numbers to be slightly worse than our TPC-E numbers.
> We should retest to see whether the bottleneck has been moved
> sufficiently that we can avoid implementing techniques (1), (2), (3),
> (5) or (6) at all.
> > 1. Increase the clog page size via a clog-specific BLCKSZ
> > The clog currently uses BLCKSZ to define the size of clog buffers.
> > This can be changed to use CLOG_BLCKSZ, which would then be set to
> > 32768. This will naturally increase the amount of memory allocated to
> > the clog, so we need not raise CLOG_BUFFERS above 8 if we do this (as
> > previously suggested, with successful results). It will also reduce
> > the number of ExtendClog() calls, which should reduce overall
> > contention as well.
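> > As a rough sketch, the change could look like this. The derived
> > constants mirror the existing definitions in
> > src/backend/access/transam/clog.c, which are currently based on
> > BLCKSZ; CLOG_BLCKSZ itself is the new symbol proposed above:
> >
> >     /* pg_config_manual.h: clog page size, tunable independently of
> >      * the main BLCKSZ (proposed new symbol) */
> >     #define CLOG_BLCKSZ          32768
> >
> >     /* clog.c: capacity then derives from CLOG_BLCKSZ; at 2 bits per
> >      * xact (4 per byte), a 32K page covers 131072 transactions
> >      * instead of the 32768 covered by an 8K BLCKSZ page */
> >     #define CLOG_BITS_PER_XACT   2
> >     #define CLOG_XACTS_PER_BYTE  4
> >     #define CLOG_XACTS_PER_PAGE  (CLOG_BLCKSZ * CLOG_XACTS_PER_BYTE)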
> > 2. Perform ExtendClog() as a background activity
> > A background process can look at the next transaction id once each
> > cycle without holding any lock. If the xid is almost at the point
> > where a new clog page would be allocated, the process allocates one
> > before the new page is absolutely required. Doing this as a background
> > task means we would not need to hold XidGenLock in exclusive mode
> > while extending the clog, so GetSnapshotData() and CommitTransaction()
> > would be less likely to block. Any clog writes needed when the page is
> > moved forwards would likewise be performed in the background.
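> > A minimal sketch of that loop; ReadNewTransactionId(),
> > TransactionIdToPgIndex() and CLOG_XACTS_PER_PAGE exist today, while
> > the margin constant and the pre-allocation helper are hypothetical:
> >
> >     #define CLOG_PREALLOC_MARGIN 1024       /* hypothetical headroom */
> >
> >     for (;;)                                /* background process loop */
> >     {
> >         /* takes XidGenLock in Shared mode only */
> >         TransactionId next = ReadNewTransactionId();
> >
> >         /* close to the end of the current clog page? */
> >         if (TransactionIdToPgIndex(next) >
> >             CLOG_XACTS_PER_PAGE - CLOG_PREALLOC_MARGIN)
> >         {
> >             /* zero (and, if needed, write) the next page now,
> >              * before any foreground backend blocks on it */
> >             PreallocateNextClogPage();      /* hypothetical helper */
> >         }
> >
> >         pg_usleep(100000L);                 /* one "cycle": 100ms */
> >     }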
> > 3. Consider whether ProcArrayLock should use a new queued-shared lock
> > mode that puts a maximum wait time on Exclusive requests. It would be
> > fairly hard to implement this well as a timer, but it might be
> > possible to place a limit on queue length instead: allow Shared locks
> > to be granted immediately if a Shared holder already exists, but only
> > if no more than N exclusive-mode requests are already queued. This
> > might prevent the worst cases of exclusive lock starvation.
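> > As a sketch, the admission test inside LWLockAcquire() might become
> > something like the following; shared and exclusive are the real LWLock
> > fields, but numExclusiveWaiters is a hypothetical counter this change
> > would have to add and maintain:
> >
> >     /* grant a LW_SHARED request immediately only if the lock is
> >      * already share-held and the Exclusive queue is short */
> >     if (lock->exclusive == 0 &&
> >         lock->shared > 0 &&
> >         lock->numExclusiveWaiters <= MAX_EXCL_QUEUE)    /* "N" above */
> >         lock->shared++;                     /* queue-jump: granted */
> >     else
> >     {
> >         /* join the wait queue, exactly as now */
> >     }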
> (4) is a general concern that remains valid.
> > 4. Since shared locks are currently queued behind exclusive requests
> > when they cannot be immediately satisfied, it might be worth
> > reconsidering the way LWLockRelease works also. When we wake up the
> > queue we only wake the Shared requests that are adjacent to the head of
> > the queue. Instead we could wake *all* waiting Shared requestors.
> > e.g. with a lock queue like this:
> > (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> > Currently we would wake the 1st and 2nd waiters only.
> > If we were to wake the 4th, 6th and 8th waiters also, then the queue
> > would reduce in length very quickly, if we assume generally uniform
> > service times. (If the head of the queue is X, then we wake only that
> > one process; I'm not proposing we change that.) That would mean queue
> > jumping, right? Well, that's what already happens in other
> > circumstances, so there cannot be anything intrinsically wrong with
> > allowing it; the only question is: would it help?
> > We need not wake the whole queue; there may be some generally more
> > beneficial heuristic. The reason for considering this is not to speed
> > up Shared requests but to reduce the queue length, and thus the
> > waiting time, for the Exclusive requestors. Each time a Shared request
> > is dequeued, we effectively re-enable queue jumping, so a Shared
> > request arriving at that point will actually jump ahead of Shared
> > requests that were unlucky enough to arrive while an Exclusive lock
> > was held. Worse than that, the new incoming Shared requests exacerbate
> > the starvation, so the more non-adjacent groups of Shared lock
> > requests there are in the queue, the worse the starvation of the
> > Exclusive requestors becomes. In the current scheme we are effectively
> > randomly starving some Shared locks as well as Exclusive locks, based
> > upon the state of the lock at the moment they make their request. The
> > situation is worst when the lock is heavily contended and the workload
> > has a 50/50 mix of shared/exclusive requests, e.g. serializable
> > transactions or transactions with lots of subtransactions.
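> > A sketch of how the wakeup loop in LWLockRelease() (lwlock.c) might
> > change; lwWaitLink and lwExclusive are the real PGPROC fields, but the
> > loop is illustrative rather than a finished patch (lock->tail
> > maintenance is elided):
> >
> >     /* walk the whole wait queue, detaching *every* Shared waiter
> >      * onto a wakeup list; Exclusive waiters keep their positions */
> >     PGPROC *proc = lock->head;
> >     PGPROC *prev = NULL;
> >     PGPROC *wakeup = NULL;
> >
> >     while (proc != NULL)
> >     {
> >         PGPROC *next = proc->lwWaitLink;
> >
> >         if (!proc->lwExclusive)
> >         {
> >             if (prev)                   /* unlink from lock queue */
> >                 prev->lwWaitLink = next;
> >             else
> >                 lock->head = next;
> >             proc->lwWaitLink = wakeup;  /* chain onto wakeup list */
> >             wakeup = proc;
> >         }
> >         else
> >             prev = proc;
> >         proc = next;
> >     }
> >     /* then, for each proc on the wakeup list: clear lwWaiting and
> >      * PGSemaphoreUnlock() it, as the current code does */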
> > Goal: Reduce the total time that CommitTransaction() waits for
> > GetSnapshotData()
> > 5. Reduce the time that GetSnapshotData() holds ProcArrayLock. To do
> > this, we split ProcArrayLock into multiple partitions (as suggested by
> > Alvaro). There are comments in GetNewTransactionId() about having one
> > spinlock per ProcArray entry. That would be too many, and we could
> > reduce contention by having one lock for each N ProcArray entries.
> > Since we don't see too much contention with 100 connections (the
> > default max_connections), it would seem sensible to make N ~ 120.
> > Striped or contiguous? If we stripe the lock partitions, we will need
> > multiple partitions however many users are connected, whereas
> > contiguous ranges would allow one lock for low numbers of users and
> > yet enough locks for higher numbers of users.
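> > A sketch of the contiguous mapping; every name below is hypothetical,
> > since none of this exists yet:
> >
> >     #define PROCS_PER_PARTITION  120        /* the "N" above */
> >     #define ProcArrayPartition(idx)  ((idx) / PROCS_PER_PARTITION)
> >
> >     /* CommitTransaction(): lock only the partition that covers our
> >      * own ProcArray entry, not the whole array */
> >     LWLockAcquire(ProcArrayPartitionLock(ProcArrayPartition(myIdx)),
> >                   LW_EXCLUSIVE);
> >
> >     /* GetSnapshotData(): take each in-use partition lock in Shared
> >      * mode; with <= 120 connections and contiguous ranges that is
> >      * exactly one lock, the same as today */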
> > 6. Reduce the number of times ProcArrayLock is taken in Exclusive
> > mode. To do this, optimise group commit so that all of the actions for
> > multiple transactions are executed together whenever it is appropriate
> > to do so: flushing WAL, updating CLOG and updating the ProcArray.
> > There's no point in having a group commit facility that optimises just
> > one of those contention points when all three need to be considered.
> > That needs to be done as part of a general overhaul of group commit.
> > This would include making TransactionLogMultiUpdate() take
> > CLogControlLock once for each page that it needs to access, which
> > would also reduce contention from TransactionIdCommitTree().
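> > A sketch of the per-page locking idea; TransactionIdToPage(),
> > CLogControlLock and TRANSACTION_STATUS_COMMITTED are real, while the
> > helper that sets one xid's status on an already-locked page is
> > hypothetical:
> >
> >     /* commit a sorted array of xids, taking CLogControlLock once
> >      * per clog page rather than once per transaction */
> >     int i = 0;
> >
> >     while (i < nxids)
> >     {
> >         int page = TransactionIdToPage(xids[i]);
> >
> >         LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
> >         while (i < nxids && TransactionIdToPage(xids[i]) == page)
> >             SetXidStatusOnLockedPage(xids[i++],     /* hypothetical */
> >                                      TRANSACTION_STATUS_COMMITTED);
> >         LWLockRelease(CLogControlLock);
> >     }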
> > (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> > patch for (1) on the shelf already from 6 months ago.
> > (3), (4) and (5) seem like changes that would require significant
> > testing time to ensure we did them correctly, even though the patches
> > might be fairly small. I'm thinking these are probably 8.4 changes,
> > but I can get test versions out fairly quickly.
> > (6) seems definitely an 8.4 change.
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
+ If your life is a hard drive, Christ can be your backup. +