Re: Group commit, revised

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group commit, revised
Date: 2012-01-26 02:10:47
Message-ID: CA+TgmoaPyQKEaoFz8HkDGvRDbOmRpkGo69zjODB5=7Jh3hbPQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 3:11 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've been thinking, what exactly is the important part of this group commit
> patch that gives the benefit? Keeping the queue sorted isn't all that
> important - XLogFlush() requests for commits will come in almost the correct
> order anyway.
>
> I also don't much like the division of labour between groupcommit.c and
> xlog.c. XLogFlush() calls GroupCommitWaitForLSN(), which calls back
> DoXLogFlush(), which is a bit hard to follow. (that's in my latest version;
> the original patch had similar division but it also cut across processes,
> with the wal writer actually doing the flush)
>
> It occurs to me that the WALWriteLock already provides much of the same
> machinery we're trying to build here. It provides FIFO-style queuing, with
> the capability to wake up the next process or processes in the queue.
> Perhaps all we need is some new primitive to LWLock, to make it do exactly
> what we need.
>
> Attached is a patch to do that. It adds a new mode to
> LWLockConditionalAcquire(), LW_EXCLUSIVE_BUT_WAIT. If the lock is free, it
> is acquired and the function returns immediately. However, unlike normal
> LWLockConditionalAcquire(), if the lock is not free it goes to sleep until
> it is released. But unlike normal LWLockAcquire(), when the lock is
> released, the function returns *without* acquiring the lock.
>
> I modified XLogFlush() to use that new function for WALWriteLock. It tries
> to get WALWriteLock, but if it's not immediately free, it waits until it is
> released. Then, before trying to acquire the lock again, it rechecks
> LogwrtResult to see if the record was already flushed by the process that
> released the lock.
>
> This is a much smaller patch than the group commit patch, and in my
> pgbench-tools tests on my laptop, it has the same effect on performance. The
> downside is that it touches lwlock.c, which is a change at a lower level.
> Robert's flexlocks patch probably would've been useful here.

I think you should break this off into a new function,
LWLockWaitUntilFree(), rather than treating it as a new LWLockMode.
Also, instead of adding lwWaitOnly, I would suggest that we generalize
lwWaiting and lwExclusive into a field lwWaitRequest, which can be set
to 1 for exclusive, 2 for shared, 3 for wait-for-free, etc. That way
we don't have to add another boolean every time someone invents a new
type of wait - not that that should hopefully happen very often. A
side benefit of this is that you can simplify the test in
LWLockRelease(): keep releasing waiters until you come to an exclusive
waiter, then stop. This possibly saves one shared memory fetch and
subsequent test instruction, which might not be trivial - all of this
code is extremely hot. We probably want to benchmark this carefully
to make sure that it doesn't cause a performance regression even just
from this:

+ if (!proc->lwWaitOnly)
+ lock->releaseOK = false;

I know it sounds crazy, but I'm not 100% sure that that additional
test there is cheap enough not to matter. Assuming it is, though, I
kind of like the concept: I think there must be other places where
these semantics are useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-01-26 02:25:33 Re: Should I implement DROP INDEX CONCURRENTLY?
Previous Message Robert Haas 2012-01-26 01:42:05 Re: show Heap Fetches in EXPLAIN for index-only scans