improving concurrent transaction commit rate

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: improving concurrent transaction commit rate
Date: 2009-03-24 23:52:42
Message-ID: 20090324235242.GO32672@frubble.xen.chris-lamb.co.uk
Lists: pgsql-hackers

Hi,

I had an idea while going home last night and still can't think why it
isn't already implemented, as it seems obvious.

The conceptual idea is to have at most one outstanding flush of the
log going through the filesystem at any one time. The effect, as far
as I can think it through, would be to trade latency for bandwidth. In
commit-heavy situations you're almost always going to be limited by the
rotational latency of the log device, while its full bandwidth is rarely
going to be much of a problem.

I don't understand PG well enough to know if/how this could be
implemented; I've had a look through transam/xlog.c and sort of
understand what's going on but will have missed all the subtleties of
its operation. So, please take what I say below with a little salt!

The way I'm imagining it working is as follows: when a flush gets issued,
the code does:

  global Lock l;
  global int writtento = 0, flushedto = 0;
  /* where are we known to have written data up to currently */
  writtento = max(writtento, myrecord);
  /* try and acquire the flush lock */
  if (!conditionalacquire(l)) {
    /* lock already taken, block ourself until they finish by acquiring it */
    acquire(l);
    /* if somebody "later" in the queue got unblocked then their flush
       covers us too and we're winning */
    if (myrecord <= flushedto) {
      goto out;
    }
  }
  /* flush needed, record the latest write's position in the queue */
  local int curat = writtento;
  /* actually perform the flush */
  fdatasync(log_fd);
  /* record where we're done flushing to so others can finish early */
  flushedto = curat;
out:
  /* send the next process off */
  release(l);

To simplify, I've assumed that access to the globals is always atomic;
locking would obviously need to be different in a real implementation.
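For anyone wanting to try the idea outside PostgreSQL, here's a minimal runnable sketch of the pseudocode in C with POSIX threads. It is not how xlog.c works; the general technique is usually called group commit. Assumptions: log "positions" are byte offsets in a temporary file, and a separate append_lock serialises writes so that writtento always marks a contiguous written prefix (without that, a flush could cover a later record while an earlier one is still being written). The names append_record, commit_flush, worker and run_demo are hypothetical. The flushedto check is done on every path rather than only after waiting, which also skips redundant flushes when the lock is uncontended.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical globals mirroring the pseudocode; positions are byte offsets. */
static pthread_mutex_t append_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;
static long log_end = 0;              /* protected by append_lock */
static atomic_long writtento = 0;     /* contiguous prefix written to the fd */
static atomic_long flushedto = 0;     /* prefix known to be durable */
static atomic_int flush_calls = 0;    /* instrumentation only */

/* Hypothetical helper: append a record and return its end position.
   Serialising appends keeps writtento a contiguous written prefix. */
static long append_record(int fd, const char *buf, size_t len)
{
    pthread_mutex_lock(&append_lock);
    if (write(fd, buf, len) != (ssize_t)len)
        abort();
    log_end += (long)len;
    long myrecord = log_end;
    atomic_store(&writtento, myrecord);
    pthread_mutex_unlock(&append_lock);
    return myrecord;
}

/* Hypothetical helper: return once everything up to myrecord is durable.
   At most one fdatasync() is in flight; waiters recheck flushedto. */
static void commit_flush(int fd, long myrecord)
{
    pthread_mutex_lock(&flush_lock);           /* queue behind any flush */
    if (myrecord > atomic_load(&flushedto)) {  /* nobody has covered us yet */
        long curat = atomic_load(&writtento);  /* may include later writers */
        if (fdatasync(fd) != 0)
            abort();
        atomic_fetch_add(&flush_calls, 1);
        atomic_store(&flushedto, curat);
    }
    pthread_mutex_unlock(&flush_lock);         /* send the next one off */
}

struct worker_arg { int fd; int commits; };

static void *worker(void *p)
{
    const struct worker_arg *a = p;
    for (int i = 0; i < a->commits; i++) {
        long rec = append_record(a->fd, "commit\n", 7);
        commit_flush(a->fd, rec);
        assert(atomic_load(&flushedto) >= rec);
    }
    return NULL;
}

/* Runs nthreads committers on a temporary file; returns fdatasync() count. */
static int run_demo(int nthreads, int commits_each)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    char path[] = "/tmp/groupcommitXXXXXX";
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    log_end = 0;                               /* reset state between runs */
    atomic_store(&writtento, 0);
    atomic_store(&flushedto, 0);
    atomic_store(&flush_calls, 0);

    int fd = mkstemp(path);
    if (fd < 0)
        abort();
    unlink(path);                              /* file vanishes on close */

    struct worker_arg arg = { fd, commits_each };
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    close(fd);
    return atomic_load(&flush_calls);
}
```

Calling run_demo(8, 50) makes at most 400 fdatasync() calls, and often fewer when commits overlap, since committers queued behind an in-flight flush are covered by the next one.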

In the case of a single client, the cost is dominated by the disk flush
anyway; as this is likely to be a fairly expensive operation, I'm hoping
that taking a lock here isn't going to matter much. Two clients are
going to be worse (I think), as the second has to wait for the first
client to finish flushing before sending its own flush request off.
Three or more clients will be a win: the second and third clients wait
while the first flush completes, and then a single flush covers both of
them. With n clients waiting to commit, this would appear to replace n
flushes with two, saving n-2 flushes (roughly an n/2 speedup).

What have I missed?

If this has been explored in the literature, I'd appreciate any pointers;
I had a search but couldn't find anything, though I'm not sure what the
terminology for this sort of thing would be anyway.

--
Sam http://samason.me.uk/
