
Re: Group commit, revised

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Group commit, revised
Date:	2012-01-30 23:35:53
Message-ID:	CA+U5nM+Yj7scbELbftqPi=Zn1Q6SDM+PDgM0npkiRRrc_tS-xg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 30, 2012 at 8:04 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> So, what's the approach you're working on?

I had a few days' leave at the end of last week, so there was no time to fully
discuss the next steps with the patch. That's why you were asked
not to commit anything.

You've suggested there was no reason to involve the WALwriter, which
isn't the case, and you also made other comments about latches that
weren't correct.

The plan here is to allow WAL flush and clog updates to occur
concurrently, which allows the clog contention and update time to be
completely hidden behind the wait for the WAL flush. That is only
possible if the WALwriter is involved, since we need two processes
to be actively working at the same time.

It's a relatively minor change and uses code that is already committed
and working, not some newly invented low-level code that might not
work right. You might then ask: why the delay? Simply because my
absence has prevented us from moving forward. We'll have a patch tomorrow.

The theory behind this is clear, but needs some explanation.

There are 5 actions that need to occur at commit:
1) insert WAL record
2) optionally flush WAL record
3) mark the clog AND set LSN from (1) if we skipped (2)
4) optionally wait for sync rep
5) remove the proc from the procarray

The dependencies between those actions are:
Step (3) must always happen before (5), otherwise we get race
conditions in visibility checking.
Step (4) must always happen before (5), otherwise we also get race
conditions in disaster cases.
Step (1) must always happen before (2), if (2) happens.
Step (1) must always happen before (3) if we skipped (2).

Notice that step (2) and step (3) are actually independent of each other.
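To make that concrete, here is a small C sketch that encodes only the four ordering rules listed above and checks a proposed sequence of steps against them. The enum names and commit_order_is_valid() are hypothetical, not PostgreSQL code; an optional step that doesn't occur is simply left out of the sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The five commit actions, numbered as in the list above (hypothetical names). */
enum CommitStep {
    INSERT_WAL = 1,   /* (1) insert WAL record */
    FLUSH_WAL = 2,    /* (2) optionally flush WAL record */
    MARK_CLOG = 3,    /* (3) mark the clog (and set LSN) */
    WAIT_SYNCREP = 4, /* (4) optionally wait for sync rep */
    REMOVE_PROC = 5   /* (5) remove the proc from the procarray */
};

/* Index of step in order[], or -1 if that step does not occur. */
static int position_of(const int *order, size_t n, int step)
{
    for (size_t i = 0; i < n; i++)
        if (order[i] == step)
            return (int) i;
    return -1;
}

/* True if a precedes b, or if either step is absent from order[]. */
static bool precedes_if_both(const int *order, size_t n, int a, int b)
{
    int pa = position_of(order, n, a);
    int pb = position_of(order, n, b);
    return pa < 0 || pb < 0 || pa < pb;
}

/* Checks only the four dependency rules stated in the mail. */
bool commit_order_is_valid(const int *order, size_t n)
{
    bool flushed = position_of(order, n, FLUSH_WAL) >= 0;

    return precedes_if_both(order, n, MARK_CLOG, REMOVE_PROC)
        && precedes_if_both(order, n, WAIT_SYNCREP, REMOVE_PROC)
        && precedes_if_both(order, n, INSERT_WAL, FLUSH_WAL)
        && (flushed || precedes_if_both(order, n, INSERT_WAL, MARK_CLOG));
}
```

Both {1, 2, 3, 4, 5} and {1, 3, 2, 4, 5} pass, because no rule relates (2) and (3), while {1, 2, 5, 3, 4} fails because (5) comes before (3).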

So an improved design for commit is to:
2) request a flush up to the LSN, but don't wait
3) mark the clog and set the LSN
4) wait for the LSN once, for either the WALwriter or the walsender to release us
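A minimal sketch of that sequence, assuming a single mutex plus two POSIX condition variables in place of PostgreSQL's latches and SyncRep wait queue. XLogRecPtr is simplified to a 64-bit integer, and request_flush(), wait_for_flush(), walwriter_main(), ClogSlot and the other names are hypothetical, for illustration only. The backend requests the flush, marks the clog while a separate writer thread performs the (simulated) flush, and then waits exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* simplified: 0 means "invalid" here */

/* Hypothetical shared state between backends and the WAL writer thread. */
static pthread_mutex_t wal_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flush_wanted = PTHREAD_COND_INITIALIZER;
static pthread_cond_t flush_done = PTHREAD_COND_INITIALIZER;
static XLogRecPtr request_lsn = 0;        /* highest LSN anyone asked to flush */
static XLogRecPtr flushed_lsn = 0;        /* highest LSN known to be durable */
static bool writer_stop = false;
static pthread_t writer;

/* Hypothetical clog slot: commit status plus the LSN that must be flushed first. */
typedef struct { bool committed; XLogRecPtr lsn; } ClogSlot;

/* Step (2): ask for a flush up to lsn, but do not wait. */
static void request_flush(XLogRecPtr lsn)
{
    pthread_mutex_lock(&wal_lock);
    if (lsn > request_lsn) {
        request_lsn = lsn;
        pthread_cond_signal(&flush_wanted);
    }
    pthread_mutex_unlock(&wal_lock);
}

/* Step (3): mark the clog and record the LSN, as for an async commit. */
static void mark_clog(ClogSlot *slot, XLogRecPtr lsn)
{
    assert(lsn != 0);                     /* each clog mark needs a valid LSN */
    slot->committed = true;
    if (lsn > slot->lsn)
        slot->lsn = lsn;
}

/* Step (4): the single wait, until the writer reports lsn as flushed. */
static void wait_for_flush(XLogRecPtr lsn)
{
    pthread_mutex_lock(&wal_lock);
    while (flushed_lsn < lsn)
        pthread_cond_wait(&flush_done, &wal_lock);
    pthread_mutex_unlock(&wal_lock);
}

/* WAL writer: flush whatever has been requested, then wake all waiters. */
static void *walwriter_main(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&wal_lock);
    for (;;) {
        while (request_lsn <= flushed_lsn && !writer_stop)
            pthread_cond_wait(&flush_wanted, &wal_lock);
        if (request_lsn <= flushed_lsn)
            break;                        /* stop requested, nothing pending */
        XLogRecPtr target = request_lsn;
        pthread_mutex_unlock(&wal_lock);
        /* a real writer would write() and fsync() WAL up to target here */
        pthread_mutex_lock(&wal_lock);
        flushed_lsn = target;
        pthread_cond_broadcast(&flush_done);
    }
    pthread_mutex_unlock(&wal_lock);
    return NULL;
}

void start_walwriter(void)
{
    int rc = pthread_create(&writer, NULL, walwriter_main, NULL);
    assert(rc == 0);
}

void stop_walwriter(void)
{
    pthread_mutex_lock(&wal_lock);
    writer_stop = true;
    pthread_cond_signal(&flush_wanted);
    pthread_mutex_unlock(&wal_lock);
    pthread_join(writer, NULL);
}

/* Commit steps (2)-(4); step (5) would follow once this returns. */
void commit_record(ClogSlot *slot, XLogRecPtr commit_lsn)
{
    request_flush(commit_lsn);            /* (2) no wait */
    mark_clog(slot, commit_lsn);          /* (3) overlaps with the flush */
    wait_for_flush(commit_lsn);           /* (4) wait once */
}

XLogRecPtr get_flushed_lsn(void)
{
    pthread_mutex_lock(&wal_lock);
    XLogRecPtr v = flushed_lsn;
    pthread_mutex_unlock(&wal_lock);
    return v;
}
```

In this sketch the backend never blocks between (2) and (3), so the clog update runs while the writer thread is "flushing"; pthread_cond_broadcast() lets one flush release every waiting backend whose LSN it covers.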

This is free of race conditions as long as step (3) marks each clog
page with a valid LSN, just as we would do for asynchronous commit.

Marking the clog with an LSN ensures that we issue XLogFlush(LSN)
before the clog page is written, so WAL is always flushed to the
desired LSN before the clog mark appears on disk.
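That is the usual WAL-before-data rule, applied when a clog page is written out. A simulated sketch: sim_xlog_flush() is a hypothetical stand-in for XLogFlush(), and ClogPage and write_clog_page() are likewise illustrative names, not the real SLRU code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* simplified stand-in */

static XLogRecPtr wal_flushed_upto = 0;   /* simulated durable end of WAL */
static XLogRecPtr page_lsn_on_disk = 0;   /* LSN of the clog page image on "disk" */
static int wal_flush_calls = 0;

/* Hypothetical stand-in for XLogFlush(): make WAL durable up to 'upto'. */
static void sim_xlog_flush(XLogRecPtr upto)
{
    if (upto > wal_flushed_upto) {
        /* real code would write() and fsync() WAL here */
        wal_flushed_upto = upto;
        wal_flush_calls++;
    }
}

/* Hypothetical clog page: the highest commit LSN recorded on it. */
typedef struct { XLogRecPtr lsn; } ClogPage;

/* WAL-before-data: flush WAL up to the page LSN, then write the page. */
void write_clog_page(const ClogPage *page)
{
    sim_xlog_flush(page->lsn);
    assert(wal_flushed_upto >= page->lsn);
    page_lsn_on_disk = page->lsn;         /* stands in for the page write */
}
```

A second write of a page whose LSN is already covered triggers no extra flush, since WAL is already durable past that point.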

Does this cause any other behaviour? No, because the LSN marked on the
clog is always flushed by the time we hit step (5), so there is no
delay in any hint bit setting, or any other effect.
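The argument can be written as a pair of checks. This is a sketch under stated assumptions: hint_bit_allowed() and hint_after_step5() are hypothetical, and the rule they encode is the one asynchronous commit already relies on (as I understand it, PostgreSQL's hint-bit code declines to set a hint while the commit LSN still needs flushing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* simplified stand-in for PostgreSQL's type */

/*
 * Hypothetical rule: a reader may set the "committed" hint bit only once WAL
 * is durable up to the transaction's commit LSN (otherwise it must skip it).
 */
bool hint_bit_allowed(XLogRecPtr commit_lsn, XLogRecPtr flushed_upto)
{
    return flushed_upto >= commit_lsn;
}

/*
 * In the proposed ordering, step (5) happens only after the wait in step (4)
 * saw flushed_upto >= commit_lsn. The flushed position never moves backwards,
 * so every reader that finds the transaction gone from the procarray passes
 * the check above: no hint bit is deferred.
 */
bool hint_after_step5(XLogRecPtr commit_lsn, XLogRecPtr flushed_seen_at_step4,
                      XLogRecPtr flushed_now)
{
    assert(flushed_seen_at_step4 >= commit_lsn);   /* guaranteed by step (4) */
    assert(flushed_now >= flushed_seen_at_step4);  /* flush position is monotonic */
    return hint_bit_allowed(commit_lsn, flushed_now);
}
```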

So step (2) requests the flush, which is then performed by the WALwriter.
The backend then performs (3) while the flush takes place, and then waits
at step (4) to be woken.

We only wait once, at step (4), rather than waiting for the flush at step (2)
and then waiting again at step (4).

So we use the existing code path for TransactionIdAsyncCommitTree(),
yet we wait at step (4) using the SyncRep code.

Step (5) happens last, as always.

There are two benefits to this approach:
* The clog update happens "for free" since it is hidden behind a
longer running task
* The implementation uses already tested and robust code for SyncRep
and AsyncCommit
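The first benefit is simple arithmetic: done in sequence, a commit pays for the flush plus the clog update; overlapped, it pays only for whichever is longer. A sketch of that model, which ignores the cost of handing the flush request to the WALwriter; the timings in the note below are purely illustrative, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Commit latency when the clog update waits for the flush to finish. */
uint64_t sequential_us(uint64_t flush_us, uint64_t clog_us)
{
    return flush_us + clog_us;
}

/* Commit latency when the clog update runs while the flush is in progress. */
uint64_t overlapped_us(uint64_t flush_us, uint64_t clog_us)
{
    return flush_us > clog_us ? flush_us : clog_us;
}
```

With an illustrative 2000 µs flush and a 50 µs clog update (contention included), the sequential path costs 2050 µs and the overlapped one 2000 µs: the clog work is hidden as long as it is shorter than the flush.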

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Group commit, revised at 2012-01-30 20:04:24 from Heikki Linnakangas

Responses

Re: Group commit, revised at 2012-01-31 07:43:30 from Heikki Linnakangas
