Analysis of ganged WAL writes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Curtis Faith <curtis(at)galtair(dot)com>, Hannu Krosing <hannu(at)tm(dot)ee>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Analysis of ganged WAL writes
Date: 2002-10-06 00:16:19
Lists: pgsql-hackers

I do not think the situation for ganging of multiple commit-record
writes is quite as dire as has been painted. There is a simple error
in the current code that is easily corrected: in XLogFlush(), the
wait to acquire WALWriteLock should occur before, not after, we try
to acquire WALInsertLock and advance our local copy of the write
request pointer. (To be exact, xlog.c lines 1255-1269 in CVS tip
ought to be moved down to before line 1275, inside the "if" that
tests whether we are going to call XLogWrite.)
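
To make the effect of that reordering concrete, here is a toy model in C.
It is only a sketch and not the xlog.c code: ToyWal, flush_stale_target,
flush_fresh_target and count_writes are hypothetical names, log positions
are plain integers, and the locking itself is left out (each call stands
for "this backend now holds the write lock"). The only thing it shows is
when the flush target gets chosen.

```c
#include <assert.h>

/* Hypothetical toy model; positions are plain integers, not real WAL pointers. */
typedef struct {
    long inserted_upto;   /* end of WAL inserted into the buffers so far */
    long flushed_upto;    /* the "known flushed" pointer */
    int  writes;          /* number of write+sync calls issued */
} ToyWal;

/* Ordering being criticized: the target was fixed before waiting for the lock. */
void flush_stale_target(ToyWal *wal, long my_record, long target_before_wait)
{
    assert(target_before_wait >= my_record);
    if (my_record <= wal->flushed_upto)
        return;                               /* someone else already flushed us */
    wal->flushed_upto = target_before_wait;
    wal->writes++;
}

/* Proposed ordering: choose the target only once the write lock is held. */
void flush_fresh_target(ToyWal *wal, long my_record)
{
    assert(wal->inserted_upto >= my_record);
    if (my_record <= wal->flushed_upto)
        return;                               /* someone else already flushed us */
    wal->flushed_upto = wal->inserted_upto;   /* everything queued as of now */
    wal->writes++;
}

/* Five commits at positions 1..5: the first writes alone, the other four
 * insert their records while that write is in progress, then run in order. */
int count_writes(int use_fresh_target)
{
    ToyWal wal = {0, 0, 0};
    long rec;

    wal.inserted_upto = 1;
    if (use_fresh_target)
        flush_fresh_target(&wal, 1);
    else
        flush_stale_target(&wal, 1, 1);

    wal.inserted_upto = 5;                    /* records 2..5 are now queued */
    for (rec = 2; rec <= 5; rec++) {
        if (use_fresh_target)
            flush_fresh_target(&wal, rec);
        else
            flush_stale_target(&wal, rec, rec);   /* endpoint seen before waiting */
    }
    return wal.writes;
}
```

In this model count_writes(0) issues one write per commit, while
count_writes(1) issues two writes for the same five commits.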

Given that change, here is what will happen during heavy commit
activity:

1. Transaction A is ready to commit. It calls XLogInsert to insert
its commit record into the WAL buffers (thereby transiently acquiring
WALInsertLock) and then it calls XLogFlush to write and sync the
log through the commit record. XLogFlush acquires WALWriteLock and
begins issuing the needed I/O request(s).

2. Transaction B is ready to commit. It gets through XLogInsert
and then blocks on WALWriteLock inside XLogFlush.

3. Transactions C, D, E likewise insert their commit records
and then block on WALWriteLock.

4. Eventually, transaction A finishes its I/O, advances the "known
flushed" pointer past its own commit record, and releases the
WALWriteLock.
5. Transaction B now acquires WALWriteLock. Given the code change I
recommend, it will choose to flush the WAL *through the last queued
commit record as of this instant*, not the WAL endpoint as of when it
started to wait. Therefore, this WAL write will handle all of the
so-far-queued commits.

6. More transactions F, G, H, ... arrive to be committed. They will
likewise insert their COMMIT records into the buffer and block on
WALWriteLock.
7. When B finishes its write and releases WALWriteLock, it will have
set the "known flushed" pointer past E's commit record. Therefore,
transactions C, D, E will fall through their tests without calling
XLogWrite at all. When F gets the lock, it will conclude that it
should write the data queued up to that time, and so it will handle
the commit records for G, H, etc. (The fact that lwlock.c will release
waiters in order of arrival is important here --- we want C, D, E to
get out of the queue before F decides it needs to write.)
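
The sequence in steps 1 through 7 can be traced with a small deterministic
sketch. Again this is an illustration under assumptions, not PostgreSQL
code: GangTrace, trace_lock_granted and trace_scenario are made-up names,
commit records are numbered 1 for A, 2 for B and so on, and the in-order
release of waiters is modeled simply by granting the lock in record order.

```c
#include <assert.h>
#include <string.h>

#define MAX_XACTS 16

/* Hypothetical trace state; record numbers stand in for WAL positions. */
typedef struct {
    int  inserted;                 /* commit records inserted so far */
    int  flushed;                  /* "known flushed" pointer, as a record number */
    char writers[MAX_XACTS + 1];   /* transactions that actually issued a write */
    int  nwriters;
} GangTrace;

/* The transaction owning record `rec` is granted the write lock.
 * `arrivals_during_write` commits insert their records while it writes. */
void trace_lock_granted(GangTrace *t, int rec, int arrivals_during_write)
{
    int target;

    assert(rec >= 1 && rec <= t->inserted);
    assert(t->nwriters < MAX_XACTS);
    if (rec <= t->flushed)
        return;                        /* falls through without writing */
    target = t->inserted;              /* last queued commit as of this instant */
    t->writers[t->nwriters++] = (char) ('A' + rec - 1);
    t->writers[t->nwriters] = '\0';
    t->inserted += arrivals_during_write;   /* later arrivals queue up behind us */
    t->flushed = target;               /* advance "known flushed", release lock */
}

/* Replays the scenario: A writes while B..E queue, B writes while F..H queue. */
const char *trace_scenario(GangTrace *t)
{
    int rec;

    memset(t, 0, sizeof *t);
    t->inserted = 1;                   /* A inserts its commit record */
    trace_lock_granted(t, 1, 4);       /* A writes; B, C, D, E arrive meanwhile */
    trace_lock_granted(t, 2, 3);       /* B writes through E; F, G, H arrive */
    for (rec = 3; rec <= 8; rec++)
        trace_lock_granted(t, rec, 0); /* C, D, E skip; F writes; G, H skip */
    return t->writers;
}
```

Under these assumptions only A, B and F end up issuing a write, so eight
commits are covered by three write-and-sync cycles.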

It seems to me that this behavior will provide fairly effective
ganging of COMMIT flushes under load. And it's self-tuning; no need
to fiddle with weird parameters like commit_siblings. We automatically
gang as many COMMITs as arrive during the time it takes to write and
flush the previous gang of COMMITs.
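
As a rough check on the self-tuning claim, the sketch below counts how many
flushes are needed when commits arrive at a fixed interval and every
write-and-sync takes a fixed time. Both the function simulate_flushes and
the fixed-rate arrival pattern are assumptions made for illustration; real
arrival and I/O times are of course not this regular.

```c
#include <assert.h>

/* Hypothetical model: commit i (0-based) arrives at time i * arrival_interval_ms,
 * and each flush covers every commit that has arrived when the flush starts. */
int simulate_flushes(int ncommits, int arrival_interval_ms, int flush_ms)
{
    int flushed = 0;    /* commits durably flushed so far */
    int now = 0;        /* simulated clock in milliseconds */
    int flushes = 0;    /* write+sync cycles issued */

    assert(ncommits > 0 && arrival_interval_ms > 0 && flush_ms > 0);
    while (flushed < ncommits) {
        int next_arrival = flushed * arrival_interval_ms;
        int arrived;

        if (now < next_arrival)
            now = next_arrival;                   /* idle until a commit shows up */
        arrived = now / arrival_interval_ms + 1;  /* commits inserted by `now` */
        if (arrived > ncommits)
            arrived = ncommits;
        now += flush_ms;                          /* one write+sync for the gang */
        flushed = arrived;
        flushes++;
    }
    return flushes;
}
```

With 100 commits and a 10 ms flush, an arrival interval of 20 ms gives one
flush per commit (100 flushes), while an interval of 1 ms gives 11 flushes,
i.e. gangs of about ten commits with no tuning parameter involved.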


regards, tom lane

