Group Commit

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Group Commit
Date: 2007-03-26 10:39:16
Message-ID: 4607A2D4.5090005@enterprisedb.com
Lists: pgsql-hackers

It's been known for years that commit_delay isn't very good at giving us
group commit behavior. I did some experiments with this simple test
case: "BEGIN; INSERT INTO test VALUES (1); COMMIT;", with different
numbers of concurrent clients and with and without commit_delay.

Summary for the impatient:
1. Current behavior sucks.
2. commit_delay doesn't help with fewer than ~10 clients. It does help
with more, but it still sucks.
3. I'm working on a patch.

I added logging to show how many commit records are flushed on each
fsync. The output with otherwise unpatched PG head looks like this, with
5 clients:

LOG: Flushed 4 out of 5 commits
LOG: Flushed 1 out of 5 commits
LOG: Flushed 4 out of 5 commits
LOG: Flushed 1 out of 5 commits
LOG: Flushed 4 out of 5 commits
LOG: Flushed 1 out of 5 commits
LOG: Flushed 4 out of 5 commits
LOG: Flushed 1 out of 5 commits
LOG: Flushed 3 out of 5 commits
LOG: Flushed 2 out of 5 commits
LOG: Flushed 3 out of 5 commits
LOG: Flushed 2 out of 5 commits
LOG: Flushed 3 out of 5 commits
LOG: Flushed 2 out of 5 commits
LOG: Flushed 3 out of 5 commits
...

Here's what's happening:

1. Client 1 issues fsync (A).
2. Clients 2-5 write their commit records and try to fsync, but they
have to wait for fsync (A) to finish.
3. fsync (A) finishes, freeing client 1.
4. One of clients 2-5 starts the next fsync (B), which will flush the
commits of clients 2-5 to disk.
5. Client 1 begins a new transaction, inserts its commit record and
tries to fsync. It needs to wait for the previous fsync (B) to finish.
6. fsync (B) finishes, freeing clients 2-5.
7. Client 1 issues fsync (C).
8. ...

The 2-3-2-3 pattern can be explained by a similar unfortunate
"resonance", but with two clients taking the place of client 1 in the
above, possibly running on separate cores (the test was run on a
dual-core laptop).

I also drew a diagram illustrating the above, attached.
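
The resonance can also be reproduced with a toy model. This is only a
sketch, not the logging patch: it assumes a transaction takes negligible
time compared to an fsync, that only one fsync runs at a time, and that
the next fsync always starts before the just-freed clients have inserted
their new commit records. The function name and its parameters are made
up for illustration.

```python
def simulate_flushes(n_clients, first_batch, n_fsyncs):
    """Return the number of commits flushed by each successive fsync.

    Idealized model (an assumption, not measured behavior): every client
    that is not covered by the running fsync inserts its commit record
    while that fsync is in progress, and all of those records are flushed
    together by the next fsync.
    """
    if not 0 < first_batch < n_clients:
        raise ValueError("first_batch must be between 1 and n_clients - 1")

    in_flight = first_batch  # commits covered by the fsync in progress
    flushed = []
    for _ in range(n_fsyncs):
        # Clients outside the running fsync queue up behind it.
        pending = n_clients - in_flight
        flushed.append(in_flight)
        # The queued commits become the next fsync; the clients that were
        # just freed arrive too late and have to wait for the one after.
        in_flight = pending
    return flushed


print(simulate_flushes(5, 1, 4))  # one client starts alone
print(simulate_flushes(5, 2, 4))  # two clients start together
```

Under these assumptions the batch sizes never even out: whichever split
the first fsync happens to produce (1 vs. 4, or 2 vs. 3) repeats forever.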

I wrote a quick & dirty patch for this that I'm going to refine further,
but wanted to get the results out for others to look at first. I'm not
posting the patch yet, but it basically adds some synchronization to the
WAL flushes. It introduces a counter of inserted but not yet flushed
commit records. Instead of sleeping for commit_delay, a backend checks
the counter. If it's smaller than NBackends, the process waits until the
count reaches NBackends or a timeout expires. There are two significant
differences from commit_delay here:
1. Instead of waiting for commit_delay to expire, processes are woken
and fsync is started immediately when we know there are no more commit
records coming that we should wait for. Even though commit_delay is
given in microseconds, the real granularity of the wait can be as high
as 10 ms, which is in the same ballpark as the fsync itself.
2. commit_delay is not used when there are fewer than commit_siblings
non-idle backends in the system. With very short transactions, it's
worthwhile to wait even if that's the case, because a client can begin
and finish a transaction in a much shorter time than it takes to fsync.
This is why commit_delay doesn't work at all in my test case with 2
clients.
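
The waiting logic described above can be sketched roughly as follows.
This is not the patch, which is not posted here, and it is written in
Python rather than backend C. The class and method names are
hypothetical, a list append stands in for the WAL fsync, and the
stand-in flush runs while the lock is held, which a real implementation
would presumably avoid.

```python
import threading
import time


class GroupCommitGate:
    """Toy model: wait until n_backends commits are pending, or time out."""

    def __init__(self, n_backends, timeout_s):
        self.n_backends = n_backends
        self.timeout_s = timeout_s
        self.unflushed = 0      # inserted but not yet flushed commit records
        self.generation = 0     # incremented on every flush
        self.batches = []       # size of each flushed group, for inspection
        self.cond = threading.Condition()

    def commit(self):
        with self.cond:
            my_generation = self.generation
            self.unflushed += 1
            if self.unflushed >= self.n_backends:
                # Nobody else can still be coming: flush right away.
                self._flush()
                return
            deadline = time.monotonic() + self.timeout_s
            while self.generation == my_generation:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Timed out: flush whatever has accumulated so far.
                    self._flush()
                    return
                self.cond.wait(remaining)

    def _flush(self):
        # Stand-in for the fsync; must be called with self.cond held.
        self.batches.append(self.unflushed)
        self.unflushed = 0
        self.generation += 1
        self.cond.notify_all()


gate = GroupCommitGate(n_backends=3, timeout_s=5.0)
workers = [threading.Thread(target=gate.commit) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(gate.batches)
```

The last backend to arrive triggers the flush immediately, so the wait
is not tied to the timer granularity, and a lone backend still waits
(up to the timeout) instead of being exempted by commit_siblings.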

Here's a spreadsheet with the results of the tests I ran:
http://community.enterprisedb.com/groupcommit-comparison.ods

It contains a graph that shows that the patch works very well for this
test case. It's not very good for real life as it is, though. An obvious
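flaw is that if you have a longer-running transaction, difference 1
goes away: the count can't reach NBackends while that transaction is
open, so every commit waits for the timeout. Instead of waiting for
NBackends commit records, we should try to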
guess the number of transactions that are likely to finish in a
reasonably short time. I'm thinking of keeping a running average of
commits per second, or # of transactions that finish while an fsync is
taking place.
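
One possible way to keep such a running average is an exponentially
weighted moving average of the number of commits that finished during
each fsync, capped at NBackends. This is only a sketch of the idea; the
class name, the smoothing factor and the rounding rule are assumptions
chosen for illustration, not anything from the patch.

```python
class CommitsPerFsyncEstimator:
    """Exponentially weighted running average of commits seen per fsync."""

    def __init__(self, alpha=0.25, initial=1.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha        # weight given to the newest observation
        self.average = initial    # current estimate of commits per fsync

    def record_fsync(self, commits_during_fsync):
        # Move the estimate a fraction alpha toward the new observation.
        self.average += self.alpha * (commits_during_fsync - self.average)

    def target_group_size(self, n_backends):
        # Number of commit records to wait for: at least 1, never more
        # than the number of backends that could possibly commit.
        return max(1, min(n_backends, round(self.average)))


estimator = CommitsPerFsyncEstimator(alpha=0.5, initial=1.0)
for observed in (5, 5):
    estimator.record_fsync(observed)
print(estimator.average, estimator.target_group_size(5))
```

With an estimate like this, a backend stuck in a long transaction simply
stops contributing to the average, so the others stop waiting for it.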

Any thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment: image/gif, 18.9 KB
