Re: improving concurrent transaction commit rate

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improving concurrent transaction commit rate
Date: 2009-03-25 03:23:36
Message-ID: alpine.GSO.2.01.0903242255570.16570@westnet.com
Lists: pgsql-hackers

On Tue, 24 Mar 2009, Sam Mason wrote:

> The conceptual idea is to have at most one outstanding flush for the
> log going through the filesystem at any one time.

Quoting from src/backend/access/transam/xlog.c, inside XLogFlush:

"Since fsync is usually a horribly expensive operation, we try to
piggyback as much data as we can on each fsync: if we see any more data
entered into the xlog buffer, we'll write and fsync that too, so that the
final value of LogwrtResult.Flush is as large as possible. This gives us
some chance of avoiding another fsync immediately after."

The logic implementing that idea takes care of bunching up flushes for any
WAL data that happens to be ready to go at that point. You can see this
most easily by doing inserts into a system that's limited by a slow fsync,
like a single disk without a write cache, where you're bound by rotation
speed. If you have, say, a 7200RPM disk, that's 120 rotations/second, so
no one client can commit faster than 120 times/second. But if you have 10
clients all pushing small inserts in, it's fairly easy to see >500
transactions/second, because a bunch of commits will get batched up during
the time the last fsync is waiting for the disk to finish.
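
The arithmetic above can be sketched with a toy simulation. This is only a
rough model, not PostgreSQL's code: the function name, the 1 ms of client
work between commits, and the 0.5 ms stagger between client start times are
all made-up assumptions. It assumes one fsync costs one full rotation, and
that any commit record already buffered when an fsync starts rides along
with it.

```python
def simulate_group_commit(clients, rpm=7200, work_ms=1.0,
                          stagger_ms=0.5, duration_s=10.0):
    """Return commits/second in a toy model of piggybacked WAL flushes.

    All parameters are illustrative assumptions, not PostgreSQL settings.
    """
    fsync_s = 60.0 / rpm           # assume one fsync costs one rotation
    work_s = work_ms / 1000.0      # client time between ack and next commit
    # ready[i]: time at which client i's next commit record is buffered.
    ready = [i * stagger_ms / 1000.0 for i in range(clients)]
    now = 0.0
    commits = 0
    while True:
        start = max(now, min(ready))   # disk idles until a record waits
        end = start + fsync_s
        if end > duration_s:
            break
        for i, t in enumerate(ready):
            if t <= start:             # already buffered: rides this fsync
                commits += 1
                ready[i] = end + work_s
        now = end
    return commits / duration_s


if __name__ == "__main__":
    for n in (1, 10):
        print(n, "clients:", round(simulate_group_commit(n)), "commits/s")
```

In this model a single client stays below 120 commits/second, while ten
clients land above 500, because most of them share an fsync that was
started on behalf of someone else.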

The other idea you'll already find implemented in there is controlled by
commit_delay. If there are at least commit_siblings other open
transactions at the point where a commit is supposed to happen, the
committing backend pauses for commit_delay microseconds in hopes that other
transactions will jump onboard via the mechanism described above. In
practice, it's very hard to tune that usefully. You can use it to help
bunch commits together into bigger batches on a really busy system (where
not having more than one commit ready is unexpected), but it's not much
help outside of that context.
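
The decision described above can be sketched as follows. This paraphrases
the rule in Python rather than quoting xlog.c; the function and argument
names are hypothetical, and the threshold is assumed to be inclusive (at
least commit_siblings other open transactions). The defaults mirror the
stock settings of commit_delay = 0 and commit_siblings = 5.

```python
def commit_pause_us(other_open_xacts, commit_delay_us=0, commit_siblings=5):
    """Return how long (in microseconds) a committing backend would pause.

    Hypothetical helper that paraphrases the rule; not PostgreSQL's API.
    """
    # With commit_delay at 0 the feature is off and nobody ever waits.
    if commit_delay_us <= 0:
        return 0
    # Only wait when enough other transactions are open that one of them
    # might plausibly reach its own commit during the pause.
    if other_open_xacts >= commit_siblings:
        return commit_delay_us
    return 0


if __name__ == "__main__":
    print(commit_pause_us(10, commit_delay_us=100))  # busy system: pauses
    print(commit_pause_us(2, commit_delay_us=100))   # quiet system: no pause
```

The pause only pays off if another commit record actually arrives during
it; otherwise it is pure added latency, which is part of why the setting is
hard to tune.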

Check out the rest of the comments in xlog.c; there's a lot in there
that's not really covered in the README. If you turn on WAL_DEBUG and
XLOG_DEBUG you can actually watch some of this happen. I found time spent
reading the source to that file and src/backend/storage/buffer/bufmgr.c to
be really well spent. Some of the most interesting parts of the codebase
to understand from a low-level performance tuning perspective are in those
two.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
