Re: Group commit, revised

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group commit, revised
Date: 2012-01-21 03:13:56
Message-ID: CAEYLb_XfjM2qQtdo9oRHJ+SnTaBteevTq=v6guFRRDffE1Fh6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 January 2012 22:30, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Maybe we should have a heuristic to split a large flush into smaller chunks.
> The WAL segment boundary would be a quite natural split point, for example,
> because when crossing the file boundary you have to issue separate fsync()s
> for the files anyway. But I'm happy with leaving that out for now, it's not
> any worse than it used to be without group commit anyway.

Let's quantify how much of a problem that is first.

> The patch added some CHECK_FOR_INTERRUPTS() calls into various places in the
> commit/abort codepaths, but that's all dead code; they're all within a
> HOLD/RESUME_INTERRUPTS blocks.

Fair enough, but do you think it's acceptable to say "well, we can't
have errors within critical blocks anyway, and nor can the driver, so
just assume that the driver will successfully service the request"?

> Furthermore, the WAL writer didn't have a latch before this patch; it's not
> many lines of code to initialize the latch and set up the signal handler for
> it, but it already has a semaphore that's ready to use.

Uh, yes it did - WalWriterDelay is passed to WaitLatch(). It didn't
use the process latch as I did (which is initialised anyway), though I
believe it should have (on general principal, to avoid invalidation
issues when generic handlers are registered, plus because the process
latch is already initialised and available), which is why I changed
it. Whatever you do with group commit, you're going to want to look at
the changes made to the WAL Writer in my original patch outside of the
main loop, because there are one or two fixes for it included
(registering a usr1 signal handler and saving errno in existing
handlers), and because we need an alternative way of power saving if
you're not going to include the mechanism originally proposed - maybe
something similar to what has been done for the BGWriter in my patch
for that. At 5 wake-ups per second by default, the process is by a
wide margin the biggest culprit (except BGWriter, which is also 5 by
default, but that is addressed by my other patch that you're
reviewing). I want to fix that problem too, and possibly investigate
if there's something to be done about the checkpointer (though that
only has a 5 second timeout, so it's not a major concern). In any
case, we should encourage the idea that auxiliary processes will use
the proc latch, unless perhaps they only use a local latch like the
avlauncher does, imho.

Why did you remove the new assertions in unix_latch.c/win32_latch.c? I
think you should keep them, as well as my additional comments on latch
timeout invalidation issues in latch.h which are untouched in your
revision (though this looks to be a rough revision, so I shouldn't
read anything into that either way I suppose). In general, we should
try and use the process latch whenever we can.

> I wonder if we should rename the file into "xlogflush.c" or something like
> that, to make it clear that this works with any XLOG flushes, not just
> commits? Group commit is the usual term for this feature, so we should
> definitely mention that in the comments, but it might be better to rename
> the files/functions to emphasize that this is about WAL flushing in general.

Okay.

> This probably needs some further cleanup to fix things I've broken, and I
> haven't done any performance testing, but it's progress. Do you have a shell
> script or something that you used for the performance tests that I could
> run? Or would you like to re-run the tests you did earlier with this patch?

No, I'm using pgbench-tools, and there's no reason to think that you
couldn't get similar results on ordinary hardware, which is all I used
- obviously you'll want to make sure that you're using a file system
that supports granular fsyncs, like ext4. All of the details,
including the config for pgbench-tools, are in my original e-mail. I
have taken the time to re-run the benchmark and update the wiki with
that new information - I'd call it a draw.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-01-21 03:37:22 Re: Group commit, revised
Previous Message Peter Geoghegan 2012-01-21 02:16:34 Re: REVIEW: pg_stat_statements with query tree based normalization