Quick Links

Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

From:	Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)
Date:	2012-05-30 01:29:19
Message-ID:	CAEYLb_XZQPVW+jVeLV26sPkQO6Z=7VezJC3SvyDkQ3=UFLWc4A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 29 May 2012 17:58, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> Why do you think that doing this for all XLogFlush() callsites might
>> be problematic?
>
> Well, consider the one in the background writer, for example. That's
> just a periodic flush, so I see no benefit in having it acquire the
> lock and then wait some more. It already did wait.

No benefit for the BGWriter, no, but a benefit for the cluster as a
whole, which can see not only an improvement in transaction
throughput, but also of average and worst-case latency. The whole idea
of a group commit feature is that one backend altruistically takes
other backends with it. Besides, for the background writer or
checkpointer, that operate in terms of minutes, seconds and
milliseconds, having their transaction block an extra 2 milliseconds
is hardly a real problem. Both the BGWriter and checkpointer do an
amount of work that is adaptive per cycle anyway.

> And what about the case where we're flushing while holding WALInsertLock because the
> buffer's full? Clearly waiting is useless in that case - nobody can
> join the group commit for exactly the same reason that we're doing the
> flush in the first place: no buffer space.

Maybe you could have the leader check that condition, and not wait
accordingly. If the buffer is full, chances are very high that there
already is a leader, either sleeping or following through with an
XLogWrite(), whose work is well underway.

The general solution that you've proposed - adding an actually_wait
bool parameter to XLogFlush() - won't work, unless, for example, the
"buffer is full" call happens to become the leader, which is generally
quite unlikely. Otherwise it's stuck waiting behind WALWriteLock along
with everyone else, on average for a duration of half (CommitDelay +
avg flushtime), while the leader sleeps and then works. If something
is only effective a small minority of the time, why bother? I doubt
that you can reasonably have the BGWriter or whatever cut in line
ahead of an ever-growing queue of backends calling XLogFlush(), if
that's what you're thinking - now all those backends have to wait an
additional period of up to however long the flush takes.

In any case, the standard for changing the exact behaviour of
commit_delay/commit_siblings ought to be quite low, because those
settings are already so heavily laden with problems that it is
somewhat doubtful that they've ever been used by someone in production
sensibly. Obviously having a delay that turns out to be in some way
suboptimal is obviously something I'd hope to avoid entirely, but I've
done a good job of considerably improving that situation. If we get
this into 9.2, perhaps a more adaptive implementation can follow in
9.3.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Re: Uh, I change my mind about commit_delay + commit_siblings (sort of) at 2012-05-29 16:58:30 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeff Janes	2012-05-30 03:21:03	Re: hash index concurrency
Previous Message	Tatsuo Ishii	2012-05-30 00:58:16	Re: pg_dump and thousands of schemas