On 16 January 2012 08:11, Heikki Linnakangas
> Impressive results. How about uploading the PDF to the community wiki?
Sure. http://wiki.postgresql.org/wiki/Group_commit .
> I think it might be simpler if it wasn't the background writer that's
> responsible for "driving" the group commit queue, but the backends
> themselves. When a flush request comes in, you join the queue, and if
> someone else is already doing the flush, sleep until the driver wakes you
> up. If no-one is doing the flush yet (ie. the queue is empty), start doing
> it yourself. You'll need a state variable to keep track who's driving the
> queue, but otherwise I think it would be simpler as there would be no
> dependency on WAL writer.
I think this replaces one problem with another. You've now effectively
elevated a nominated backend to the status of an auxiliary process -
do you intend to have the postmaster look after it, as with any other
auxiliary process? I'm not sure that that is a more difficult problem
to solve, but I suspect so. At least with my proposal, any backend,
whether it is already participating in group commit or has yet to, can
wake up the WAL Writer.
> I tend to think of the group commit facility as a bus. Passengers can hop on
> board at any time, and they take turns on who drives the bus. When the first
> passenger hops in, there is no driver so he takes the driver seat. When the
> next passenger hops in, he sees that someone is driving the bus already, so
> he sits down, and places a big sign on his forehead stating the bus stop
> where he wants to get off, and goes to sleep. When the driver has reached
> his own bus stop, he wakes up all the passengers who wanted to get off at
> the same stop or any of the earlier stops [*]. He also wakes up the
> passenger whose bus stop is the farthest from the current stop, and gets off
> the bus. The woken-up passengers who have already reached their stops can
> immediately get off the bus, and the one who has not notices that no-one is
> driving the bus anymore, so he takes the driver seat.
> [*] in a real bus, a passenger would not be happy if he's woken up too late
> and finds himself at the next stop instead of the one where he wanted to go,
> but for group commit, that is fine.
> In this arrangement, you could use the per-process semaphore for the
> sleep/wakeups, instead of latches. I'm not sure if there's any difference,
> but semaphores are more tried and tested, at least.
Yes, and I expect that this won't be the last time someone uses a bus
analogy in relation to this!
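
For anyone who finds the analogy easier to follow as code, here is a
minimal single-threaded sketch of the scheme quoted above. It is only a
model of the hand-over rules, not PostgreSQL code: the class name, the
method names and the use of plain integers as LSNs are all assumptions
made for illustration, and there is no locking or real sleeping.

```python
from typing import Dict, List, Optional, Tuple


class GroupCommitBus:
    """Toy model of the 'bus' hand-over rules (hypothetical names throughout)."""

    def __init__(self) -> None:
        self.flushed_upto = 0                 # highest LSN known to be on disk
        self.driver: Optional[str] = None     # backend currently doing the flush
        self.driver_lsn = 0                   # LSN the driver is flushing up to
        self.sleepers: Dict[str, int] = {}    # sleeping backend -> requested LSN

    def hop_on(self, backend: str, lsn: int) -> str:
        """A backend asks for WAL to be flushed up to lsn."""
        if lsn <= self.flushed_upto:
            return "done"                     # nothing to wait for
        if self.driver is None:
            self.driver, self.driver_lsn = backend, lsn
            return "drive"                    # nobody is flushing: do it yourself
        self.sleepers[backend] = lsn          # note your stop and go to sleep
        return "sleep"

    def finish_flush(self) -> Tuple[List[str], Optional[str]]:
        """The driver's flush has completed: wake passengers, hand over the wheel."""
        if self.driver is None:
            raise RuntimeError("no flush in progress")
        self.flushed_upto = max(self.flushed_upto, self.driver_lsn)
        # Everyone whose stop has been reached gets off with the driver.
        reached = sorted(b for b, lsn in self.sleepers.items()
                         if lsn <= self.flushed_upto)
        for b in reached:
            del self.sleepers[b]
        done = [self.driver] + reached
        # The passenger with the farthest stop becomes the next driver.
        if self.sleepers:
            nxt = max(self.sleepers, key=lambda b: self.sleepers[b])
            self.driver, self.driver_lsn = nxt, self.sleepers.pop(nxt)
        else:
            self.driver = None
        return done, self.driver
```

In this model the state variable mentioned earlier is just the driver
field; the remaining sleepers stay asleep until the new driver's flush
covers their LSNs.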
The proposed patch is heavily based on sync rep, which I'd have
imagined was more tried and tested than any proposed completely
alternative implementation, as it is basically a generalisation of
exactly the same principle, WAL Writer changes notwithstanding. I
would have imagined that that aspect would be particularly approved of.
> wal_writer_delay is still needed for controlling how often asynchronous
> commits are flushed to disk.
That had occurred to me of course, but has anyone ever actually
tweaked wal_writer_delay with adjusting the behaviour of asynchronous
commits in mind? I'm pretty sure that the answer is no. I have a
slight preference for obsoleting it as a consequence of introducing
group commit, but I don't think that it matters that much.
>> Auxiliary processes cannot use group commit. The changes made prevent
>> them from availing of commit_siblings/commit_delay parallelism,
>> because it doesn't exist anymore.
> Auxiliary processes have PGPROC entries too. Why can't they participate?
It was deemed to be a poor design decision to effectively create a
dependency on the WAL Writer among other auxiliary processes, as to do
so would perhaps compromise the way in which the postmaster notices
and corrects isolated failures. Maybe I'll revisit that assessment,
but I am not convinced that it's worth the very careful analysis of
the implications of such an unprecedented dependency, without there
being some obvious advantage. If it's a question of their being
deprived of commit_siblings "group commit", well, we know from
experience that people didn't tend to touch it a whole lot anyway.
>> Group commit is sometimes throttled, which seems appropriate - if a
>> backend requests that the WAL Writer flush an LSN deemed too far from
>> the known flushed point, that request is rejected and the backend goes
>> through another path, where XLogWrite() is called.
> Hmm, if the backend doing the big flush gets the WALWriteLock before a bunch
> of group committers, the group committers will have to wait until the big
> flush is finished, anyway. I presume the idea of the throttling is to avoid
> the situation where a bunch of small commits need to wait for a huge flush
> to finish.
Exactly. Of course, you're never going to see that situation with
pgbench. I don't have much data to inform exactly what the right
trade-off is here, or some generic approximation of it across
platforms and hardware - other people will know more about this than I
do. My general sense is that flushing a single page of data costs
about the same as flushing a considerably larger amount of data. I
understand much less about where that stops holding for larger
amounts, at which point some trade-off between throughput and latency
has to be modelled, especially given everything else that bears on it,
such as whether or not we're using full_page_writes, the hardware and
so on.
Something simple will probably work well.
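
To make "something simple" concrete, the throttling decision described
above might look roughly like the sketch below. This is not the code in
the patch: the function name, the return labels and in particular the
threshold value are placeholders I have made up, since the right
cut-off is exactly the open question.

```python
# Hypothetical cut-off: how far ahead of the flushed point a request may be
# before it is rejected for group commit. The value is a placeholder only.
MAX_GROUP_COMMIT_DISTANCE = 64 * 8192  # bytes of WAL


def choose_flush_path(request_lsn: int, flushed_lsn: int,
                      max_distance: int = MAX_GROUP_COMMIT_DISTANCE) -> str:
    """Decide how a backend's flush request should be serviced."""
    if request_lsn <= flushed_lsn:
        return "already_flushed"   # nothing to do
    if request_lsn - flushed_lsn > max_distance:
        # Too far from the known flushed point: the request is rejected and
        # the backend flushes by itself (the XLogWrite() path in the patch).
        return "direct"
    return "group_commit"          # hand the request to the WAL Writer
```

A fixed distance like this ignores hardware and full_page_writes
entirely, which is why it should be read as a starting point for
discussion rather than a recommendation.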
> Perhaps the big flusher should always join the queue, but use some heuristic
> to first flush up to the previous commit request, to wake up others quickly,
> and do another flush to flush its own request after that.
Maybe, but we should decide what a big flusher looks like first. That
way, if we can't figure out a way to do what you describe with it in
time for 9.2, we can at least do what I'm already doing.
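
For reference, the two-step heuristic suggested in the quoted text could
be sketched as follows. The function and parameter names are invented
for illustration, and big_threshold stands in for the definition of a
"big flusher" that we have not yet agreed on.

```python
from typing import Iterable, List


def plan_flushes(own_lsn: int, flushed_lsn: int,
                 pending: Iterable[int], big_threshold: int) -> List[int]:
    """Return the LSNs to flush up to, in order, for one backend's request.

    pending holds the LSNs of commit requests already waiting in the queue.
    """
    if own_lsn <= flushed_lsn:
        return []                              # already on disk
    # Waiting requests that lie strictly between the flushed point and ours.
    earlier = [lsn for lsn in pending if flushed_lsn < lsn < own_lsn]
    if own_lsn - flushed_lsn > big_threshold and earlier:
        # Big flush: first flush up to the previous commit request so that
        # the small committers wake up quickly, then flush our own request.
        return [max(earlier), own_lsn]
    return [own_lsn]                           # ordinary single flush
```

For example, with a threshold of 500000 a request for LSN 1000000 that
finds commits waiting at 100 and 200 would flush to 200 first and then
to 1000000.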
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services