Re: Spread checkpoint sync

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-01-15 20:53:18
Message-ID: 1295124798.3282.5.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote:
> On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
> >> Robert Haas wrote:
> >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> >> >
> >> > > One of the ideas Simon and I had been considering at one point was adding
> >> > > some better de-duplication logic to the fsync absorb code, which I'm
> >> > > reminded by the pattern here might be helpful independently of other
> >> > > improvements.
> >> > >
> >> >
> >> > Hopefully I'm not stepping on any toes here, but I thought this was an
> >> > awfully good idea and had a chance to take a look at how hard it would
> >> > be today while en route from point A to point B. The answer turned
> >> > out to be "not very", so PFA a patch that seems to work. I tested it
> >> > by attaching gdb to the background writer while running pgbench, and
> >> > it eliminate the backend fsyncs without even breaking a sweat.
> >> >
> >>
> >> No toe damage, this is great, I hadn't gotten to coding for this angle
> >> yet at all. Suffering from an overload of ideas and (mostly wasted)
> >> test data, so thanks for exploring this concept and proving it works.
> >
> > No toe damage either, but are we sure we want the de-duplication logic
> > and in this place?
> >
> > I was originally of the opinion that de-duplicating the list would save
> > time in the bgwriter, but that guess was wrong by about two orders of
> > magnitude, IIRC. The extra time in the bgwriter wasn't even noticeable.
>
> Well, the point of this is not to save time in the bgwriter - I'm not
> surprised to hear that wasn't noticeable. The point is that when the
> fsync request queue fills up, backends start performing an fsync *for
> every block they write*, and that's about as bad for performance as
> it's possible to be. So it's worth going to a little bit of trouble
> to try to make sure it doesn't happen. It didn't happen *terribly*
> frequently before, but it does seem to be common enough to worry about
> - e.g. on one occasion, I was able to reproduce it just by running
> pgbench -i -s 25 or something like that on a laptop.
>
> With this patch applied, there's no performance impact vs. current
> code in the very, very common case where space remains in the queue -
> 999 times out of 1000, writing to the fsync queue will be just as fast
> as ever. But in the unusual case where the queue has been filled up,
> compacting the queue is much much faster than performing an fsync, and
> the best part is that the compaction is generally massive. I was
> seeing things like "4096 entries compressed to 14". So clearly even
> if the compaction took as long as the fsync itself it would be worth
> it, because the next 4000+ guys who come along again go through the
> fast path. But in fact I think it's much faster than an fsync.
>
> In order to get pathological behavior even with this patch applied,
> you'd need to have NBuffers pending fsync requests and they'd all have
> to be different. I don't think that's theoretically impossible, but
> Greg's research seems to indicate that even on busy systems we don't
> come even a little bit close to the circumstances that would cause it
> to occur in practice. Every other change we might make in this area
> will further improve this case, too: for example, doing an absorb
> after each fsync would presumably help, as would the more drastic step
> of splitting the bgwriter into two background processes (one to do
> background page cleaning, and the other to do checkpoints, for
> example). But even without those sorts of changes, I think this is
> enough to effectively eliminate the full fsync queue problem in
> practice, which seems worth doing independently of anything else.

You've persuaded me.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-01-15 21:07:56 Include WAL in base backup
Previous Message Tom Lane 2011-01-15 20:37:44 Re: LOCK for non-tables