Re: Load distributed checkpoint

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: Load distributed checkpoint
Date: 2006-12-27 22:54:57
Message-ID: 1167260097.3633.60.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Wed, 2006-12-27 at 23:26 +0100, Martijn van Oosterhout wrote:
> On Wed, Dec 27, 2006 at 09:24:06PM +0000, Simon Riggs wrote:
> > On Fri, 2006-12-22 at 13:53 -0500, Bruce Momjian wrote:
> >
> > > I assume other kernels have similar I/O smoothing, so that data sent to
> > > the kernel via write() gets to disk within 30 seconds.
> > >
> > > I assume write() is not our checkpoint performance problem, but the
> > > transfer to disk via fsync().
> >
> > Well, its correct to say that the transfer to disk is the source of the
> > problem, but that doesn't only occur when we fsync(). There are actually
> > two disk storms that occur, because of the way the fs cache works. [Ron
> > referred to this effect uplist]
>
> As someone looking from the outside:
>
> fsync only works on one file, so presumably the checkpoint process is
> opening each file one by one and fsyncing them.

Yes

> Does that make any
> difference here? Could you adjust the timing here?

Thats the hard bit with io storm 2. When you fsync a file you don't
actually know how many blocks you're writing, plus there's no way to
slow down those writes by putting delays between them (although its
possible your controller might know how to do this, I've never heard of
one that does).

If we put a delay after each fsync, that will space out all the ones
that don't need spacing out and do nothing to the ones that most need
it. Unfortunately.

IMHO there isn't any simple scheme that works all the time, for all OS
settings, default configurations and mechanisms.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-12-27 23:05:07 Re: TupleDescs and refcounts and such, again
Previous Message Andrew Dunstan 2006-12-27 22:53:28 Re: pg_hba.conf hostname todo

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2006-12-27 23:19:31 Re: [BUGS] BUG #2846: inconsistent and confusing handling of underflows,
Previous Message Roman Kononov 2006-12-27 22:48:34 Re: [BUGS] BUG #2846: inconsistent and confusing handling of underflows,