Re: Sorting writes during checkpoint

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Sorting writes during checkpoint
Date: 2008-07-04 08:37:10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-05-05 at 00:23 -0400, Tom Lane wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > On Sun, 4 May 2008, Tom Lane wrote:
> >> Well, I tried a pgbench test similar to that one --- on smaller hardware
> >> than was reported, so it was a bit smaller test case, but it should have
> >> given similar results.
> > ... If
> > you're not offloading to another device like that, the OS-level elevator
> > sorting will handle sorting for you close enough to optimally that I doubt
> > this will help much (and in fact may just get in the way).
> Yeah. It bothers me a bit that the patch forces writes to be done "all
> of file A in order, then all of file B in order, etc". We don't know
> enough about the disk layout of the files to be sure that that's good.
> (This might also mean that whether there is a win is going to be
> platform and filesystem dependent ...)

No action on this seen since last commitfest, but I think we should do
something with it, rather than just ignore it.

Agree with all comments myself, so proposed solution is to implement
this as an I/O elevator hook. Standard elevator is to issue them in
order as they come, additional elevator in contrib is file/block sorted.
That will make testing easier and will also give Itagaki his benefit,
while allowing on-going research. If this solution's good enough for
Linux it ought to be good enough for us.

Note that if we do this for checkpoint we should also do this for
FlushRelationBuffers(), used during heap_sync(), for exactly the same

Would suggest calling it bulk_io_hook() or similar.

Further observation would be that if there was an effect then it would
be at the block-device level, i.e. tablespace. Sorting the writes so
that we issued one tablespace at a time might at least help the I/O
elevators/disk caches to work with the whole problem at once. We might
get benefit on one tablespace but not on another.

Sorting by file might have inadvertently shown benefit at the tablespace
level on a larger server with spread out data whereas on Tom's test
system I would guess just a single tablespace was used.

Anyway, I note that we don't have an easy way of sorting by tablespace,
but I'm sure it would be possible to look up the tablespace for a file
within a plugin.

Simon Riggs
PostgreSQL Training, Services and Support

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2008-07-04 08:41:26 Re: [PATCHES] Multi-column GIN
Previous Message Heikki Linnakangas 2008-07-04 08:26:01 Multi-column GIN

Browse pgsql-patches by date

  From Date Subject
Next Message Zdenek Kotala 2008-07-04 08:38:04 Re: page macros cleanup
Previous Message Heikki Linnakangas 2008-07-04 08:26:01 Multi-column GIN