Re: Sorting writes during checkpoint

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Sorting writes during checkpoint
Date: 2008-05-05 05:37:28
Message-ID: Pine.GSO.4.64.0805050118001.24473@westnet.com
Lists: pgsql-hackers pgsql-patches

On Mon, 5 May 2008, Tom Lane wrote:

> It bothers me a bit that the patch forces writes to be done "all of file
> A in order, then all of file B in order, etc". We don't know enough
> about the disk layout of the files to be sure that that's good. (This
> might also mean that whether there is a win is going to be platform and
> filesystem dependent ...)

I think most platform and filesystem implementations have disk location
correlated enough with block order that this particular issue isn't a
large one. If the writes are mainly going to one logical area (a single
partition or disk array), it should be a win as long as the sorting step
itself isn't introducing a delay. I am concerned that in a case more
complicated than pgbench, say one where the writes are spread across
multiple arrays, forcing writes in order may slow things down.

Example: let's say there are two tablespaces mapped to two arrays, A and B,
that data is being written to at checkpoint time. In the current case,
that I/O might be AABAABABBBAB, which keeps both arrays busy writing.
The sorted case would instead make that AAAAAABBBBBB, so only one array
is active at a time. It may very well be the case that the improvement
from lowering seeks on the writes to A and B is less than the loss from
not keeping both continuously busy.
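
To make that trade-off concrete, here is a rough toy model of the two
write orders. It is only a sketch under assumptions that are not from
the patch or from PostgreSQL: each array services one write at a time,
every write to an array costs the same fixed amount, and at most
`window` writes are outstanding before the writer has to wait. The
function name and all cost figures are made up for illustration.

```python
import heapq


def makespan(order, cost_per_write, window):
    """Return the time at which the last write in `order` completes.

    order          -- string such as "AABAAB"; each letter names an array
    cost_per_write -- assumed service time of one write on its array
    window         -- assumed cap on writes outstanding at once
    """
    free_at = {}      # array name -> time its queue drains
    in_flight = []    # min-heap of completion times of outstanding writes
    now = 0.0         # time at which the writer can issue the next write
    last_finish = 0.0
    for array in order:
        if len(in_flight) == window:
            # Writer blocks until the earliest outstanding write completes.
            now = max(now, heapq.heappop(in_flight))
        start = max(now, free_at.get(array, 0.0))
        finish = start + cost_per_write
        free_at[array] = finish
        heapq.heappush(in_flight, finish)
        last_finish = max(last_finish, finish)
    return last_finish


if __name__ == "__main__":
    interleaved = "AABAABABBBAB"
    in_order = "".join(sorted(interleaved))  # "AAAAAABBBBBB"
    # Hypothetical costs: 1.0 for a seek-heavy write, less for a sorted one.
    print("interleaved:", makespan(interleaved, 1.0, window=2))
    for sorted_cost in (1.0, 0.75, 0.5):
        print("sorted, cost", sorted_cost, ":",
              makespan(in_order, sorted_cost, window=2))
```

In this model, if sorting does not make the individual writes any
cheaper, the sorted order finishes later because array B sits idle
while A is written. Sorting only comes out ahead once the per-write
saving is large enough to make up for the lost overlap, which is the
thing a real two-tablespace test would have to measure.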

I think I can simulate this by using a modified pgbench script that works
against an accounts1 and an accounts2 table with equal frequency, where
the two are actually on different tablespaces on two disks.

> Right, that's in the ground rules for commitfests: if the submitter can
> respond to complaints before the fest is over, we'll reconsider the
> patch.

The small optimization I was trying to suggest was that you just bounce
this type of patch automatically to the "rejected for <x>" section of the
commitfest wiki page in cases like these. The standard practice on this
sort of queue is to automatically reclassify an item once someone has
made a pass over it, leaving it to the original submitter to re-open it
with more information. That keeps the unprocessed part of the queue
always shrinking, and as long as people know that they can get a patch
reconsidered by submitting new results, it's not unfair to them.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
