Re: Sorted writes in checkpoint

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Subject: Re: Sorted writes in checkpoint
Date: 2008-03-11 20:05:01
Message-ID: 200803112005.m2BK51325629@momjian.us
Lists: pgsql-hackers pgsql-patches


Added to TODO:

* Consider sorting writes during checkpoint

http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>
> > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > If the kernel can treat sequential writes better than random writes, is
> > > it worth sorting dirty buffers in block order per file at the start of
> > > checkpoints?
>
> I wrote and tested the attached sorted-writes patch, based on Heikki's
> ldc-justwrites-1.patch. There was an obvious performance win on an OLTP workload.
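As a rough illustration of the sorting step asked about above (not the code from the attached patch, which is not reproduced here), here is a minimal C sketch. It collects the dirty buffers at checkpoint start and sorts them with qsort, first by file and then by block number. The struct `DirtyBufTag` and the functions `dirty_buf_cmp` and `sort_dirty_buffers` are hypothetical. PostgreSQL's real buffer tag identifies the relation with a multi-field key rather than a single integer file id.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a buffer tag: identifies one
 * page by its data file and its block number within that file. */
typedef struct {
    uint32_t rel_file;   /* which data file the page belongs to */
    uint32_t block_num;  /* page offset within that file */
    int      buf_id;     /* index into the shared buffer array */
} DirtyBufTag;

/* Order by file first, then by block number within the file. */
static int
dirty_buf_cmp(const void *a, const void *b)
{
    const DirtyBufTag *x = a;
    const DirtyBufTag *y = b;

    if (x->rel_file != y->rel_file)
        return x->rel_file < y->rel_file ? -1 : 1;
    if (x->block_num != y->block_num)
        return x->block_num < y->block_num ? -1 : 1;
    return 0;
}

/* Sort the dirty buffers collected at checkpoint start so that the
 * write loop issues them in ascending block order per file. */
static void
sort_dirty_buffers(DirtyBufTag *tags, size_t n)
{
    qsort(tags, n, sizeof(DirtyBufTag), dirty_buf_cmp);
}
```

Within each file, writes then go out in ascending block number. Whether that matches the physical order on disk is exactly the open question raised in point 2 of the reply below.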
>
> tests                      | pgbench | DBT-2 response time (avg/90%/max)
> ---------------------------+---------+-----------------------------------
> LDC only                   | 181 tps | 1.12 / 4.38 / 12.13 s
> + BM_CHECKPOINT_NEEDED(*)  | 187 tps | 0.83 / 2.68 /  9.26 s
> + Sorted writes            | 224 tps | 0.36 / 0.80 /  8.11 s
>
> (*) Don't write buffers that were dirtied after starting the checkpoint.
>
> machine : 2 GB RAM, 4 SCSI disks in RAID-5
> pgbench : -s400 -t40000 -c10 (database of about 5 GB)
> DBT-2   : 60 warehouses (database of about 6 GB)
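The (*) footnote above can be sketched as a two-phase pass. This minimal sketch assumes hypothetical flag bits `BUF_DIRTY` and `BUF_CHECKPOINT_NEEDED`, the latter named after, but not copied from, PostgreSQL's `BM_CHECKPOINT_NEEDED`. It also uses a simplified `BufHdr` with no locking or real I/O. The first phase tags every buffer that is dirty when the checkpoint begins. The second phase writes only tagged buffers, so pages first dirtied later are left for a future checkpoint. The premise is that changes made after the checkpoint's starting point are covered by WAL replay from that point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, loosely modelled on PostgreSQL's buffer
 * header flags; the values are illustrative only. */
#define BUF_DIRTY             (1u << 0)
#define BUF_CHECKPOINT_NEEDED (1u << 1)

typedef struct {
    uint32_t flags;
} BufHdr;

/* Phase 1, at checkpoint start: tag every buffer that is dirty now.
 * Returns how many buffers were tagged. */
static size_t
mark_checkpoint_buffers(BufHdr *bufs, size_t nbufs)
{
    size_t n = 0;
    for (size_t i = 0; i < nbufs; i++) {
        if (bufs[i].flags & BUF_DIRTY) {
            bufs[i].flags |= BUF_CHECKPOINT_NEEDED;
            n++;
        }
    }
    return n;
}

/* Phase 2: write only the tagged buffers. A buffer first dirtied after
 * phase 1 carries no tag and is skipped. Here "writing" just clears
 * the flags; a real implementation would issue the write first. */
static size_t
write_checkpoint_buffers(BufHdr *bufs, size_t nbufs)
{
    size_t written = 0;
    for (size_t i = 0; i < nbufs; i++) {
        if (bufs[i].flags & BUF_CHECKPOINT_NEEDED) {
            bufs[i].flags &= ~(BUF_DIRTY | BUF_CHECKPOINT_NEEDED);
            written++;
        }
    }
    return written;
}
```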
>
>
> > I think it has the potential to improve things. There are three obvious
> > arguments against it and one subtle one that I can think of:
> >
> > 1) Extra complexity for something that may not help. This would need some
> > good, robust benchmarking improvements to justify its use.
>
> Exactly. I think we need a discussion board for I/O performance issues.
> Can I use the Developers' Wiki for this purpose? Since performance graphs
> and result tables are important for the discussion, it might work better
> than the mailing lists, which are text-based.
>
>
> > 2) Block number ordering may not reflect actual order on disk. While
> > true, it's got to be better correlated with it than writing at random.
> > 3) The OS disk elevator should be dealing with this issue, particularly
> > because it may really know the actual disk ordering.
>
> Yes, both are true. However, I think the two orderings are highly
> correlated. In addition, we should use the filesystem to make sure they
> correspond to each other. For example, pre-allocating files might help,
> as has often been discussed.
>
>
> > Here's the subtle thing: by writing in the same order the LRU scan occurs
> > in, you are writing dirty buffers in the optimal fashion to eliminate
> > client backend writes during BufferAlloc. This makes the checkpoint a
> > really effective LRU clearing mechanism. Writing in block order will
> > change that.
>
> The issue will probably go away after we have LDC, because it writes LRU
> buffers during checkpoints.
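The trade-off Greg describes can be made concrete with a minimal sketch. It assumes a clock-sweep replacement scheme, like the one PostgreSQL's buffer manager uses, with a hypothetical `dirty` array and clock hand position `hand`. The sketch lists dirty buffers in the order the sweep will next visit them, starting at the hand and wrapping around. Writing in this order first cleans the buffers the allocator is about to reach. Writing in block order, as the sorted-writes patch does, generally visits the same buffers in a different sequence.

```c
#include <stddef.h>

/* Fill `order` with the ids of dirty buffers in the order a clock
 * sweep starting at position `hand` will visit them, wrapping around
 * the end of the buffer array. `dirty` and `hand` are hypothetical
 * stand-ins for real buffer-manager state. Returns the count. */
static size_t
clock_order_dirty(const int *dirty, size_t nbufs, size_t hand, size_t *order)
{
    size_t n = 0;
    for (size_t i = 0; i < nbufs; i++) {
        size_t id = (hand + i) % nbufs;
        if (dirty[id])
            order[n++] = id;
    }
    return n;
}
```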
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>

[ Attachment, skipping... ]


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
