Re: Load distributed checkpoint

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Subject: Re: Load distributed checkpoint
Date: 2006-12-27 04:10:10
Message-ID: 200612270410.kBR4AA725876@momjian.us
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > I assume write() is not our checkpoint performance problem, but the
> > transfer to disk via fsync(). Perhaps a simple solution is to do the
> > write()'s of all dirty buffers as we do now at checkpoint time, but
> > delay 30 seconds and then do fsync() on all the files.
>
> I think there are two kinds of platforms with different checkpoint problems.
> On one the bottleneck is fsync(); on the other it is write(). Which one
> applies depends on the OS, the amount of memory, the disks, the write-back
> cache, and so on.
>
> > I think the basic difference between this and the proposed patch is that
> > we do not put delays in the buffer write() or fsync() phases --- we just
> > put a delay _between_ the phases, and wait for the kernel to smooth it
> > out for us. The kernel certainly knows more about what needs to get to
> > disk, so it seems logical to let it do the I/O smoothing.
>
> The two proposals do not conflict with each other, and a fix for either
> platform does not hurt the other. Can we adopt both of them?
>
> I tested your proposal, but it did not help on a write-critical machine.
> However, if the idea works well on BSD or other platforms, we would be
> better off adopting it.
>
> [pgbench results]
> ...
> 566.973777
> 327.158222 <- (1) write()
> 560.773868 <- (2) sleep
> 544.106645 <- (3) fsync()
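A minimal sketch of the phased checkpoint discussed above (write() every dirty buffer, pause, then fsync() every file), written in Python for brevity rather than PostgreSQL's C. The function name phased_checkpoint, its parameters, and the (file index, offset, bytes) page format are hypothetical, and it assumes a POSIX system where os.pwrite is available. The optional write_spread_s parameter models putting delays inside the write phase, as the proposed patch does; setting it to 0 gives the plain write-everything-then-sleep approach. It only times each phase and does not reproduce PostgreSQL's buffer manager or pgbench.

```python
import os
import tempfile
import time


def phased_checkpoint(paths, dirty_pages, write_spread_s=0.0, sleep_s=30.0):
    """Hypothetical sketch: write() every dirty page, pause, then fsync() each file.

    paths:          file paths standing in for relation segment files
    dirty_pages:    list of (file_index, offset, data) tuples
    write_spread_s: total time over which to spread the writes (0 = no delay)
    sleep_s:        pause between the write phase and the fsync phase
    Returns the wall-clock duration of each phase in seconds.
    """
    fds = [os.open(p, os.O_RDWR | os.O_CREAT) for p in paths]
    timings = {}
    try:
        # Phase 1: hand the dirty pages to the kernel's page cache with write();
        # nothing is forced to disk yet, the kernel may flush them at any time.
        start = time.monotonic()
        delay = write_spread_s / len(dirty_pages) if dirty_pages else 0.0
        for idx, offset, data in dirty_pages:
            os.pwrite(fds[idx], data, offset)
            if delay:
                time.sleep(delay)
        timings["write"] = time.monotonic() - start

        # Phase 2: give the kernel time to write back dirty pages on its own schedule.
        start = time.monotonic()
        time.sleep(sleep_s)
        timings["sleep"] = time.monotonic() - start

        # Phase 3: fsync() each file so everything written above is durable.
        start = time.monotonic()
        for fd in fds:
            os.fsync(fd)
        timings["fsync"] = time.monotonic() - start
    finally:
        for fd in fds:
            os.close(fd)
    return timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        segs = [os.path.join(d, f"seg{i}") for i in range(2)]
        # Eight hypothetical 8 kB pages, alternating between the two files.
        pages = [(i % 2, (i // 2) * 8192, b"x" * 8192) for i in range(8)]
        print(phased_checkpoint(segs, pages, write_spread_s=0.0, sleep_s=0.0))
```

Keeping the pause as a separate phase, rather than sleeping inside the write loop, is what lets the kernel rather than the checkpoint code decide when the data actually reaches disk.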

OK, so you are saying that performance dropped only during the write(),
and not during the fsync()? Interesting. I would like to see the
results of a few tests, reported the same way as above:

1a) write spread out over 30 seconds
1b) write with no delay

2a) sleep(0)
2b) sleep(30)

3) fsync

For each combination, I would like to know the performance at each stage;
for example, with 1b, 2a, and 3, performance was X during the write()
phase, Y during the sleep, and Z during the fsync(). (Of course, sleep(0)
has no sleep stage to time.)
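One way to enumerate the requested test combinations and turn per-interval pgbench throughput into per-stage figures like X, Y, and Z is sketched below. It assumes throughput samples arrive as (seconds since start, tps) pairs and phase boundaries as (name, start, end) triples; both formats and all function and constant names are hypothetical. A zero-length phase, such as the sleep stage under 2a, yields None, matching the note that sleep(0) has no stage timing.

```python
from itertools import product

# Hypothetical labels for the test options listed above.
WRITE_MODES = {"1a": 30.0, "1b": 0.0}  # seconds over which the write() phase is spread
SLEEP_MODES = {"2a": 0.0, "2b": 30.0}  # pause between write() and fsync()


def test_matrix():
    """Every (write mode, sleep mode) pair; each run finishes with step 3, fsync."""
    return [(w, s, "3") for w, s in product(WRITE_MODES, SLEEP_MODES)]


def mean_tps_per_phase(samples, phases):
    """Average throughput inside each phase.

    samples: list of (t_seconds, tps) pairs
    phases:  list of (name, start, end); a sample belongs to a phase if start <= t < end
    Returns {name: mean tps}, or None for a phase that contains no samples.
    """
    result = {}
    for name, start, end in phases:
        vals = [tps for t, tps in samples if start <= t < end]
        result[name] = sum(vals) / len(vals) if vals else None
    return result


if __name__ == "__main__":
    for combo in test_matrix():
        print(", ".join(combo))
```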

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +