Re: Load distributed checkpoint

From: "Inaam Rana" <inaamrana(at)gmail(dot)com>
To: "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: "Takayuki Tsunakawa" <tunakawa(at)soft(dot)fujitsu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Load distributed checkpoint
Date: 2006-12-08 12:17:37
Message-ID: 833c669b0612080417r5c3fefbaja6b63857c8ce2890@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 12/7/06, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> wrote:
>
> Takayuki Tsunakawa wrote:
> > Hello, Itagaki-san
> >> Checkpoint consists of the following four steps, and the major
> >> performance
> >> problem is 2nd step. All dirty buffers are written without interval
> >> in it.
> >> 1. Query information (REDO pointer, next XID etc.)
> >> 2. Write dirty pages in buffer pool
> >> 3. Flush all modified files
> >> 4. Update control file
> >
> > Hmm. Isn't it possible that step 3 affects the performance greatly?
> > I'm sorry if you have already identified step 2 as disturbing
> > backends.
> >
> > As you know, PostgreSQL does not transfer the data to disk when
> > write()ing. Actual transfer occurs when fsync()ing at checkpoints,
> > unless the filesystem cache runs short. So, disk is overworked at
> > fsync()s.
>
> It seems to me that virtual memory settings of the OS will determine
> if step 2 or step 3 causes much of the actual disk I/O.
>
> In particular, on Linux, things like /proc/sys/vm/dirty_expire_centisecs

dirty_expire_centisecs will have little, if any, effect on a box with a
consistent workload. Under uniform load the bgwriter keeps pushing buffers
into the filesystem cache, which results in a steady eviction/flushing of
pages to disk. Expiring the pages more quickly can lower the cap on dirty
pages, but it won't/can't handle the sudden spike at checkpoint time.

> and dirty_writeback_centisecs

Again, on a system that encounters I/O chokes at checkpoints, pdflush is
presumably working like crazy at that time. Reducing the gap between its
wakeup calls will probably have very little impact on checkpoint
performance.

> and possibly dirty_background_ratio

I have seen this put a real cap on the number of dirty pages during normal
running. As regards checkpoints, though, it again seems to have little effect.

The problem in dealing with checkpoints is that we are dealing with two
starkly different types of I/O load. The larger the number of shared_buffers,
the greater the spike in I/O activity at checkpoint. AFAICS no vm tunable
can smooth out checkpoint spikes by itself. There has to be some
intelligence in the bgwriter to even the load out.
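What that bgwriter intelligence might look like, in its crudest form: write
the dirty buffers in slices and sleep between slices, sized so the whole
pass spans a target duration instead of arriving as one burst. This is only
a sketch under stated assumptions -- checkpoint_spread(), write_batch() and
the parameters are hypothetical, not PostgreSQL's actual code:

```c
/* Sketch of "load-distributed" checkpoint writing: spread the dirty
 * buffer writes over a target duration instead of dumping them at once. */
#include <unistd.h>

/* Microseconds to sleep after each batch so that nbatches batches
 * span roughly target_secs seconds. */
long delay_per_batch_usec(int nbatches, int target_secs)
{
    if (nbatches <= 0)
        return 0;
    return (long) target_secs * 1000000L / nbatches;
}

typedef void (*batch_writer)(int batchno);

/* Write ndirty buffers in slices of batch_size, pausing between slices. */
void checkpoint_spread(int ndirty, int batch_size, int target_secs,
                       batch_writer write_batch)
{
    int nbatches = (ndirty + batch_size - 1) / batch_size;
    long delay = delay_per_batch_usec(nbatches, target_secs);

    for (int b = 0; b < nbatches; b++) {
        write_batch(b);     /* write one slice of the dirty buffers */
        usleep(delay);      /* let the OS flush before the next slice */
    }
}
```

The design question the thread is circling is exactly how to pick that
delay so the checkpoint still finishes before the next one is due.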

> would affect this. If those numbers are high, ISTM most write()s
> from step 2 would wait for the flush in step 3. If I understand
> correctly, if the dirty_expire_centisecs number is low, most write()s
> from step 2 would happen before step 3 because of the pdflush daemons.
> I expect other OS's would have different but similar knobs to tune this.
>
> It seems to me that the most portable way postgresql could force
> the I/O to be balanced would be to insert otherwise unnecessary
> fsync()s into step 2; but that it might (not sure why) be better
> to handle this through OS-specific tuning outside of postgres.
>
>
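Ron's portable suggestion above -- "otherwise unnecessary" fsync()s inside
step 2 -- might look roughly like the following. The function name, path and
batch sizing are illustrative only; this is a sketch of the idea, not a
proposal for the actual smgr code:

```c
/* Sketch: interleave fsync() with the write phase so no single fsync
 * at checkpoint end has to flush the whole backlog at once. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int write_with_periodic_fsync(const char *path, int nblocks, int fsync_every)
{
    char buf[8192];
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < nblocks; i++) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) {
            close(fd);
            return -1;
        }
        /* an "otherwise unnecessary" fsync every few blocks keeps the
         * dirty backlog small, at the price of losing write batching */
        if ((i + 1) % fsync_every == 0 && fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) {   /* final flush, as step 3 would do anyway */
        close(fd);
        return -1;
    }
    close(fd);
    unlink(path);
    return 0;
}
```

The trade-off Ron hints at is visible here: each early fsync bounds the
spike, but also prevents the kernel from coalescing adjacent writes.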
