Re: Load distributed checkpoint

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Load distributed checkpoint
Date: 2006-12-07 16:03:05
Message-ID: 4577E6D9.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers pgsql-patches

>>> On Thu, Dec 7, 2006 at 12:05 AM, in message
<20061207144843(dot)6269(dot)ITAGAKI(dot)TAKAHIRO(at)oss(dot)ntt(dot)co(dot)jp>, ITAGAKI Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> We often see a performance drop during checkpoints. The cause is write
> bursts: storage devices are so overloaded during a checkpoint that they
> cannot keep up with normal transaction processing.

When we first switched our web site to PostgreSQL, this was one of our biggest problems. Queries which normally run in a few milliseconds were hitting the 20-second limit we impose in our web application. These timeouts came in bursts, which suggested that they were caused by checkpoints. We adjusted the background writer configuration and nearly eliminated the problem.

bgwriter_all_maxpages | 600
bgwriter_all_percent | 10
bgwriter_delay | 200
bgwriter_lru_maxpages | 200
bgwriter_lru_percent | 20

Between the xfs caching and the battery-backed cache in the RAID controller, the disk writes seemed to settle out pretty well.
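
For a rough sense of what the bgwriter settings listed above allow, here is a back-of-the-envelope sketch in Python. It assumes the 8.1/8.2 meaning of the parameters (bgwriter_delay in milliseconds between rounds, the *_maxpages values as per-round write caps, bgwriter_all_percent as the share of the buffer pool examined per round) and the default 8 kB block size. The function name is hypothetical, and the figures are ceilings that ignore the time spent in the writes themselves, not measurements.

```python
def bgwriter_ceiling(delay_ms, all_percent, all_maxpages, lru_maxpages,
                     block_size=8192):
    """Upper bounds implied by the bgwriter settings (assumed semantics)."""
    rounds_per_sec = 1000.0 / delay_ms
    return {
        # Rounds of background writing per second.
        "rounds_per_sec": rounds_per_sec,
        # Time for the ALL scan to examine the whole buffer pool once.
        "all_sweep_seconds": (100.0 / all_percent) / rounds_per_sec,
        # Per-second write caps for each scan.
        "all_pages_per_sec": all_maxpages * rounds_per_sec,
        "lru_pages_per_sec": lru_maxpages * rounds_per_sec,
        # Combined cap expressed in MiB per second.
        "max_mib_per_sec": (all_maxpages + lru_maxpages) * rounds_per_sec
                           * block_size / (1024 * 1024),
    }


# The values from the settings above.
limits = bgwriter_ceiling(delay_ms=200, all_percent=10,
                          all_maxpages=600, lru_maxpages=200)
```

With these values the ALL scan can cover the whole pool in about 2 seconds, and the two scans together are capped at 4000 pages (about 31 MiB) per second.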

> A checkpoint consists of the following four steps, and the major performance
> problem is the 2nd step, where all dirty buffers are written without any pause.
>
> 1. Query information (REDO pointer, next XID etc.)
> 2. Write dirty pages in buffer pool
> 3. Flush all modified files
> 4. Update control file
>
> I suggested writing pages with sleeps in between in the 2nd step, using the normal
> activity of the background writer. It is something like cost-based vacuum delay.
> The background writer has two pointers, 'ALL' and 'LRU', indicating where to
> write out in the buffer pool. We can wait for the ALL clock hand to go all the
> way around to guarantee that all pages have been written.
>
> Here is pseudo-code for the proposed method. The inner loop is just the
> same as the bgwriter's normal activity.
>
> PrepareCheckPoint();    -- do step 1
> Reset num_of_scanned_pages by ALL activity;
> do {
>     BgBufferSync();    -- do a part of step 2
>     sleep(bgwriter_delay);
> } while (num_of_scanned_pages < shared_buffers);
> CreateCheckPoint();    -- do step 3 and 4
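
To make the quoted loop concrete, here is a minimal Python simulation of step 2 only. It is a sketch under stated assumptions, not the patch: BufferPool, bg_buffer_sync and spread_checkpoint are hypothetical names, each round examines at most scan_per_round buffers and writes at most max_writes dirty ones, and pages dirtied while the loop runs are not modeled.

```python
import time


class BufferPool:
    """Toy buffer pool: a dirty flag per buffer plus the ALL clock hand."""

    def __init__(self, n_buffers, dirty_ids=()):
        self.dirty = [False] * n_buffers
        for i in dirty_ids:
            self.dirty[i] = True
        self.all_hand = 0


def bg_buffer_sync(pool, scan_per_round, max_writes):
    """One round of the ALL scan. Returns (buffers scanned, buffers written)."""
    scanned = written = 0
    n = len(pool.dirty)
    while scanned < scan_per_round and written < max_writes:
        i = pool.all_hand
        if pool.dirty[i]:
            pool.dirty[i] = False  # stands in for writing the page out
            written += 1
        pool.all_hand = (i + 1) % n
        scanned += 1
    return scanned, written


def spread_checkpoint(pool, scan_per_round, max_writes,
                      delay_s=0.0, sleep=time.sleep):
    """Repeat bgwriter rounds until the ALL hand has gone around once."""
    n = len(pool.dirty)
    num_scanned = rounds = total_written = 0
    while num_scanned < n:
        # Never scan past one full revolution of the clock hand.
        budget = min(scan_per_round, n - num_scanned)
        scanned, written = bg_buffer_sync(pool, budget, max_writes)
        num_scanned += scanned
        total_written += written
        rounds += 1
        if num_scanned < n:
            sleep(delay_s)  # corresponds to sleep(bgwriter_delay)
    return rounds, total_written
```

In this model the write cap, not the scan percentage, sets the pace when most buffers are dirty: a round stops as soon as max_writes pages have been written, so the checkpoint stretches over more rounds.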

Would the background writer be disabled during this extended checkpoint? How is it better to concentrate step 2 in an extended checkpoint periodically rather than consistently in the background writer?

> We could instead speed up the background writer to reduce the work done at
> checkpoint time, but that introduces another performance problem: extra pressure
> is always put on the storage devices to keep the number of dirty pages low.

Doesn't the file system caching logic combined with a battery-backed cache in the controller cover this, or is your patch meant to help those who don't have a battery-backed controller cache? What would the impact of your patch be on environments like ours? Will there be any effect on PITR techniques, in terms of how current the copied WAL files would be?

> I'm working on automatically adjusting the progress of a checkpoint to the
> checkpoint timeout and the WAL segments limit, to avoid overlap of two
> checkpoints.
> I'll post a patch sometime soon.
>
> Comments and suggestions welcome.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
