Re: [PATCHES] Load distributed checkpoint patch

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Load distributed checkpoint patch
Date: 2006-12-22 01:07:52
Message-ID: 458ADB87.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers pgsql-patches

>>> On Wed, Dec 20, 2006 at 6:05 AM, in message
<03be01c7242f$2b4ce130$19527c0a(at)OPERAO>, "Takayuki Tsunakawa"
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
>
> I consider that smoothing the load (more meaningfully, response time)
> has higher priority than checkpoint punctuality in a practical sense,
> because the users of a system benefit from good steady response and
> give good reputation to the system.

I agree with that.

> If the checkpoint processing is
> not punctual, crash recovery would take longer. But which would
> you give higher priority, the unlikely event (=crash of the system) or
> the likely event (=peak hours of the system)? I believe the latter should
> be given priority.

I'm still with you here.

> The system can write dirty buffers after the peak hours pass.

I don't see that in our busiest environment.

We have 3,000 "directly connected" users, various business partner
interfaces, and public web entry doing OLTP in 72 databases distributed
around the state, with real-time replication to central databases which
are considered derived copies. If all the pages modified on the central
databases were held in buffers or cache until after peak hours, query
performance would suffer -- assuming it would all even fit in cache. We
must have a way for dirty pages to be written under load while
responding to hundreds of thousands of queries per hour without
disruptive "freezes" during checkpoints.

On top of that, we monitor database requests on the source machines,
and during "idle time" we synchronize the data with all of the targets
to identify, log, and correct "drift". So even if we could shift all
our disk writes to the end of the day, that would have its own
downside: it would extend our synchronization cycle.

I raise this only to be sure that such environments are considered with
these changes, not to discourage improvements in the checkpoint
techniques. We have effectively eliminated checkpoint problems in our
environment with a combination of battery-backed controller cache and
aggressive background writer configuration. When you have a patch which
seems to help those who still have problems, I'll try to get time
approved to run a transaction replication stream onto one of our servers
(in "catch up mode") while we do a web "stress test" by playing back
requests from our production log. That should indicate how the patch
will affect us.

-Kevin
