Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load

From: "Krznarich, Brian" <KrznarichBrian(at)bfusa(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
Date: 2013-01-08 21:36:21
Message-ID: 4B3A2632C3BFC249BACF6888FE6D25A60144FA@EXCMBX02PAKR.bfusa.com
Lists: pgsql-bugs

On 1/8/2013 2:48 PM, Simon Riggs wrote:
> On 8 January 2013 19:24, <briank(at)openroadtech(dot)com> wrote:
>
>> Simply stated, pg_xlog grows out of control on a streaming-replication
>> backup server with a high volume of writes on the master server. This occurs
>> only with checkpoint_completion_target>0 and very large (eg. 8GB)
>> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).
> All of this appears to be working as designed.
>
> It will issue a restartpoint every checkpoint_timeout seconds on the standby.
>
> checkpoint_segments is ignored on standby.
The documentation does not seem to agree with the last point.
"In standby mode, a restartpoint is also triggered if
checkpoint_segments log segments have been replayed since last
restartpoint and at least one checkpoint record has been replayed."

This is precisely the problem. The standby should not go
checkpoint_timeout*checkpoint_completion_target seconds without
executing a restartpoint while thousands of WAL segments are
stacking up in pg_xlog.
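
To put a number on that window, here is a rough back-of-the-envelope sketch. It only models the claim above (no restartpoint completes for checkpoint_timeout*checkpoint_completion_target seconds); it is not a description of how PostgreSQL actually schedules restartpoints. The function name and the WAL rate of 32 MB/s are hypothetical; the 16 MB segment size is the default.

```python
import math

# Default WAL segment size in megabytes (assumes an unmodified build).
WAL_SEGMENT_MB = 16


def wal_backlog(checkpoint_timeout_s, completion_target, wal_rate_mb_per_s):
    """Estimate WAL accumulated on the standby during one restartpoint window.

    Assumption (taken from the message, not from the PostgreSQL source):
    no restartpoint completes for checkpoint_timeout * completion_target
    seconds, so every segment received in that window stays in pg_xlog.
    Returns (window_seconds, backlog_mb, segment_count).
    """
    window_s = checkpoint_timeout_s * completion_target
    backlog_mb = window_s * wal_rate_mb_per_s
    segments = math.ceil(backlog_mb / WAL_SEGMENT_MB)
    return window_s, backlog_mb, segments


if __name__ == "__main__":
    # Hypothetical load: 5 minute timeout, target 0.5, 32 MB/s of WAL.
    window_s, backlog_mb, segments = wal_backlog(300, 0.5, 32)
    print(window_s, backlog_mb, segments)
```

With larger checkpoint_timeout values or a heavier bulk load, the same arithmetic reaches the multi-gigabyte range quickly.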

With checkpoint_completion_target=0, the standby server will happily
execute restartpoints much faster than checkpoint_timeout if it is
necessary. Once checkpoint_completion_target>0, no attention is paid
to the backlog of WAL data.

I do not understand PostgreSQL well enough to explain why large vs.
small shared_buffers changes this behavior, but it does. Perhaps, when
shared_buffers is not extremely large, PostgreSQL is forced to
execute restartpoints more frequently?

In general it seems like it should be safe to use the same
postgresql.conf on the master and the standby server, but this would
clearly be an exception. One wouldn't expect a 10GB pg_xlog on a
standby where the master has no such problem.
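
For anyone wanting to compare the two servers, a minimal sketch for measuring pg_xlog usage follows. It assumes the standard 24-hex-digit WAL segment file names and simply sums file sizes; the function name and the example path are hypothetical, and it would need to be run on each host against its own data directory.

```python
import os
import re

# WAL segment files are named with 24 uppercase hex digits;
# other entries (archive_status/, *.history, *.backup) are skipped.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def wal_usage(pg_xlog_dir):
    """Return (segment_count, total_bytes) for WAL segments in pg_xlog_dir."""
    count = 0
    total_bytes = 0
    for entry in os.scandir(pg_xlog_dir):
        if entry.is_file() and WAL_SEGMENT_NAME.match(entry.name):
            count += 1
            total_bytes += entry.stat().st_size
    return count, total_bytes


if __name__ == "__main__":
    # Hypothetical path; substitute the real data directory.
    segments, size = wal_usage("/var/lib/pgsql/data/pg_xlog")
    print(f"{segments} segments, {size / 1024**3:.2f} GiB")
```

Sampling this periodically on the master and the standby during a load would show whether the standby's directory keeps growing while the master's stays flat.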

Thank you for your assistance.

Brian
