Quick Links

Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	briank(at)openroadtech(dot)com
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
Date:	2013-01-08 19:48:31
Message-ID:	CA+U5nMKdf7odZzYNnoRkkCZmJpGEy=OQbU9Nan_zva_Rtzi2vw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On 8 January 2013 19:24, <briank(at)openroadtech(dot)com> wrote:

> Simply stated, pg_xlog grows out of control on a streaming-replication
> backup server with a high volume of writes on the master server. This occurs
> only with checkpoint_completion_target>0 and very large (eg. 8GB)
> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).

All of this appears to be working as designed.

It will issue a restartpoint every checkpoint_timeout seconds on the standby.

checkpoint_segments is ignored on standby.

> Detail: I have two new/fast servers using streaming replication, with 9.2.2
> from the PostgreSQL yum repository. The servers are connected via 10G
> network, so the failover receives data as fast as it is created.
>
> It appears that to trigger this problem you need large shared_buffers (mine
> is set to 8GB) and checkpoint_completion_target > 0. (.8 or .9 will do). I
> have checkpoint_timeout=5min. (though setting this to 30s *does not*
> alleviate the problem). The failover machine doesn't seem to be concerned
> with checkpoint_timeout or checkpoint_segments.
>
> When doing large writes (like pg_restore) the failover machine's pg_xlog
> does not clear files. The reason appears to be that it is not
> checkpointing(verified by log_checkpoints=on and examining the log). If I
> run 'checkpoint;' manually on the failover in psql(hot_standby=on), it
> immediately cleans up pg_xlog(after a pause to actually perform the
> checkpoint).
>
> To generate the data,you can just run
> pgbench -i -s1000
> on the master server, then watch pg_xlog grow on the failover.
>
> This produces a ~10GB of data, then creates indices. If I run du on the
> failover after the data is loaded, I see:
>
> # du -s base pg_xlog | sort -n
> 10928276 pg_xlog
> 13144536 base
>
> I have a 10GB pg_xlog. The load completes in less than 5 minutes, which may
> be relevant here, since eventually the failover will perform a checkpoint.
> On the primary server, pg_xlog caps at 1.2GB.
>
> If checkpoint_completion_target=0, or shared_buffers=256MB, pg_xlog grows to
> around 1GB and stays there.
>
> This guess may be off target, but it appears that the database only
> checkpoints every approximately checkpoint_completion_target*
> checkpoint_timeout seconds. So, for a .9 completion target and the default
> 5min timeout, the failover will go for 4+ minutes without executing a single
> checkpoint, regardless of how much data arrives (verified with
> log_checkpoints=on).
>
> All the while the master server is spitting out 'checkpoints are occurring
> too frequently' messages, which I assume is to be expected during a reload.
>
> This means that the failover machine needs 2x the database size to not crash
> during a database reload. We run VMs with limited disk allocation where
> this is not the case. We periodically need to dump/restore, and I'm
> concerned this simple operation will crash our failover machines.
>
> I can avoid the problem by setting checkpoint_completion_target=0, which I
> will do for now.
>
> Here are settings that differ from the default initdb install:
>
> shared_buffers = 8GB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> max_stack_depth = 8MB
> wal_level = hot_standby
> synchronous_commit = off
> wal_buffers = 64MB
> checkpoint_segments = 10
> checkpoint_completion_target = 0
> max_wal_senders = 2
> hot_standby = on
> effective_cache_size = 8GB
> log_checkpoints = on

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load at 2013-01-08 19:24:01 from briank

Responses

Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load at 2013-01-08 21:36:21 from Krznarich, Brian

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Scott Mead	2013-01-08 19:50:27	Re: BUG #7800: Welcome email with login ifnormation NOT received
Previous Message	Scott Mead	2013-01-08 19:47:41	Re: BUG #7800: Welcome email with login ifnormation NOT received