Re: restartpoints stop generating on streaming replication slave

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: restartpoints stop generating on streaming replication slave
Date: 2012-08-22 14:10:24
Message-ID: CAHGQGwEMTn8gCFEr6iBwkSWF3jA_dWmt0FpzcJzEZ-egcBM4aw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 22, 2012 at 5:52 AM, Mathieu Fenniak
<mathieu(dot)fenniak(at)replicon(dot)com> wrote:
> Hi all,
>
> I've been investigating an issue with our PostgreSQL 9.1.1 (Linux x86-64
> CentOS 5.8) database where restartpoints suddenly stop being generated on
> the slave after working correctly for a week or two. The symptom of the
> problem is that the pg_xlog directory on the slave doesn't get cleaned up,
> and the log_checkpoints output (eg. restartpoint starting: time) stops
> appearing.
>
> I was able to extract a core dump of the bgwriter process while it was in
> BgWriterNap. I inspected ckpt_start_time and last_checkpoint_time;
> ckpt_start_time was 1345578533 (... 19:48:53 GMT) and last_checkpoint_time
> was 1345578248 (... 19:44:08 GMT). Based upon these values, I concluded
> that it's performing checkpoints but missing the "if (ckpt_performed)"
> condition (ie. CreateRestartPoint returns false); it's then setting
> last_checkpoint_time to now - 5 minutes + 15 seconds.
>
> There seem to be two causes of a false retval in CreateRestartPoint; the
> first is if !RecoveryInProgress(), and the second is if "the last checkpoint
> record we've replayed is already our last restartpoint". The first
> condition doesn't seem likely; does anyone know how we might be hitting the
> second condition? We have continuous traffic on the master server in the
> range of 1000 txn/s, and the slave seems to be completely up-to-date, so I
> don't understand how we could be hitting this condition.
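
The timestamps quoted above can be checked with a little arithmetic. This is only a sketch: it assumes checkpoint_timeout is 300 seconds (the "5 minutes" in the message) and that both values were derived from the same "now" in the bgwriter loop. The helper name backdated_checkpoint_time is made up for illustration.

```python
# Values read from the bgwriter core dump, as quoted in the message.
ckpt_start_time = 1345578533       # 19:48:53 GMT
last_checkpoint_time = 1345578248  # 19:44:08 GMT

# Assumption: checkpoint_timeout is 300 s (the "5 minutes" in the message).
CHECKPOINT_TIMEOUT = 300
# The "+ 15 seconds" retry delay mentioned in the message.
RETRY_DELAY = 15


def backdated_checkpoint_time(now, timeout=CHECKPOINT_TIMEOUT, retry=RETRY_DELAY):
    """Hypothetical helper: the time recorded when an attempt did not
    perform a checkpoint, i.e. now - timeout + retry."""
    return now - timeout + retry


gap = ckpt_start_time - last_checkpoint_time
print(gap)  # 285

# A gap of timeout - retry is what a skipped attempt would leave behind.
print(gap == CHECKPOINT_TIMEOUT - RETRY_DELAY)  # True
print(backdated_checkpoint_time(ckpt_start_time) == last_checkpoint_time)  # True
```

The 285-second gap matches 300 - 15, which is consistent with the reading that the attempt ran but did not count as a performed restartpoint.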

To check whether you are really hitting either of the above two conditions,
could you set log_min_messages to DEBUG2 on the standby? If you hit either
one, you'll get a log message like "skipping restartpoint......".
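
Once DEBUG2 logging is on, something like the following could pull the relevant lines out of the standby's log. It is a rough sketch: it matches only the "skipping restartpoint" prefix quoted above and assumes nothing about the rest of the message. The function name and the log path in the usage note are hypothetical.

```python
from typing import Iterable, List

# Only the prefix quoted in this message; the remainder of the log text
# is deliberately not assumed.
MARKER = "skipping restartpoint"


def find_skipped_restartpoints(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains the marker, without the
    trailing newline."""
    return [line.rstrip("\n") for line in lines if MARKER in line]
```

For example, with a hypothetical log file path:

    with open("/path/to/standby/postgresql.log") as log:
        for hit in find_skipped_restartpoints(log):
            print(hit)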

Could you also run pg_controldata on both the master and the standby, and
check whether their "Latest checkpoint location" values are the same?
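
If it helps, the comparison can be scripted. The sketch below assumes pg_controldata prints one "name: value" pair per line and that you have saved the output from each server as text. Both function names are made up for illustration.

```python
from typing import Optional

FIELD = "Latest checkpoint location"


def latest_checkpoint_location(controldata_output: str) -> Optional[str]:
    """Return the value of the "Latest checkpoint location" line, or None
    if the field is not present in the given output."""
    for line in controldata_output.splitlines():
        # Split on the first colon only; some values contain colons too.
        name, sep, value = line.partition(":")
        if sep and name.strip() == FIELD:
            return value.strip()
    return None


def same_checkpoint(master_output: str, standby_output: str) -> bool:
    """True only if both outputs contain the field and the values match."""
    master_loc = latest_checkpoint_location(master_output)
    standby_loc = latest_checkpoint_location(standby_output)
    return master_loc is not None and master_loc == standby_loc
```

Feed it the text captured from pg_controldata on each server. A missing field is treated as "not the same" rather than as a match.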

Regards,

--
Fujii Masao
