Re: Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Sander, Ingo (NSN - DE/Munich)" <ingo(dot)sander(at)nsn(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
Date: 2010-05-31 09:37:18
Message-ID: 4C03834E.7090504@enterprisedb.com
Lists: pgsql-hackers

On 30/05/10 06:04, Fujii Masao wrote:
> On Fri, May 28, 2010 at 11:12 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, May 27, 2010 at 11:13 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>> I guess this happens because the frequency of checkpoints on the standby is
>>>> much lower than that on the master. On the master, a checkpoint occurs after
>>>> every three segments consumed because of "checkpoint_segments = 3". On the
>>>> standby, on the other hand, only checkpoint_timeout has an effect, so a
>>>> checkpoint occurs every 30 minutes because of "checkpoint_timeout = 30min".
>>>>
>>>> Should the walreceiver signal the bgwriter to start a checkpoint if it has
>>>> received more than checkpoint_segments WAL files, as in normal processing?
>>>
>>> Is this also an issue when using log shipping, or just with SR?
>>
>> When using log shipping, checkpoint_segments never triggers a
>> checkpoint. So recovery after a standby crash might take unexpectedly
>> long, since the redo starting point might be old.
>>
>> But in file-based log shipping, since WAL files don't accumulate in
>> the pg_xlog directory on the standby, even if checkpoints are very
>> infrequent, pg_xlog will not fill up with WAL files. That
>> accumulation occurs only when using SR.
>>
>> If what we should avoid is the low checkpoint frequency itself rather
>> than the accumulation of WAL files, then the bgwriter, not the walreceiver,
>> should check whether we've consumed too much WAL, I think. Thoughts?
>
> I attached the patch, which changes the startup process so that it signals
> the bgwriter to perform a restartpoint if we've already replayed too many WAL
> files. This lets checkpoint_segments trigger a restartpoint.

The central question is whether checkpoint_segments should trigger
restartpoints or not. When PITR and restartpoints were introduced, the
answer was "no", on the grounds that when you're doing recovery you're
presumably replaying the logs much faster than they were generated, and
you don't want to slow down the recovery by checkpointing too often.

Now that the bgwriter is active during recovery, and streaming
replication retains the streamed WAL so that we risk running out of
disk space with a long checkpoint_timeout, it's time to reconsider
that.
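
To give a feel for the disk-space risk, here is a rough back-of-the-envelope
sketch. It assumes the default 16 MB WAL segment size; the streaming rate and
the function name are made up for illustration, and the result only estimates
the WAL received between two timed restartpoints, not total pg_xlog usage.

```python
WAL_SEGMENT_MB = 16  # default WAL segment size in PostgreSQL


def wal_accumulated_mb(segments_per_minute, checkpoint_timeout_minutes):
    """Rough estimate of WAL (in MB) streamed to the standby between two
    restartpoints when only checkpoint_timeout triggers them.

    segments_per_minute is a hypothetical, constant streaming rate.
    """
    return segments_per_minute * checkpoint_timeout_minutes * WAL_SEGMENT_MB


# Hypothetical workload: 2 segments per minute, checkpoint_timeout = 30min.
print(wal_accumulated_mb(2, 30))  # 960
```

With the same hypothetical rate and a restartpoint every three segments, only
about 48 MB would build up between restartpoints.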

I think we have three options:

1) Leave it as it is: checkpoint_segments has no effect during
recovery/standby mode

2) Change it so that checkpoint_segments does take effect during
recovery/standby

3) Change it so that checkpoint_segments takes effect during streaming
replication, but not during recovery otherwise

I'm leaning towards 3). It still seems reasonable not to slow down
recovery when recovering from the archive, but the potential for running
out of disk space during streaming replication warrants the change.
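
To make the difference between the three options concrete, here is a minimal
sketch of the decision as a Python model. It is not the actual backend code or
the attached patch: the names (Policy, should_request_restartpoint) are
hypothetical, and the ">=" threshold is an assumption about where exactly the
trigger would fire.

```python
from enum import Enum


class Policy(Enum):
    NEVER = 1           # option 1: checkpoint_segments ignored during recovery
    ALWAYS = 2          # option 2: honoured in any recovery/standby mode
    STREAMING_ONLY = 3  # option 3: honoured only during streaming replication


def should_request_restartpoint(policy, segments_since_last,
                                checkpoint_segments, streaming):
    """Return True if a restartpoint should be requested because of
    checkpoint_segments (checkpoint_timeout is handled separately)."""
    if segments_since_last < checkpoint_segments:
        return False  # not enough WAL replayed yet (assumed threshold)
    if policy is Policy.NEVER:
        return False
    if policy is Policy.ALWAYS:
        return True
    # STREAMING_ONLY: archive recovery is left alone so it is not slowed down
    return streaming
```

Under option 3, a standby that has replayed three segments with
checkpoint_segments = 3 would request a restartpoint while streaming, but not
while restoring the same WAL from the archive.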

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
