Re: Skip checkpoint on promoting from streaming replication

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Skip checkpoint on promoting from streaming replication
Date: 2013-01-24 17:44:26
Message-ID: CA+U5nMLfun0KcusiEn7iB1o-HYcGQ8B0VG0HE7U9byJtf9r6Sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 January 2013 16:52, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 24.01.2013 18:24, Simon Riggs wrote:
>>
>> On 6 January 2013 21:58, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>>>
>>> I've been torn between the need to remove the checkpoint for speed and
>>>
>>> being worried about the implications of doing so.
>>>
>>> We promote in multiple use cases. When we end a PITR, or are
>>> performing a switchover, it doesn't really matter how long the
>>> shutdown checkpoint takes, so I'm inclined to leave it there in those
>>> cases. For failover, we need fast promotion.
>>>
>>> So my thinking is to make pg_ctl promote -m fast
>>> be the way to initiate a fast failover that skips the shutdown
>>> checkpoint.
>>>
>>> That way all existing applications work the same as before, while new
>>> users that explicitly choose to do so will gain from the new option.
>>
>>
>> Here's a patch to skip checkpoint when we do
>>
>> pg_ctl promote -m fast
>>
>> We keep the end of recovery checkpoint in all other cases.
>
>
> Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

True. I thought we were moving away from trigger files to use of "promote"

> I'm a bit confused why there needs to be special mode for this. Can't we
> just always do the "fast" promotion? I agree that there's no urgency when
> you're doing PITR, but shouldn't do any harm either. Or perhaps always do
> "fast" promotion when starting up from standby mode, and "slow" otherwise.
>
> Are we comfortable enough with this to skip the checkpoint after crash
> recovery?

I'm not. Maybe if we get no bugs we can make it do this always, in next release.

It;s fast when it needs to be and safe otherwise.

> I may be missing something, but it looks like after a "fast" promotion, you
> don't request a new checkpoint. So it can take quite a while for the next
> checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be
> a problem, but I feel that it'd be prudent to request a new checkpoint
> immediately (not necessarily an "immediate" checkpoint, though).

I thought of that and there is a long comment to explain why I didn't.

Two problems:

1) an immediate checkpoint can cause a disk/resource usage spike,
which is definitely not what you need just when a spike of connections
and new SQL hits the system.

2) If we did that, we would have an EndOfRecovery record, some other
records and then a Shutdown checkpoint.
As I right this, (2) is wrong, because we shouldn't do a a Shutdown
checkpoint anyway.

But I still think (1) is a valid concern.

>> The only thing left from Kyotaro's patch is a single line of code -
>> the call to ReadCheckpointRecord() that checks to see if the WAL
>> records for the last two restartpoints is on disk, which was an
>> important line of code.
>
>
> Why's that important, just for paranoia? If the last two restartpoints have
> disappeared, something's seriously wrong, and you will be in trouble e.g if
> you crash at that point. Do we need to be extra paranoid when doing a "fast"
> promotion?

The check is cheap, so what do we gain by skipping the check?

>> Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
>> on replay like a shutdown checkpoint record. I put this back in from
>> my patch because I believe its important that we have a clear place
>> where the WAL history changes timelineId. WAL format change bump
>> required.
>
>
> Agreed, such a WAL record is essential.
>
> At replay, an end-of-recovery record should be a signal to the hot standby
> mechanism that there are no transactions running in the master at that
> point, same as a shutdown checkpoint.

I had a reason why I didn't do that, but it seems to have slipped my mind.

If I can't remember, I'll add it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2013-01-24 17:51:46 LATERAL, UNNEST and spec compliance
Previous Message Andres Freund 2013-01-24 17:39:33 Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]