Re: Immediate standby promotion

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-01 11:14:41
Message-ID: CAHGQGwF02Jrsv1m=GOiNJCRykSi+xk9CcXdGTu=38uEU2VYTAg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 3:23 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Aug 14, 2014 at 1:49 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> I'd like to propose to add new option "--immediate" to pg_ctl promote.
>> When this option is set, recovery ignores any WAL data which have not
>> been replayed yet and exits immediately. Patch attached.
>>
>> This promotion is faster than normal one but can cause data loss.
>

Thanks for reviewing the patch!

> I think there is one downside as well for this proposal that
> apart from data loss, it can lead to uncommitted data occupying
> space in database which needs to be later cleaned by vacuum.
> This can happen with non-immediate promote as well, but the
> chances with immediate are more. So the gain we got by doing
> immediate promotion can lead to slow down of operations in some
> cases. It might be useful if we mention this in docs.

Yep, immediate promotion is more likely to end recovery before the WAL
data of a VACUUM has been replayed. But, OTOH, I think it is also more
likely to end recovery before replaying the WAL data that would generate
the garbage in the first place. So I'm not sure it's worth adding that
note to the docs.

>
> Few comments about patch:
>
> 1.
> On standby we will see below message:
>
> LOG: received promote request
>
> User will always see above message irrespective of whether it
> is immediate promote or any other mode of promote. I think it will
> be better to distinguish between different modes and display the
> appropriate message.

Agreed. So I'm thinking of changing the code as follows.

    if (immediate_promote)
        ereport(LOG, (errmsg("received immediate promote request")));
    else
        ereport(LOG, (errmsg("received promote request")));

Or should we give the normal promotion a name as well?

>
> 2.
> StartupXLOG()
> {
> ..
> + if (immediate_promote)
> + break;
> ..
> }
>
> Why are you doing this check after pause
> (recoveryApplyDelay/recoveryPausesHere) for recovery?
>
> Why can't we do it after ReadRecord()?

The check can go either after ReadRecord() or after the pause.
I preferred to add it after the pause because immediate promotion
is likely to be requested while recovery is paused.
In that case, if the check came after ReadRecord(), we would have to read
one more WAL record that we don't actually need.
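
To illustrate the difference, here is a rough toy model of the replay loop.
It is not the patch and not PostgreSQL code: every name in it
(CheckPlacement, toy_records_read, and so on) is hypothetical, and it assumes
the promote request arrives while recovery is paused just before one record
is replayed. It only counts how many records get read under each placement.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef enum
{
    CHECK_AFTER_READ,   /* test immediate_promote right after reading a record */
    CHECK_AFTER_PAUSE   /* test immediate_promote right after the pause point */
} CheckPlacement;

/*
 * Walk over total_records WAL records (numbered from 0). Recovery pauses
 * just before replaying record pause_at, and the immediate promote request
 * is assumed to arrive during that pause. Returns how many records were
 * read before the loop exited.
 */
static int
toy_records_read(int total_records, int pause_at, CheckPlacement placement)
{
    bool immediate_promote = false;
    int  records_read = 0;
    int  next = 0;

    for (;;)
    {
        /* Stand-in for ReadRecord(): stop when no record is left. */
        if (next >= total_records)
            break;
        records_read++;

        if (placement == CHECK_AFTER_READ && immediate_promote)
            break;

        /* Stand-in for the pause; the request shows up while paused. */
        if (next == pause_at)
            immediate_promote = true;

        if (placement == CHECK_AFTER_PAUSE && immediate_promote)
            break;

        /* Stand-in for replaying the record. */
        next++;
    }

    return records_read;
}
```

With five records and a pause before record 2, the after-pause placement
stops having read three records, while the after-read placement reads a
fourth one before it notices the request.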

BTW, in the current patch, when immediate promotion is requested while
recovery is paused, recovery stays paused until it's
manually resumed. But shouldn't immediate promotion cause even a paused
recovery to end immediately?
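
One possible shape for that, again only as a toy sketch and not as a
proposed change: the pause loop could test the immediate-promote flag on
each poll and leave early. ToyRecoveryState and toy_pause_here are
hypothetical names, and the poll counters stand in for the real sleep and
signal handling.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct
{
    bool paused;                  /* recovery is currently paused */
    bool immediate_promote;       /* immediate promote has been requested */
    int  promote_arrives_at_poll; /* poll on which the request appears, -1 = never */
    int  resume_at_poll;          /* poll on which recovery is manually resumed */
} ToyRecoveryState;

/*
 * Poll while paused, up to max_polls times. If break_on_immediate is true,
 * an immediate promote request ends the pause without waiting for a manual
 * resume. Returns the number of polls spent in the pause.
 */
static int
toy_pause_here(ToyRecoveryState *st, int max_polls, bool break_on_immediate)
{
    int polls = 0;

    while (st->paused && polls < max_polls)
    {
        polls++;

        if (polls == st->promote_arrives_at_poll)
            st->immediate_promote = true;
        if (polls == st->resume_at_poll)
            st->paused = false;

        if (break_on_immediate && st->immediate_promote)
            break;
    }

    return polls;
}
```

If the request arrives on poll 3 and the manual resume only on poll 10, the
loop leaves after 3 polls with break_on_immediate set, and after 10 polls
without it, which is the behavior of the current patch in this model.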

> 3.
> ! * of "promote" and "immediate_promote"
> shouldn't in above sentence 'or' is more appropriate?

Yep.

Regards,

--
Fujii Masao
