Re: patch proposal

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch proposal
Date: 2016-10-26 03:32:44
Message-ID: CAEyp7J-dcGAqpxtDfFzV+JjKu96BiUEwGLf1o_1gSAQZy6kJNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is the patch which introduces the new parameter
"recovery_target_incomplete" -

Currently this patch addresses two scenarios -

*Scenario 1 :*

Provide options to DBA when the recovery target is not reached and has
stopped mid-way due to unavailability of WALs

When performing recovery to a specific recovery target,
"recovery_target_incomplete" parameter can be used to either promote, pause
or shutdown when recovery does not reach the specified recovery target and
reaches end-of-the-wal.

*Scenario 2 :*

Generates Errors, Hints when the specified recovery target is prior to the
backup's current position (xid, time and lsn). This behaviour is integrated
with the parameters "recovery_target_time","recovery_target_lsn" and
"recovery_target_xid".

I would like to know if the approach this patch incorporates sounds ok ?

My other comments are inline

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi <nag1010(at)gmail(dot)com>
wrote:

>
> On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost <sfrost(at)snowman(dot)net>
> wrote:
>
>> * Venkata B Nagothi (nag1010(at)gmail(dot)com) wrote:
>> > On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost <sfrost(at)snowman(dot)net>
>> wrote:
>> > > * Venkata B Nagothi (nag1010(at)gmail(dot)com) wrote:
>> > > > This behaviour will be similar to that of
>> recovery_target="immediate" and
>> > > > can be aliased.
>> > >
>> > > I don't believe we're really going at this the right way. Clearly,
>> > > there will be cases where we'd like promotion at the end of the WAL
>> > > stream (as we currently have) even if the recovery point is not found,
>> > > but if the new option's "promote" is the same as "immediate" then we
>> > > don't have that.
>> >
>> > My apologies for the confusion. Correction - I meant, "promote" option
>> > would promote the database once the consistent point is reached at the
>> > end-of-the-WAL.
>>
>> "consistent point" and "end-of-the-WAL" are not the same.
>>
>> > > recovery_target = immediate, other
>> > >
>> > > recovery_action_target_found = promote, pause, shutdown
>> >
>> > This is currently supported by the existing parameter called
>> > "recovery_target_action"
>>
>> Right, sure, we could possibly use that, or create an alias, etc.
>>
>> > > recovery_action_target_not_found = promote, pause, shutdown
>> >
>> > This is exactly what newly proposed parameter will do.
>>
>> Then it isn't the same as 'recovery_target = immediate'.
>>
>> > > One question to ask is if we need to support an option for xid and
>> time
>> > > related to when we realize that we won't find the recovery target. If
>> > > consistency is reached at a time which is later than the recovery
>> target
>> > > for time, what then? Do we go through the rest of the WAL and perform
>> > > the "not found" action at the end of the WAL stream? If we use that
>> > > approach, then at least all of the recovery target types are handled
>> the
>> > > same, but I can definitely see cases where an administrator might
>> prefer
>> > > an "error" option.
>> >
>> > Currently, this situation is handled by recovery_target_inclusive
>> > parameter.
>>
>> No, that's not the same.
>>
>> > Administrator can choose to stop the recovery at any consistent
>> > point before or after the specified recovery target. Is this what you
>> were
>> > referring to ?
>>
>> Not quite.
>>
>> > I might need a bit of clarification here, the parameter i am proposing
>> > would be effective only if the specified recovery target is not reached
>> and
>> > may not be effective if the recovery goes beyond recovery target point.
>>
>> Ok, let's consider this scenario:
>>
>> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
>> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
>> recovery_target is not set
>> recovery_target_time = 2016-08-26 08:29:30
>> recovery_target_inclusive = false
>>
>> In such a case, we will necessairly commit transactions which happened
>> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
>> reached consistency. We could possibly detect that case and throw an
>> error instead. The same could happen with xid.
>>
>
> Yes, currently, PG just recovers until the consistency is reached. It
> makes sense to throw an error instead.
>

This has not been addressed yet. Currently, the patch only considers
generating an error if the recovery target position is prior the current
backup's position and this is irrespective of weather backup_label is
present or not.
I will share my updates on how this can be addressed.

>
>
>> Working through more scenarios would be useful, I believe. For example,
>> what if the xid or time specified happened before the backup started?
>>
>> Basically, what I was trying to get at is that we might be able to
>> detect that we'll never find a given recovery target point without
>> actually replaying all of WAL and wondering if we should provide options
>> to control what to do in such a case.
>>
>
> Ah, Yes, I got the point and I agree. Thanks for the clarification.
>
> Yes, good idea and it is important to ensure PG errors out and warn the
> administrator with appropriate message indicating that... "a much earlier
> backup must be used..."
> if the specified recovery target xid, name or time is earlier than the
> backup time.
>

This scenario has been addressed in this patch.

I have a question regarding the following comment -

/*
* Override any inconsistent requests. Not that this is a change of
* behaviour in 9.5; prior to this we simply ignored a request to
pause if
* hot_standby = off, which was surprising behaviour.
*/
if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
recoveryTargetActionSet &&
!EnableHotStandby)
recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

Because of which, option "pause" is ignored and the database is shutdown
instead.

The "pause" request is ignored for recovery_target_action parameter and
that is because of the following lines in recoveryPausesHere() function ?
am i getting it right ?

/* Don't pause unless users can connect! */
if (!LocalHotStandbyActive)
return;

Not sure why.

The parameter i am introducing faced the similar problem while testing and
i have introduced a new function called IncompleteRecoveryPause() without
the above condition which will pause the recovery at end-of-the-wal. Hope
that is OK ?
If this is ok, i can change the function name and use the similar function
for recovery_target_action='pause' as well.

Please note that documentation is still pending from my end.

This is my first patch to the community and i am very new the code base.
Please let me know if this patch needs any changes.

Regards,

Venkata B N

Database Consultant
Fujitsu Australia

Attachment Content-Type Size
recoveryTargetIncomplete.patch application/octet-stream 7.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2016-10-26 03:53:41 Re: Patch to implement pg_current_logfile() function
Previous Message Karl O. Pinc 2016-10-26 03:30:48 Re: Patch to implement pg_current_logfile() function