Re: [proposal] recovery_target "latest"

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: g(dot)smolkin(at)postgrespro(dot)ru
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [proposal] recovery_target "latest"
Date: 2019-12-10 06:39:23
Message-ID: 20191210.153923.1215784230342894717.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the new version.

At Sun, 8 Dec 2019 04:03:01 +0300, Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru> wrote in
> On 11/21/19 1:46 PM, Peter Eisentraut wrote:
> > On 2019-11-08 05:00, Grigory Smolkin wrote:
> I`ve tested it and have some thoughts/concerns:
>
> 1. Recovery should report the exact reason why it has been forced to
> stop. In case of recovering to the end of WAL, standby promotion
> request received during recovery could be mistaken for reaching the
> end of WAL and reported as such. To avoid this, I think that
> reachedEndOfWal variable should be introduced.
>
> In attached patch it is added as a global variable, but maybe
> something more clever may be devised. I was not sure that
> reachedEndOfWal could be placed in XLogPageReadPrivate. Because we
> need to access it at the higher level than ReadRecord(), and I was
> under impression placing it in XLogPageReadPrivate could violate
> abstraction level of XLogPageReadPrivate.

CheckForStandbyTrigger() always returns true once the trigger is
pulled. We don't care whether end-of-WAL is reached if promote is
already triggered. Thus, we can tell the promote case by asking
CheckForStandbyTrigger() when we exited the redo main loop with
recoveryTarget = RECOVERY_TARGET_LATEST. Is this works as you expect?

> 2. During the testing, I`ve stumbled upon assertion failure in case of
> recovering in standby mode to the the end of WAL coupled with
> recovery_target_action as "promote", caused by the WAL source in state
> machine not been changed after reaching the recovery target (script to
> reproduce is attached):
...
> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032)
...
> #2  0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc
> #"StandbyMode", errorType=0xb208a7 "FailedAssertion",
>     fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67
> #3  0x0000000000573417 in WaitForWALToBecomeAvailable
> #(RecPtr=151003136, randAccess=true, fetching_ckpt=false,
> #tliRecPtr=167757424,
>     return_on_eow=true) at xlog.c:12032
...
> #7  0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8,
> #RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271
..

ReadRecord is called with currentSource=STREAM after StandbyMode was
turned off. I suppose the fix means the "currentSource =
XLOG_FROM_PG_WAL" line but I don't think it not the right way.

Streaming timeout means failure when return_on_eow is true. Thus the
right thing to do there is setting lastSourceFailed to true. The first
half of WaitForWALToBecomeAvailable handles failure of the current
source thus source transition happens only there. The second half just
reports failure to the first half.

> Both issues are fixed in the new patch version.
> Any review and thoughts on the matters would be much appreciated.
>
>
> >
> > I think The doc needs to exiplain on the difference between default
> > and latest.
> Sure, I will work on it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2019-12-10 06:49:08 Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Previous Message Pavel Stehule 2019-12-10 06:11:59 Re: proposal: minscale, rtrim, btrim functions for numeric