Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: hlinnaka(at)iki(dot)fi, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Date: 2012-09-15 06:10:54
Message-ID: CAHGQGwHWbQpsQ7k=x8x=7gYuU1rOvRNdtxtdpx9+XONpMA48ww@mail.gmail.com
Lists: pgsql-bugs

On Fri, Sep 14, 2012 at 12:21 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Thursday, September 13, 2012 10:32 PM Fujii Masao wrote:
> On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 12.09.2012 22:03, Fujii Masao wrote:
>>>
>>> On Wed, Sep 12, 2012 at 8:47 PM,<amit(dot)kapila(at)huawei(dot)com> wrote:
>>>>
>>>> The following bug has been logged on the website:
>>>>
>>>> Bug reference: 7533
>>>> Logged by: Amit Kapila
>>>> Email address: amit(dot)kapila(at)huawei(dot)com
>>>> PostgreSQL version: 9.2.0
>>>> Operating system: Suse
>>>> Description:
>>>>
>>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>>
>>
>
>
>>> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
>>> misplaced. It's called after we got a record from ReadRecord, but *before*
>>> replaying it (rm_redo). Even if replaying record X makes the system
>>> consistent, we won't check and notice that until we have fetched record X+1.
>>> In this particular test case, record X is a shutdown checkpoint record, but
>>> it could as well be a running-xacts record, or the record that reaches
>>> minRecoveryPoint.
>>
>>> Does the problem go away if you just move the CheckRecoveryConsistency()
>>> call *after* rm_redo (attached)?
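The ordering problem described above can be reproduced in a toy C model. This is a hedged sketch and not PostgreSQL's redo loop. The function name and parameters are hypothetical, a counter stands in for rm_redo(), and "running out of available records" stands in for ReadRecord blocking while it waits for new WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the redo loop described above; these are
 * not PostgreSQL's real functions or data structures.
 *
 * n_available:      records that can be read before recovery has to
 *                   wait for more WAL to arrive
 * consistent_after: replaying this many records makes the standby
 *                   consistent (e.g. record X reaches minRecoveryPoint)
 * check_after_redo: also run the consistency check right after replay
 *
 * Returns true if read-only connections would be allowed by the time
 * recovery starts waiting for the next record. */
static bool consistent_when_waiting(size_t n_available, size_t consistent_after,
                                    bool check_after_redo)
{
    size_t replayed = 0;
    bool allow_connections = false;

    for (size_t i = 0; i < n_available; i++) {
        /* Existing placement: record i has been read, not yet replayed. */
        if (replayed >= consistent_after)
            allow_connections = true;

        replayed++;                         /* stand-in for rm_redo() */

        /* Proposed placement: right after replaying record i. */
        if (check_after_redo && replayed >= consistent_after)
            allow_connections = true;
    }
    assert(replayed == n_available);
    return allow_connections;
}
```

With three available records and consistency reached by replaying the third, `consistent_when_waiting(3, 3, false)` returns false: the check that runs before replay only notices consistency once a fourth record is fetched, as `consistent_when_waiting(4, 3, false)` returning true shows. Passing true for `check_after_redo` makes the three-record case return true. With no records available at all, the function returns false for either placement.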
>
>> No, at least in my case. When recovery starts at a shutdown checkpoint record and
>> there is no record following the shutdown checkpoint, recovery goes into a wait
>> state before entering the main redo apply loop. That is, recovery starts waiting
>> for a new WAL record to arrive in ReadRecord, just before the redo loop. So moving
>> the CheckRecoveryConsistency() call after rm_redo cannot fix the problem I
>> reported. To fix it, we need to make recovery reach the consistent point before
>> the redo loop, i.e., in the CheckRecoveryConsistency() call just before the
>> redo loop.
>
> I think maybe in that case we need both fixes, as the problem I reported can be fixed with Heikki's patch.

Agreed. We should add a CheckRecoveryConsistency() call after rm_redo, keeping the
existing one rather than moving it, as you suggested upthread.
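Why both fixes are needed can be sketched with a toy C model. It only captures where the consistency checks run. It does not model why the real check before the redo loop failed to detect consistency in the reported case, and the function name and parameters are hypothetical, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not PostgreSQL code.
 *
 * n_records:         records that follow the recovery start point and
 *                    are already available to read
 * consistent_after:  replayed records needed to become consistent
 *                    (0 = consistent at the start point itself)
 * check_before_loop: whether the check before the redo loop can
 *                    declare consistency
 *
 * Returns how many records had been replayed when read-only
 * connections would first be allowed, or -1 if recovery ends up
 * waiting for more WAL without ever allowing them. */
static long first_connect_point(size_t n_records, size_t consistent_after,
                                bool check_before_loop)
{
    size_t replayed = 0;

    /* Check just before the redo loop. */
    if (check_before_loop && replayed >= consistent_after)
        return 0;

    /* With n_records == 0 the real code would already be waiting for the
     * first record, so neither in-loop check below ever runs. */
    for (size_t i = 0; i < n_records; i++) {
        if (replayed >= consistent_after)   /* existing check, before replay */
            return (long) replayed;
        replayed++;                         /* stand-in for rm_redo() */
        if (replayed >= consistent_after)   /* added check, after rm_redo() */
            return (long) replayed;
    }

    /* Every available record was replayed without reaching consistency. */
    assert(replayed == n_records);
    return -1;
}
```

`first_connect_point(0, 0, false)` returns -1: when nothing follows the start checkpoint, only the check before the loop can allow connections. `first_connect_point(3, 3, false)` returns 3, which relies on the check added after rm_redo().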

Regards,

--
Fujii Masao
