Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, <hlinnaka(at)iki(dot)fi>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Date: 2012-09-14 03:21:38
Message-ID: 003b01cd9228$0dee98a0$29cbc9e0$@kapila@huawei.com
Lists: pgsql-bugs

On Thursday, September 13, 2012 10:32 PM Fujii Masao wrote:
On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 12.09.2012 22:03, Fujii Masao wrote:
>>
>> On Wed, Sep 12, 2012 at 8:47 PM,<amit(dot)kapila(at)huawei(dot)com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 7533
>>> Logged by: Amit Kapila
>>> Email address: amit(dot)kapila(at)huawei(dot)com
>>> PostgreSQL version: 9.2.0
>>> Operating system: Suse
>>> Description:
>>>
>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>
>

>> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
>> misplaced. It's called after we got a record from ReadRecord, but *before*
>> replaying it (rm_redo). Even if replaying record X makes the system
>> consistent, we won't check and notice that until we have fetched record X+1.
>> In this particular test case, record X is a shutdown checkpoint record, but
>> it could as well be a running-xacts record, or the record that reaches
>> minRecoveryPoint.
>
>> Does the problem go away if you just move the CheckRecoveryConsistency()
>> call *after* rm_redo (attached)?

> No, at least in my case. When recovery starts at shutdown checkpoint record and
> there is no record following the shutdown checkpoint, recovery gets in wait state
> before entering the main redo apply loop. That is, recovery starts waiting for
> new WAL record to arrive, in ReadRecord just before the redo loop. So moving
> the CheckRecoveryConsistency() call after rm_redo cannot fix the problem which
> I reported. To fix the problem, we need to make the recovery reach the consistent
> point before the redo loop, i.e., in the CheckRecoveryConsistency()
> just before the redo loop.

I think that in that case we may need both fixes, since the problem I reported can be fixed with Heikki's patch.

With Regards,
Amit Kapila.
