Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: amit(dot)kapila(at)huawei(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Date: 2012-09-13 17:02:24
Message-ID: CAHGQGwG4VXyvgHtiepiJ=e89szESOva0k+SC-WE5Wnj3NoO7Pw@mail.gmail.com
Lists: pgsql-bugs

On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 12.09.2012 22:03, Fujii Masao wrote:
>>
>> On Wed, Sep 12, 2012 at 8:47 PM, <amit(dot)kapila(at)huawei(dot)com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 7533
>>> Logged by: Amit Kapila
>>> Email address: amit(dot)kapila(at)huawei(dot)com
>>> PostgreSQL version: 9.2.0
>>> Operating system: Suse
>>> Description:
>>>
>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>
>>> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
>>> 2. Execute the command initdb on host M to create a fresh database.
>>> 3. Modify the configuration file postgresql.conf on host M like this:
>>> listen_addresses = 'M'
>>> port = 15210
>>> wal_level = hot_standby
>>> max_wal_senders = 4
>>> hot_standby = on
>>> 4. Modify the configuration file pg_hba.conf on host M like this:
>>> host replication repl M/24 md5
>>> 5. Start the server on host M as primary.
>>> 6. Connect one client to the primary server and create a user ‘repl’:
>>> Create user repl superuser password '123';
>>> 7. Use the command pg_basebackup on host S to retrieve the database of
>>> the primary host:
>>> pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>> -v -h M -p 15210 -U repl -W
>>> 8. Copy one recovery.conf.sample from the share folder of the package to the
>>> database folder of host S. Then rename this file to recovery.conf.
>>> 9. Modify the file recovery.conf on host S as below:
>>> standby_mode = on
>>> primary_conninfo = 'host=M port=15210 user=repl password=123'
>>> 10. Modify the file postgresql.conf on host S as follows:
>>> listen_addresses = 'S'
>>> 11. Start the server on host S as standby server.
>>> 12. Use the command pg_basebackup on host CS to retrieve the database of
>>> the standby host:
>>> pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>> -v -h M -p 15210 -U repl -W
>>> 13. Modify the file recovery.conf on host CS as below:
>>> standby_mode = on
>>> primary_conninfo = 'host=S port=15210 user=repl password=123'
>>> 14. Modify the file postgresql.conf on host CS as follows:
>>> listen_addresses = 'CS'
>>> 15. Start the server on host CS as cascaded standby server node.
>>> 16. Try to connect a client to host CS, but it fails with the error:
>>> FATAL: the database system is starting up
>>
>>
>> This procedure didn't reproduce the problem in HEAD. But when I restarted
>> the master server between steps 11 and 12, I was able to reproduce the
>> problem.
>>
>>> Observations related to bug
>>> ------------------------------
>>> In the above scenario it is observed that the startup process has read all
>>> data up to position 5016220 (in our defect scenario minRecoveryPoint is
>>> 5016220) and then checks for recovery consistency with the following
>>> condition in the function CheckRecoveryConsistency:
>>> if (!reachedConsistency &&
>>>     XLByteLE(minRecoveryPoint, EndRecPtr) &&
>>>     XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>>>
>>> At this point the first two conditions are true but the last condition is
>>> not, because the redo has not yet been applied and hence backupStartPoint
>>> has not been reset. So the postmaster is not signalled that a consistent
>>> state has been reached. After this the startup process applies the redo,
>>> resets backupStartPoint, and then goes to read the next record. Since all
>>> records have already been read, it starts waiting for a new record from
>>> the standby node. But no new record arrives from the standby node, so it
>>> keeps waiting and never gets a chance to recheck the recovery consistency
>>> level. Hence client connections are not allowed.
>>
>>
>> If the cascaded standby starts recovery at a normal checkpoint record,
>> this problem will not happen, because if wal_level is set to hot_standby,
>> an XLOG_RUNNING_XACTS WAL record always follows the normal
>> checkpoint record. So while the XLOG_RUNNING_XACTS record is being replayed,
>> ControlFile->backupStartPoint can be reset, and then the cascaded standby
>> can pass the consistency check.
>>
>> The problem happens when the cascaded standby starts recovery at a
>> shutdown checkpoint record. In this case, no WAL record might follow
>> the checkpoint yet. So, after replaying the shutdown checkpoint
>> record, the cascaded standby needs to wait for a new WAL record to appear
>> before reaching the code block that resets
>> ControlFile->backupStartPoint.
>> The cascaded standby cannot reach a consistent state, and a client cannot
>> connect to it until new WAL has arrived.
>>
>> The attached patch fixes the problem. In this patch, if recovery begins
>> at a shutdown checkpoint record, the ControlFile fields
>> (like backupStartPoint) required for checking that the end of backup has
>> been reached are not set at first. IOW, the cascaded standby considers the
>> database consistent from the beginning. This is safe because
>> a shutdown checkpoint record means that there is no running database
>> activity at that point and the database is in a consistent state.
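
To illustrate the idea, a rough sketch of the approach (this is hypothetical,
not the attached patch itself; it assumes the 9.2-era StartupXLOG() variables
wasShutdown, haveBackupLabel, checkPoint and backupEndRequired):

/* In StartupXLOG(), where the start of the base backup is normally
 * remembered in pg_control */
if (haveBackupLabel)
{
    /*
     * If recovery begins at a shutdown checkpoint record, the cluster was
     * already consistent at that point, so skip setting the fields that
     * make recovery wait for the end-of-backup handling before declaring
     * consistency.
     */
    if (!wasShutdown)
    {
        ControlFile->backupStartPoint = checkPoint.redo;
        ControlFile->backupEndRequired = backupEndRequired;
    }
}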
>
>
> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
> misplaced. It's called after we got a record from ReadRecord, but *before*
> replaying it (rm_redo). Even if replaying record X makes the system
> consistent, we won't check and notice that until we have fetched record X+1.
> In this particular test case, record X is a shutdown checkpoint record, but
> it could as well be a running-xacts record, or the record that reaches
> minRecoveryPoint.
>
> Does the problem go away if you just move the CheckRecoveryConsistency()
> call *after* rm_redo (attached)?

No, at least in my case. When recovery starts at a shutdown checkpoint record
and there is no record following the shutdown checkpoint, recovery enters a
wait state before entering the main redo apply loop. That is, recovery starts
waiting for a new WAL record to arrive, in ReadRecord just before the redo
loop. So moving the CheckRecoveryConsistency() call after rm_redo cannot fix
the problem which I reported. To fix the problem, we need to make the recovery
reach the consistent point before the redo loop, i.e., in the
CheckRecoveryConsistency() call just before the redo loop.
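
To spell out the control flow, here is a condensed paraphrase (not the exact
source) of the replay section of StartupXLOG() in 9.2:

CheckRecoveryConsistency();     /* the call just before the redo loop */

/*
 * Fetch the first record to replay.  In standby mode this blocks until new
 * WAL arrives, which is where recovery gets stuck when nothing follows the
 * shutdown checkpoint record.
 */
record = ReadRecord(NULL, LOG, false);
if (record != NULL)
{
    do
    {
        CheckRecoveryConsistency();     /* current in-loop call, before rm_redo */

        RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);

        /* Heikki's patch would move the consistency check to here instead */

        record = ReadRecord(NULL, LOG, false);      /* can also block */
    } while (record != NULL);
}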

Regards,

--
Fujii Masao
