Re: [BUG?] lag of minRecoveryPont in archive recovery

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG?] lag of minRecoveryPont in archive recovery
Date: 2012-12-09 15:36:31
Message-ID: CAHGQGwG4W5QZ7+LJimg8xxuevwz0bYniHmZLZmWf0j6kBiuRCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 6, 2012 at 8:39 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Thursday, December 06, 2012 9:35 AM Kyotaro HORIGUCHI wrote:
>> Hello, I have a problem with PostgreSQL 9.2 with Pacemaker.
>>
>> HA standby sometime failes to start under normal operation.
>>
>> Testing with a bare replication pair showed that the standby failes
>> startup recovery under the operation sequence shown below. 9.3dev too,
>> but 9.1 does not have this problem. This problem became apparent by the
>> invalid-page check of xlog, but
>> 9.1 also has same glitch potentially.
>>
>> After the investigation, the lag of minRecoveryPoint behind EndRecPtr in
>> redo loop seems to be the cause. The lag brings about repetitive redoing
>> of unrepeatable xlog sequences such as XLOG_HEAP2_VISIBLE ->
>> SMGR_TRUNCATE on the same page. So I did the same aid work as
>> xact_redo_commit_internal for smgr_redo. While doing this, I noticed
>> that
>> CheckRecoveryConsistency() in redo apply loop should be after redoing
>> the record, so moved it.
>
> I think moving CheckRecoveryConsistency() after redo apply loop might cause
> a problem.
> As currently it is done before recoveryStopsHere() function, which can allow
> connections
> on HOTSTANDY. But now if due to some reason recovery pauses or stops due to
> above function,
> connections might not be allowed as CheckRecoveryConsistency() is not
> called.

Yes, so we should just add the CheckRecoveryConsistency() call after
rm_redo rather than moving it? This issue is related to the old discussion:
http://archives.postgresql.org/pgsql-bugs/2012-09/msg00101.php

Regards,

--
Fujii Masao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2012-12-09 15:41:34 Re: [BUG?] lag of minRecoveryPont in archive recovery
Previous Message Magnus Hagander 2012-12-09 14:57:25 Re: pg_basebackup is taking backup of extra files inside a tablespace directory