Re: Assertion failure when promoting node by deleting recovery.conf and restart node

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure when promoting node by deleting recovery.conf and restart node
Date: 2013-05-19 16:35:22
Message-ID: CA+U5nM+O+30Y9=+e42dAB1Pef1WP8dSUsp_C41-sMVMe5F3NdQ@mail.gmail.com
Lists: pgsql-hackers

On 25 March 2013 19:14, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 15.03.2013 04:25, Michael Paquier wrote:
>>
>> Hi,
>>
>> When trying to *promote* a slave as master by removing recovery.conf and
>> restarting the node, I hit an assertion failure on the master branch:
>> LOG: database system was shut down in recovery at 2013-03-15 10:22:27 JST
>> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File: "xlog.c", Line: 4954)
>> (gdb) bt
>> #0 0x00007f95af03b2c5 in raise () from /usr/lib/libc.so.6
>> #1 0x00007f95af03c748 in abort () from /usr/lib/libc.so.6
>> #2 0x000000000086ce71 in ExceptionalCondition (conditionName=0x8f2af0 "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813 "FailedAssertion", fileName=0x8f076b "xlog.c", lineNumber=4954) at assert.c:54
>> #3 0x00000000004fe499 in StartupXLOG () at xlog.c:4954
>> #4 0x00000000006f9d34 in StartupProcessMain () at startup.c:224
>> #5 0x000000000050ef92 in AuxiliaryProcessMain (argc=2, argv=0x7fffa6fc3d20) at bootstrap.c:423
>> #6 0x00000000006f8816 in StartChildProcess (type=StartupProcess) at postmaster.c:4956
>> #7 0x00000000006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at postmaster.c:1237
>> #8 0x000000000065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>> OK, this is not the cleanest way to promote a node, as it doesn't do any of
>> the safety checks related to promotion, but 9.2 and earlier versions
>> allowed doing it properly.
>>
>> The assertion was introduced by commit 3f0ab05 in order to properly record
>> minRecoveryPointTLI in the control file at the end of recovery in the case
>> of a crash.
>> However, when a slave node is cleanly shut down in recovery and then
>> restarted as a master, the code path containing this assertion is taken.
>> What do you think of the attached patch? It avoids updating
>> recoveryTargetTLI and recoveryTargetIsLatest if the node was shut down
>> while in recovery.
>> Another possibility would be to add conditions to the assertion based on
>> the state of the control file, but I think it is more consistent simply not
>> to update those fields.
>
>
> Simon, can you comment on this? ISTM we could just remove the assertion and
> update the comment to mention that this can happen. If there is a min
> recovery point, surely we always need to recover to the timeline containing
> that point, so setting recoveryTargetTLI to minRecoveryPointTLI seems
> sensible.

Fixed by using the latest available TLI and removing the assertion.
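
For readers following the thread, here is a minimal sketch of what "the latest TLI available" could mean. It is illustrative only and is not the committed change: the struct `ControlFileSketch`, its field `checkPointTLI`, and the function `choose_recovery_target_tli` are hypothetical stand-ins for the control-file fields and the logic in StartupXLOG(). The assumption is that the target timeline is taken as the newer of the last checkpoint's timeline and minRecoveryPointTLI, with no assertion about which one it is.

```c
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, trimmed-down stand-in for the control-file fields discussed here. */
typedef struct ControlFileSketch
{
    TimeLineID checkPointTLI;        /* timeline of the last checkpoint */
    TimeLineID minRecoveryPointTLI;  /* timeline of the min recovery point, 0 if unset */
} ControlFileSketch;

/*
 * Pick the recovery target timeline as the newest timeline recorded in the
 * control file. No assertion is made about minRecoveryPointTLI, so a standby
 * that was cleanly shut down in recovery and restarted without recovery.conf
 * is handled like any other case.
 */
static TimeLineID
choose_recovery_target_tli(const ControlFileSketch *cf)
{
    if (cf->minRecoveryPointTLI > cf->checkPointTLI)
        return cf->minRecoveryPointTLI;
    return cf->checkPointTLI;
}
```

In the reported scenario both fields are on timeline 1, so the sketch simply returns 1 where the old code tripped the assertion. If the min recovery point is on a newer timeline than the checkpoint, that newer timeline is chosen, which matches the point above that recovery must reach the timeline containing the min recovery point.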

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
