Re: Assertion failure when promoting node by deleting recovery.conf and restart node

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure when promoting node by deleting recovery.conf and restart node
Date: 2013-03-25 19:14:57
Message-ID: 5150A231.30702@vmware.com
Lists: pgsql-hackers

On 15.03.2013 04:25, Michael Paquier wrote:
> Hi,
>
> When trying to *promote* a slave as master by removing recovery.conf and
> restarting node, I found an assertion failure on master branch:
> LOG: database system was shut down in recovery at 2013-03-15 10:22:27 JST
> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File: "xlog.c", Line: 4954)
> (gdb) bt
> #0 0x00007f95af03b2c5 in raise () from /usr/lib/libc.so.6
> #1 0x00007f95af03c748 in abort () from /usr/lib/libc.so.6
> #2 0x000000000086ce71 in ExceptionalCondition (conditionName=0x8f2af0 "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813 "FailedAssertion", fileName=0x8f076b "xlog.c", lineNumber=4954) at assert.c:54
> #3 0x00000000004fe499 in StartupXLOG () at xlog.c:4954
> #4 0x00000000006f9d34 in StartupProcessMain () at startup.c:224
> #5 0x000000000050ef92 in AuxiliaryProcessMain (argc=2, argv=0x7fffa6fc3d20) at bootstrap.c:423
> #6 0x00000000006f8816 in StartChildProcess (type=StartupProcess) at postmaster.c:4956
> #7 0x00000000006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at postmaster.c:1237
> #8 0x000000000065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>
> OK, this is not the cleanest way to promote a node, as it skips the
> safety checks related to promotion, but 9.2 and previous versions allowed
> it to be done properly.
>
> The assertion was introduced by commit 3f0ab05 in order to properly
> record minRecoveryPointTLI in the control file at the end of recovery in
> the case of a crash.
> However, when a slave node that was properly shut down in recovery is
> then restarted as a master, the code path containing this assertion is taken.
> What do you think of the attached patch? It avoids updating
> recoveryTargetTLI and recoveryTargetIsLatest if the node was shut down
> while in recovery.
> Another possibility would be to add conditions to the assertion based
> on the state of the control file, but I think it is more consistent simply
> not to update those fields.

Simon, can you comment on this? ISTM we could just remove the assertion
and update the comment to mention that this can happen. If there is a
min recovery point, surely we always need to recover to the timeline
containing that point, so setting recoveryTargetTLI to
minRecoveryPointTLI seems sensible.

- Heikki
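
For readers following the thread, the alternative Heikki proposes can be sketched as below. This is a simplified, self-contained model, not the actual xlog.c code: the ControlFileSketch and RecoveryTarget structs, the choose_recovery_target() helper, the convention that a minRecoveryPointTLI of 0 means "no min recovery point recorded", and the default of targeting the latest timeline are all assumptions made for illustration. Only the names minRecoveryPointTLI, recoveryTargetTLI and recoveryTargetIsLatest come from the messages above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, pared-down stand-in for the contents of pg_control. */
typedef struct ControlFileSketch
{
    /* Assumption: 0 means no min recovery point was recorded. */
    TimeLineID minRecoveryPointTLI;
} ControlFileSketch;

/* Hypothetical bundle of the two recovery-target variables from the thread. */
typedef struct RecoveryTarget
{
    TimeLineID recoveryTargetTLI;
    bool       recoveryTargetIsLatest;
} RecoveryTarget;

/*
 * Pick the timeline to recover to. If the control file records a min
 * recovery point, recover to the timeline containing it, whatever that
 * timeline is. There is deliberately no assertion that the timeline
 * differs from 1: a standby shut down cleanly in recovery on timeline 1
 * and restarted without recovery.conf reaches this branch legitimately.
 */
static RecoveryTarget
choose_recovery_target(const ControlFileSketch *cf, TimeLineID defaultTLI)
{
    /* Assumed default: follow the latest timeline starting from defaultTLI. */
    RecoveryTarget target = { defaultTLI, true };

    if (cf->minRecoveryPointTLI > 0)
    {
        target.recoveryTargetTLI = cf->minRecoveryPointTLI;
        target.recoveryTargetIsLatest = false;
    }
    return target;
}
```

With minRecoveryPointTLI set to 1, the case that tripped the assertion in the report, this sketch returns timeline 1 as the target instead of aborting. Michael's patch takes the other route and skips the update of both variables when the node was shut down while in recovery.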
