Re: Fast promotion failure

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast promotion failure
Date: 2013-05-20 17:47:06
Message-ID: 519A619A.7070609@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19.05.2013 17:25, Simon Riggs wrote:
> However, there is a call to RecoveryInProgress() at the top of the
> main loop of the checkpointer, which does explicitly state that it
> "initializes TimeLineID if it's not set yet." The checkpointer makes
> the decision about whether to run a restartpoint or a checkpoint
> directly from the answer to that, modified only in the case of an
> CHECKPOINT_END_OF_RECOVERY.
>
> So the appropiate call is made and I don't agree with the statement
> that this "doesn't get executed with fast promotion", or the title of
> the thread.

Hmm, I see.

> So while I believe that the checkpointer might have an incorrect TLI
> and that you've seen a bug, what isn't clear is that the checkpointer
> is the only process that would see an incorrect TLI, or why such
> processes see an incorrect TLI. It seems equally likely at this point
> that the TLI may be set incorrectly somehow and that is why it is
> being read incorrectly.

Yeah. I spent some more time debugging this, and found the culprit. In
CreateRestartPoint, we do this:

> /*
> * Update ThisTimeLineID to the timeline we're currently replaying,
> * so that we install any recycled segments on that timeline.
> *
> * There is no guarantee that the WAL segments will be useful on the
> * current timeline; if recovery proceeds to a new timeline right
> * after this, the pre-allocated WAL segments on this timeline will
> * not be used, and will go wasted until recycled on the next
> * restartpoint. We'll live with that.
> */
> (void) GetXLogReplayRecPtr(&ThisTimeLineID);

Here's what happens:

0. System is in recovery
1. checkpointer process starts creating a restartpoint.
2. Startup process ends recovery, creates end-of-recovery record, sets
the shared ThisTimeLineID value to 2, and exits.
3. checkpointer process calls RecoveryInProgess() while performing the
restartpoint (there are RecoveryInProgress() calls in many places, e.g
in XLogNeedsFlush()). It sets LocalRecoveryInProgress = true, and
initializes ThisTimeLineID to 2.
4. At the end of restartpoint, the checkpointer process does the above
GetXLogReplayRecPtr(&ThisTimeLineID). That overwrites ThisTimeLineID
back to 1.
5. Checkpointer calls RecoveryInProgress, which returns true
immediately, as LocalRecoveryInProgress is already set.
5. Checkpointer performs the first checkpoint, with ThisTimeLineID set
incorrectly to 1.

Not sure what the best fix would be. Perhaps change the code in
CreateRestartPoint() to do something like this instead:

GetXLogReplayRecPtr(&replayTLI);
if (RecoveryInProgress())
ThisTimeLineID = replayTLI;

> I see that the comment in InitXLOGAccess() is incorrect
> "ThisTimeLineID doesn't change so we need no lock to copy it", which
> seems worth correcting since there's no need to save time in a once
> only function.

Hmm, the code seems right to me, XLogCtl->ThisTimeLineID indeed doesn't
change, once it's set. And InitXLOGAccess() isn't called until it is
set. The comment could explain that better, though.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2013-05-20 18:15:49 Re: pgbench vs. SERIALIZABLE
Previous Message 李海龙 2013-05-20 17:10:48 Re: I s this a bug of spgist index in a heavy write condition?