Re: txid failed epoch increment, again, aka 6291

From: Noah Misch <noah(at)leadboat(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: txid failed epoch increment, again, aka 6291
Date: 2012-09-04 15:04:25
Message-ID: 20120904150425.GA28600@tornado.leadboat.com
Lists: pgsql-hackers

On Tue, Sep 04, 2012 at 02:07:30AM -0700, Daniel Farina wrote:
> It seems like this has reproduced once more. And once again, there
> doesn't appear to be any funny business in pg_control (but the structs
> are pasted here for your re-check), and there are successful sensical
> updates to it. The primary is running 9.0.6.

What version is the standby running?

> However, we do have a new piece of data: there was a very brief period
> where txid_snapshot did report an xmin greater than 2^33, our next
> epoch boundary, by a few thousand transactions. That could be because
> the reporting function GetNextXidAndEpoch does its own epoch
> calculation before the checkpoint, and then after a checkpoint that
> forgets to increment the epoch, there is no need to post-facto
> adjust the epoch anymore.

That makes sense.
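
For anyone following along, the adjustment being described works roughly
like the sketch below.  It is a standalone simplification for illustration,
not the actual GetNextXidAndEpoch() in varsup.c; the names and the wrap test
are approximations:

/*
 * Standalone sketch of the post-facto epoch adjustment described above.
 * Not the real varsup.c code; names and the wrap test are approximations.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

static void
sketch_next_xid_and_epoch(TransactionId ckptXid, uint32_t ckptXidEpoch,
						  TransactionId nextXid,
						  TransactionId *xid, uint32_t *epoch)
{
	/*
	 * If the current nextXid is numerically below the checkpoint's nextXid,
	 * the XID counter has wrapped since that checkpoint, so report one
	 * epoch higher than the checkpoint records.
	 */
	if (nextXid < ckptXid)
		ckptXidEpoch++;

	*xid = nextXid;
	*epoch = ckptXidEpoch;
}

int
main(void)
{
	TransactionId xid;
	uint32_t	epoch;

	/* Checkpoint from before the wrap (epoch 1), counter not yet wrapped. */
	sketch_next_xid_and_epoch(4115479553u, 1, 4200000000u, &xid, &epoch);
	printf("pre-wrap:  epoch %u, xid %u\n", (unsigned) epoch, (unsigned) xid);

	/* Counter has wrapped, but the checkpoint still says epoch 1: the
	 * adjustment reports epoch 2, so reported txids briefly exceed 2^33. */
	sketch_next_xid_and_epoch(4294967000u, 1, 9904084u, &xid, &epoch);
	printf("post-wrap: epoch %u, xid %u\n", (unsigned) epoch, (unsigned) xid);
	return 0;
}

While the older checkpoint's large nextXid is still in place, the adjustment
fires and you see epoch 2; once a later checkpoint stores the small post-wrap
nextXid with an un-bumped epoch, it stops firing and the reported epoch
falls back to 1, which matches your brief window above 2^33.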

> I've been reviewing the mechanism in CreateCheckPoint for this on and
> off for a couple of days, but so far I haven't come up with a
> convincing mechanism. However, given that it seems historically that
> this bug is more likely to surface than *not* surface on this system,
> perhaps we can try for a sometimes-reproducing test case. I'm still
> struggling for a hint of a solution, though, so toss your thoughts
> here.

The cause is not apparent to me, either, and simple tests here do not
reproduce the problem.

I am suspicious of xlog_redo() updating ControlFile->checkPointCopy without
acquiring ControlFileLock. If a restartpoint is finishing at about the same
time, ControlFile->checkPointCopy.nextXid might come from the just-read
checkpoint record while ControlFile->checkPointCopy.nextXidEpoch bears the
value from the older checkpoint just adopted as a restartpoint. The resulting
inconsistency would, however, vanish at the next ControlFile->checkPointCopy
update. This does not explain the symptoms you have reported, and it cannot
explain much of anything on a primary.
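
To show the shape of that interleaving, here is a toy pthread simulation.
It is not PostgreSQL code; the names only mirror the structures under
discussion, and the unsynchronized writes are deliberate (build with
cc -pthread; whether it actually reports mixed pairs depends on scheduling):

/*
 * Toy simulation of the suspected interleaving.  The data race is
 * intentional; this only illustrates how per-field writes done without
 * ControlFileLock can mix with a locked whole-checkpoint copy.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define OLD_XID		4000000000u		/* older checkpoint, epoch 1 */
#define OLD_EPOCH	1u
#define NEW_XID		9904084u		/* just-read record, epoch 2 */
#define NEW_EPOCH	2u

static volatile uint32_t nextXid = OLD_XID;
static volatile uint32_t nextXidEpoch = OLD_EPOCH;
static pthread_mutex_t ControlFileLock = PTHREAD_MUTEX_INITIALIZER;

/* Like xlog_redo(): writes the two fields one by one, without the lock.
 * (It cycles back to the old values so the race can recur in the loop.) */
static void *
redo_thread(void *arg)
{
	(void) arg;
	for (int i = 0; i < 1000000; i++)
	{
		nextXid = NEW_XID;
		nextXidEpoch = NEW_EPOCH;
		nextXid = OLD_XID;
		nextXidEpoch = OLD_EPOCH;
	}
	return NULL;
}

/* Like a restartpoint: installs the older checkpoint's values under the
 * lock, then looks at what pg_control would be written with. */
static void *
restartpoint_thread(void *arg)
{
	int			mixed = 0;

	(void) arg;
	for (int i = 0; i < 1000000; i++)
	{
		uint32_t	seen_xid, seen_epoch;

		pthread_mutex_lock(&ControlFileLock);
		nextXid = OLD_XID;
		nextXidEpoch = OLD_EPOCH;
		seen_xid = nextXid;
		seen_epoch = nextXidEpoch;
		pthread_mutex_unlock(&ControlFileLock);

		/* New xid with old epoch (or vice versa) is the inconsistency. */
		if ((seen_xid == NEW_XID) != (seen_epoch == NEW_EPOCH))
			mixed++;
	}
	printf("mixed xid/epoch pairs seen: %d\n", mixed);
	return NULL;
}

int
main(void)
{
	pthread_t	redo, restart;

	pthread_create(&redo, NULL, redo_thread, NULL);
	pthread_create(&restart, NULL, restartpoint_thread, NULL);
	pthread_join(redo, NULL);
	pthread_join(restart, NULL);
	return 0;
}

Any nonzero count is the epoch-from-one-checkpoint, xid-from-another
combination described above; again, that mix would disappear at the next
full checkPointCopy update, so it is not yet an explanation for your case.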

> (gdb) p *ControlFile
> $4 = {system_identifier = 5613733157253676693,
>   pg_control_version = 903, catalog_version_no = 201008051,
>   state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint =

This capture, it seems, is from a standby.

>   {xlogid = 18751, xrecoff = 1072693824},
>   prevCheckPoint = {xlogid = 18751, xrecoff = 1072693824},
>   checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536},
>     ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084,
>     nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115,
>     oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496,
>     oldestActiveXid = 9558248},

You expected checkPointCopy = { ... nextXidEpoch = 2, ... }, correct? (Well,
nextXidEpoch = 3 if you count the previous missed increment.) Does pg_control
on the primary also bear epoch 1 where epoch 2 is expected?
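
For concreteness, the arithmetic behind that expectation: a 64-bit txid is
epoch * 2^32 + xid, so with the nextXid from the dump, epoch 1 gives
1 * 2^32 + 9904084 = 4304871380, while epoch 2 gives
2 * 2^32 + 9904084 = 8599838676, just past the 2^33 = 8589934592 boundary
you mentioned.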

Your last restartpoint was exactly five minutes before your last checkpoint,
so there's no evidence of a dearth of safe restartpoint opportunities.

nm
