Re: txid failed epoch increment, again, aka 6291

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: txid failed epoch increment, again, aka 6291
Date: 2012-09-04 16:46:58
Message-ID: CAAZKuFbgHeei1hVx+5igyJZQLEeVDdLb=6MNa49B-L71JLiqgA@mail.gmail.com

On Tue, Sep 4, 2012 at 8:04 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Tue, Sep 04, 2012 at 02:07:30AM -0700, Daniel Farina wrote:
>> It seems like this has reproduced once more. And once again, there
>> doesn't appear to be any funny business in pg_control (but the structs
>> are pasted here for your re-check), and there are successful, sensible
>> updates to it. The primary is running 9.0.6.
>
> What version is the standby running?

We have several standbys, all of which so far agree on the same txid,
presumably replicated from the primary: one is 9.0.7, another is 9.0.8.
Incidentally, I am also preparing a 9.0.9 standby that I was going to
use for some forensics.

>
>> However, we do have a new piece of data: there was a very brief period
>> where txid_snapshot did report an xmin greater than 2^33, our next
>> epoch boundary, by a few thousand transactions. That could be because
>> the reporting function GetNextXidAndEpoch does its own epoch
>> calculation: before the checkpoint it adjusts the epoch post facto,
>> and after a checkpoint that forgets to increment the epoch, that
>> adjustment no longer applies.
>
> That makes sense.
>
>> I've been reviewing the mechanism in CreateCheckPoint for this on and
>> off for a couple of days, but so far I haven't come up with a
>> convincing explanation. However, given that historically this bug
>> seems more likely to surface than *not* to surface on this system,
>> perhaps we can try for a sometimes-reproducing test case. I'm still
>> struggling for a hint of a solution, though, so toss your thoughts
>> here.
>
> The cause is not apparent to me, either, and simple tests here do not
> reproduce the problem.
>
> I am suspicious of xlog_redo() updating ControlData->checkPointCopy without
> acquiring ControlFileLock. If a restartpoint is finishing at about the same
> time, ControlFile->checkPointCopy.nextXid might come from the just-read
> checkpoint record while ControlFile->checkPointCopy.nextXidEpoch bears the
> value from the older checkpoint just adopted as a restartpoint. The resulting
> inconsistency would, however, vanish at the next ControlFile->checkPointCopy
> update. This does not explain the symptoms you have reported, and it cannot
> explain much of anything on a primary.
>
>> (gdb) p *ControlFile
>> $4 = {system_identifier = 5613733157253676693,
>>   pg_control_version = 903, catalog_version_no = 201008051,
>>   state = DB_IN_ARCHIVE_RECOVERY, time = 1346746898, checkPoint =
>
> This capture, it seems, is from a standby.

Yeah. I can probably also get one from the primary.

>> {xlogid = 18751, xrecoff = 1072693824},
>>   prevCheckPoint = {xlogid = 18751, xrecoff = 1072693824},
>>   checkPointCopy = {redo = {xlogid = 18751, xrecoff = 602482536},
>>     ThisTimeLineID = 6, nextXidEpoch = 1, nextXid = 9904084,
>>     nextOid = 2047524, nextMulti = 1119, nextMultiOffset = 3115,
>>     oldestXid = 4115479553, oldestXidDB = 1, time = 1346746496,
>>     oldestActiveXid = 9558248},
>
> You expected checkPointCopy = { ... nextXidEpoch = 2, ... }, correct? (Well,
> nextXidEpoch = 3 if you count the previous missed increment.) Does pg_control
> on the primary also bear epoch 1 where epoch 2 is expected?

Well, over the aggregate life of this system as we know it, we would
probably expect more like 4, or even 5. But to answer the original
question: yes, it bears 1 where 2 is expected.
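
Just to spell out the arithmetic: the 64-bit txids the reporting
functions hand back decompose as epoch * 2^32 + xid, so any value past
2^33 implies epoch 2. A trivial standalone check (the value here is
illustrative, not the actual snapshot):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    /* illustrative txid a few thousand past the 2^33 epoch boundary */
    uint64_t txid = (UINT64_C(1) << 33) + 5000;

    unsigned epoch = (unsigned) (txid >> 32);        /* expect 2 */
    unsigned xid = (unsigned) (txid & 0xffffffffu);  /* low-order xid */

    printf("epoch = %u, xid = %u\n", epoch, xid);
    return 0;
}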

>
> Your last restartpoint was exactly five minutes before your last checkpoint,
> so there's no evidence of a dearth of safe restartpoint opportunities.
>
> nm

I might try to find the segments leading up to the overflow point and
try xlogdumping them to see what we can see.
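
In case it helps anyone follow along, here is roughly how I'd map the
pg_control locations quoted above onto segment file names under the 9.0
naming scheme. The constants are the checkPoint location and timeline
from the dump, as a worked example rather than anything authoritative:

#include <stdio.h>

/* default 16MB WAL segment size */
#define XLOG_SEG_SIZE (16 * 1024 * 1024)

int
main(void)
{
    unsigned tli = 6;               /* checkPointCopy.ThisTimeLineID */
    unsigned xlogid = 18751;        /* checkPoint.xlogid */
    unsigned xrecoff = 1072693824;  /* checkPoint.xrecoff */

    /* 9.0-style name: timeline, xlogid, xrecoff / segment size */
    printf("%08X%08X%08X\n", tli, xlogid, xrecoff / XLOG_SEG_SIZE);
    return 0;
}

That at least pins down which segment holds the latest checkpoint
record; the segments of interest would be the ones leading up to it.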

If there's anything to note about the workload, it's that it makes
fairly pervasive use of long-running transactions (which can probably
span more than one checkpoint) and of the txid reporting functions, at
a concurrency level of about 300 backends ... but per my reading of
the mechanism so far, it doesn't seem like any of this should matter.
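
For what it's worth, the post-facto epoch adjustment in the reporting
path that I mentioned earlier amounts to something like this (a
simplified sketch of the logic, not the actual 9.0 source):

#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Simplified sketch: ckptXid/ckptXidEpoch come from the last checkpoint
 * record; nextXid is the live counter, which is logically later than
 * ckptXid even when it is numerically smaller.
 */
uint32_t
reported_epoch(TransactionId nextXid, TransactionId ckptXid,
               uint32_t ckptXidEpoch)
{
    uint32_t epoch = ckptXidEpoch;

    /*
     * nextXid numerically below the checkpoint's xid means the counter
     * has wrapped since that checkpoint, so report the next epoch even
     * though the stored epoch was never incremented.
     */
    if (nextXid < ckptXid)
        epoch++;

    return epoch;
}

That would also explain why the >2^33 readings were brief: once a
checkpoint records the post-wrap nextXid but keeps the old epoch, the
wrap test stops firing and the reported epoch drops back down.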

--
fdr
