Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date: 2016-05-10 23:02:59
Message-ID: 20160510230259.a2fojqtv6d76arbn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-05-10 15:53:38 -0700, Jeff Janes wrote:
> On Tue, May 10, 2016 at 2:00 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I think that's to blame here. Looking at the relevant WAL record shows:
> >
> > pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID'
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> > rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/12023C38, prev 2/12023C00, desc: CHECKPOINT_ONLINE redo 2/12000120; /* ... */ oid 4294501; /* ... */ online
>
> By my understanding, this is the point at which the crash occurred.

Right.

> > rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/1327A798, prev 2/1327A768, desc: CHECKPOINT_SHUTDOWN redo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown
> > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> >
> > (note that end-of-recovery checkpoints are logged as shutdown
> > checkpoints, pretty annoying imo)
> >
> > So I think the issue is that the 2/12023C38 checkpoint *starts* before
> > the first NEXTOID, and thus gets an earlier nextoid.
>
>
> But isn't CreateCheckPoint called at the end of the checkpoint, not
> the start of it?

No, CreateCheckPoint() does it all.

CreateCheckPoint(int flags)
{
...
/* 1) determine redo pointer */
WALInsertLockAcquireExclusive();
curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
WALInsertLockRelease();
...
/* 2) determine oid */
LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...
/* 3) actually checkpoint shared_buffers et al. */
CheckPointGuts(checkPoint.redo, flags);
...
/* 4) log the checkpoint */
recptr = XLogInsert(RM_XLOG_ID,
shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
XLOG_CHECKPOINT_ONLINE);
...
}

> I don't understand how it could be out-of-date at that point. But
> obviously it is.

A checkpoint logically "starts" at 1) in the above abbreviated
CreateCheckPoint(), that's where recovery starts when starting up from
that checkpoint. But inbetween 1) and 4) all other backends can continue
to insert WAL, and it'll be replayed *before* the checkpoint's record
itself. That means that if some backend generates a NEXTOID record
between 2) and 4) (with largers checkpoints we're looking at minutes to
an hour of time), it's effects will temporarily take effect (as in
ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's
replay will overwrite it unconditionally:
void
xlog_redo(XLogReaderState *record)
{
else if (info == XLOG_CHECKPOINT_ONLINE)
{
...
/* ... but still treat OID counter as exact */
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
ShmemVariableCache->nextOid = checkPoint.nextOid;
ShmemVariableCache->oidCount = 0;
LWLockRelease(OidGenLock);

Makes sense?

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-05-10 23:57:35 Re: asynchronous and vectorized execution
Previous Message Tom Lane 2016-05-10 22:55:15 Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)