Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:02:45
Message-ID: 20140113210245.GD14861@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 10:26 PM, Andres Freund wrote:
> >On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >>I concur that the right fix requires a
> >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> >>I think the spec for this should be that if the page doesn't exist or
> >>contains zeroes, we return InvalidBuffer without logging the page number
> >>as invalid. The doesn't-exist case is justified by the expectation that
> >>there will be a later RBM_NORMAL call for a larger page number, which will
> >>result in a suitable complaint if the page range isn't there.
>
> I think it's more natural to return the empty page to the caller, rather
> than InvalidBuffer.

Hm. I don't see the advantage - and it makes it impossible to recognize
that case at the caller level.

> >That's a sensible way to go, yes. But I wonder if we wouldn't end up
> >with less complicated code if we added a variant of ReadBuffer that only
> >returns a buffer from cache if already present in s_b.
> >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> >Heikki's message and it seems to quickly turn a bit ugly because the
> >surrounding code isn't really ready to deal with not returning a
> >buffer. And for the purpose of that redo routine, that'd actually be
> >better.
>
> If it's trivial to add such a mode to buffer cache, then sure, let's do that
> by all means. But I seriously doubt it's really simple enough to be
> back-patchable.

Isn't it (untested, written in the editor) just something like:

/* only works for pages in shared buffers */
Assert(!SmgrIsTemp(smgr));

/* Make sure we will have room to remember the buffer pin */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

/* create a tag so we can lookup the buffer */
INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

newHash = BufTableHashCode(&newTag);
newPartitionLock = BufMappingPartitionLock(newHash);

/* now, check if the block is in the buffer pool already */
LWLockAcquire(newPartitionLock, LW_SHARED);

buf_id = BufTableLookup(&newTag, newHash);
if (buf_id >= 0)
{
/*
* Found it. Now, pin the buffer so no one can steal it from the
* buffer pool, and check to see if the correct data has been loaded
* into the buffer.
*/
buf = &BufferDescriptors[buf_id];

valid = PinBuffer(buf, NULL);

/* Can release the mapping lock as soon as we've pinned it */
LWLockRelease(newPartitionLock);

if (!valid)
{
/*
* We can only get here if (a) someone else is still reading in
* the page, or (b) a previous read attempt failed. We have to
* wait for any active read attempt to finish, and then set up our
* own read attempt if the page is still not BM_VALID.
* StartBufferIO does it all.
*/
if (StartBufferIO(buf, true))
{
/*
* If we get here, previous attempts to read the buffer must
* have failed ...
*/
return InvalidBuffer;
}
}

return buf;
}

/*
* Didn't find it in the buffer pool, done.
*/
LWLockRelease(newPartitionLock);

return InvalidBuffer;

Now, that's not trivial, but also not too complicated.

> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
> attached patch (for demonstration purposes only, you also need to add the
> new mode to the header file and adjust comments).

I thought about that approach at first as well, but I am not so sure
it's sufficient. Isn't it quite possible that we'd end up reading a page
that was *partially* written during a crash and due to that has a
corrupted checksum? There won't be any protection due to WAL replay/full
page writes against that case here.
Now, you could argue that that shouldn't be the case because we're only
entering that codepath once STANDBY_SNAPSHOT_READY and you might be
right...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2014-01-13 21:25:48 Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Previous Message Tom Lane 2014-01-13 20:56:23 Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-01-13 21:03:24 Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Previous Message Andres Freund 2014-01-13 20:59:34 Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance