Skip site navigation (1) Skip section navigation (2)

Re: Updating FSM on recovery

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-29 12:12:23
Message-ID: 49085327.5010608@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
>>> just call XLogReadBufferWithFork with init = true, and then initialize
>>> the page if PageIsNew?
> 
>> With init=true, we don't even try to read the page from the disk (since 
>> 8.3), so all FSM pages accessed during recovery would be zeroed out. I 
>> don't think that's what you intended.
> 
> Ah, right.  Maybe the API change you suggested in the comment is the
> way to go.

Done, patch attached. But while I was hacking that, I realized another 
problem:

Because changes to FSM pages are not WAL-logged, they can be "torn" if 
at crash, one part of the page is flushed to disk, but another is not. 
The FSM code will recover from internally inconsistent pages, caused by 
torn pages or other errors, but we still have a problem if the FSM file 
is extended, and the new page is torn. It can happen that the first part 
of the page, containing the page header, doesn't make it to disk, but 
other parts of the page do. ReadBuffer() checks that the page header is 
valid, so it will throw an error on a torn page like that. ReadBuffer() 
doesn't complain about a page that is all-zeros, but it's not in this 
scenario.

The FSM would be perfectly happy to just initialize torn or otherwise 
damaged pages, so I think we should add yet another mode to ReadBuffer() 
to allow that. We could also treat read() errors as merely warnings in 
that mode, effectively the same as with zero_damaged_pages=on.

The ReadBuffer() interface is already pretty complex, with all the 
different variants. We should probably keep the good old ReadBuffer() 
the same, for the sake of simplicity in the callers, but try to reduce 
the number of other variatns.

The current API is this:

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferWithFork(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum);
Buffer ReadBufferWithStrategy(Relation reln, BlockNumber blockNum, 
BufferAccessStrategy strategy);
Buffer ReadOrZeroBuffer(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, 
ForkNumber forkNum, BlockNumber blockNum, bool zeroPage);

Here's my proposal for new API:

typedef enum
{
   RBM_NORMAL,		/* checks header, ereport(ERROR) on errors */
   RBM_INIT,		/* just allocate a buffer, don't read from disk. Caller 
must initialize the page */
   RBM_INIT_ON_ERROR	/* read, but instead of ERRORing, return an 
all-zeros page */
} ReadBufferMode;

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, 
ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode);

Thoughts?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: fsm-updateonrecovery-2.patch
Description: text/x-diff (13.5 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2008-10-29 12:21:02
Subject: Re: some problem with casting unknown to smallint
Previous:From: Tom LaneDate: 2008-10-29 11:58:31
Subject: Re: UUID-OSSP Contrib Module Compilation Issue

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group