Re: Updating FSM on recovery

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-29 12:12:23
Message-ID: 49085327.5010608@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
>>> just call XLogReadBufferWithFork with init = true, and then initialize
>>> the page if PageIsNew?
>
>> With init=true, we don't even try to read the page from the disk (since
>> 8.3), so all FSM pages accessed during recovery would be zeroed out. I
>> don't think that's what you intended.
>
> Ah, right. Maybe the API change you suggested in the comment is the
> way to go.

Done, patch attached. But while I was hacking that, I realized another
problem:

Because changes to FSM pages are not WAL-logged, they can be "torn" if
at crash, one part of the page is flushed to disk, but another is not.
The FSM code will recover from internally inconsistent pages, caused by
torn pages or other errors, but we still have a problem if the FSM file
is extended, and the new page is torn. It can happen that the first part
of the page, containing the page header, doesn't make it to disk, but
other parts of the page do. ReadBuffer() checks that the page header is
valid, so it will throw an error on a torn page like that. ReadBuffer()
doesn't complain about a page that is all-zeros, but it's not in this
scenario.

The FSM would be perfectly happy to just initialize torn or otherwise
damaged pages, so I think we should add yet another mode to ReadBuffer()
to allow that. We could also treat read() errors as merely warnings in
that mode, effectively the same as with zero_damaged_pages=on.

The ReadBuffer() interface is already pretty complex, with all the
different variants. We should probably keep the good old ReadBuffer()
the same, for the sake of simplicity in the callers, but try to reduce
the number of other variatns.

The current API is this:

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferWithFork(Relation reln, ForkNumber forkNum, BlockNumber
blockNum);
Buffer ReadBufferWithStrategy(Relation reln, BlockNumber blockNum,
BufferAccessStrategy strategy);
Buffer ReadOrZeroBuffer(Relation reln, ForkNumber forkNum, BlockNumber
blockNum);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
ForkNumber forkNum, BlockNumber blockNum, bool zeroPage);

Here's my proposal for new API:

typedef enum
{
RBM_NORMAL, /* checks header, ereport(ERROR) on errors */
RBM_INIT, /* just allocate a buffer, don't read from disk. Caller
must initialize the page */
RBM_INIT_ON_ERROR /* read, but instead of ERRORing, return an
all-zeros page */
} ReadBufferMode;

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber
blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode);

Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fsm-updateonrecovery-2.patch text/x-diff 13.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-10-29 12:21:02 Re: some problem with casting unknown to smallint
Previous Message Tom Lane 2008-10-29 11:58:31 Re: UUID-OSSP Contrib Module Compilation Issue