Re: Updating FSM on recovery

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Updating FSM on recovery
Date: 2008-10-30 08:40:05
Message-ID: 490972E5.1010605@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> The ReadBuffer() interface is already pretty complex, with all the
>> different variants. We should probably keep the good old ReadBuffer()
>> the same, for the sake of simplicity in the callers, but try to reduce
>> the number of other variatns.
>
> Indeed. Did you see the discussion about the similarly-too-complex
> heap_insert API a couple days ago in connection with bulk-write
> scenarios? The conclusion there was to try to shift stuff into a
> bitmask options argument, in hopes that future additions might not
> require touching every caller. Can we do it similarly here?

Hmm. I think an enum is better than a bitmask here. At the moment, we
need three different modes of operation:
1. Read the page as usual, throw an error on corrupted page (ReadBuffer())
2. Read the page, zero page on corruption (this is new)
3. Don't read the page from disk, just allocate a buffer.
(ReadOrZeroBuffer())

If we turned this into a bitmask, what would the bits be? Perhaps:

DONT_READ /* don't read the page from disk, just allocate buffer */
NO_ERROR_ON_CORRUPTION /* don't throw an error if page is corrupt */

With two bits, there's four different combinations. I don't think the
DONT_READ | NO_ERROR_ON_CORRUPTION combination makes much sense. Also,
negative arguments like that can be confusing, but if we inverted the
meanings, most callers would have to pass both flags to get the normal
behavior.

Looking into the crystal ball, there's two forthcoming features to the
interface that I can see:
1. Pin the buffer if the page is in buffer cache. If it's not, do
nothing. This is what Simon proposed for the B-tree vacuum interlocking,
and I can see that it might be useful elsewhere as well.
2. The posix_fadvise() thing. Or async I/O. It looks like it's going to
be a separate function you call before ReadBuffer(), but it could also
be implemented as a new mode to ReadBuffer() that just allocates a
buffer, issues a posix_fadvise(), and returns. You would then pass the
Buffer to another function to finish the read and make the contents of
the buffer valid.

Neither of these fits too well with the bitmask. Neither would make
sense with DONT_READ or NO_ERROR_ON_CORRUPTION.

So, attached is a patch using an enum. Barring objections, I'll commit this.

There is a conflict with Simon's hot standby patch, though. Simon's
patch adds yet another argument to XLogReadBufferWithFork(), to indicate
whether a normal exclusive lock or a cleanup lock is taken on the
buffer. I'm inclined to change the interface of XLogReadBufferExtended
(as it's now called, after this patch) so that it only pins the page,
and leave the locking to the caller.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
readbuffer-refactor-1.patch text/x-diff 45.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martin Pihlak 2008-10-30 08:41:28 Re: SQL/MED compatible connection manager
Previous Message Peter Eisentraut 2008-10-30 08:38:07 TABLE command