Re: Avoiding unnecessary reads in recovery

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding unnecessary reads in recovery
Date: 2007-04-28 10:13:19
Message-ID: 1177755199.3622.90.camel@silverbirch.site
Lists: pgsql-hackers

On Fri, 2007-04-27 at 12:22 +0100, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> >> As regards the zero_damaged_pages question, I raised that some time ago
> >> but we didn't arrive at an explicit answer. All I would say is we can't
> >> allow invalid pages in the buffer manager at any time, whatever options
> >> we have requested, otherwise other code will fail almost immediately.
> >
> > Yeah --- the proposed new bufmgr routine should probably explicitly zero
> > the content of the buffer. It doesn't really matter in the context of
> > WAL recovery, since there can't be any concurrent access to the buffer,
> > but it'd make it safe to use in non-WAL contexts (I think there are
> > other places where we know we are going to init the page and so a
> > physical read is a waste of time).
>
> To implement that correctly, I think we'd need to take the content lock
> to clear the buffer if it's already found in the cache. It doesn't seem
> right to me for the buffer manager to do that, in the worst case it
> could lead to deadlocks if that function was ever used while holding
> another buffer locked.
>
> What we could have is the semantics of "Return a buffer, with either
> correct contents or completely zeroed out". It would act just like
> ReadBuffer if the buffer was already in memory, and zero out the page
> otherwise. That's a bit strange semantics to have, but is simple to
> implement and works for the use-cases we've been talking about.

Sounds good.

> Patch implementing that attached. I named the function "ReadOrZeroBuffer".

We already have an API quirk similar to this: relation extension. It
seems strange to have two different kinds of special-case API used
alongside each other in XLogReadBuffer().

Currently, if we extend by a block, we say
    buffer = ReadBuffer(reln, P_NEW);

Why not just add another option, so that where you use ReadOrZeroBuffer
we just say
    buffer = ReadBuffer(reln, P_INIT);

which we then check for on entry by saying
    isInit = (blockNum == P_INIT);
just as we already do for P_NEW.

That way you can write the code like this:
    if (isExtend || isInit)
    {
        /* new or initialised buffers are zero-filled */
        MemSet((char *) bufBlock, 0, BLCKSZ);
        if (isExtend)
            smgrextend(reln->rd_smgr, blockNum,
                       (char *) bufBlock,
                       reln->rd_istemp);
    }

That way we don't need ReadBuffer_common and the like.
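
For illustration only, here is a minimal, self-contained sketch of the branch above. It is not bufmgr code: ToySmgr, toy_smgrread, toy_smgrextend and toy_fill_buffer are hypothetical stand-ins for an in-memory relation, and isExtend/isInit are passed in as plain flags rather than derived from P_NEW and the proposed P_INIT. The counters only make visible when a physical read would happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192          /* PostgreSQL's default block size */
#define TOY_MAX_BLOCKS 4     /* hypothetical capacity of the toy relation */

typedef uint32_t BlockNumber;

/* Toy stand-in for the storage manager: an in-memory "file" plus counters. */
typedef struct ToySmgr
{
    char        disk[TOY_MAX_BLOCKS][BLCKSZ];
    BlockNumber nblocks;     /* current relation length, in blocks */
    int         nreads;      /* physical reads performed */
    int         nextends;    /* relation extensions performed */
} ToySmgr;

/* Copy an existing block from the toy "disk" into buf. */
static void
toy_smgrread(ToySmgr *smgr, BlockNumber blockNum, char *buf)
{
    assert(blockNum < smgr->nblocks);
    memcpy(buf, smgr->disk[blockNum], BLCKSZ);
    smgr->nreads++;
}

/* Append buf as a new block at the end of the toy relation. */
static void
toy_smgrextend(ToySmgr *smgr, BlockNumber blockNum, const char *buf)
{
    assert(blockNum == smgr->nblocks && blockNum < TOY_MAX_BLOCKS);
    memcpy(smgr->disk[blockNum], buf, BLCKSZ);
    smgr->nblocks++;
    smgr->nextends++;
}

/*
 * Fill bufBlock for blockNum. Extend and init requests get a zeroed
 * page without touching the existing on-disk contents; only an extend
 * request writes the new block out. Everything else is a normal read.
 */
static void
toy_fill_buffer(ToySmgr *smgr, BlockNumber blockNum,
                bool isExtend, bool isInit, char *bufBlock)
{
    if (isExtend || isInit)
    {
        /* new or initialised buffers are zero-filled */
        memset(bufBlock, 0, BLCKSZ);
        if (isExtend)
            toy_smgrextend(smgr, blockNum, bufBlock);
    }
    else
        toy_smgrread(smgr, blockNum, bufBlock);
}
```

In this toy model, calling toy_fill_buffer with isInit set leaves nreads and nextends unchanged and hands back an all-zero page, which is the saving being discussed for recovery.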

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
