Re: Improving replay of XLOG_BTREE_VACUUM records

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Subject: Re: Improving replay of XLOG_BTREE_VACUUM records
Date: 2015-07-26 12:46:56
Message-ID: 20150726124656.GC5143@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
> On 05/02/2015 02:10 AM, Jim Nasby wrote:
> >This looks like a good way to address this until the more significant
> >work can be done.
> >
> >I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
> >or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
> >the code; I see the logic to NO_BM_VALID...
> >
> >+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
> >+ * BM_VALID bit before returning buffer so that noone could pin it.
> >
> >It would be better to explain why we want that mode. How about:
> >
> >RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
> >BM_VALID before returning the buffer. This is done to ensure that no one
> >can pin the buffer without actually reading the buffer contents in. This
> >is necessary while replying XLOG_BTREE_VACUUM records in hot standby.
>
> That's a rather strange mode. The BM_VALID flag is internal to the buffer
> manager - if you don't know how the buffer manager works, you cannot
> understand that description. I couldn't quite understand what that means
> myself. What can or can you not do with a buffer that's not marked as
> BM_VALID? I'd suggest a mode like this instead:
>
> In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer cache
> already. If it's not, it is not read from disk, and InvalidBuffer is
> returned instead.

To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something like

bool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */

LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}

and then fall back to the normal ReadBuffer() when it's in cache.

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Seltenreich 2015-07-26 12:55:32 Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Previous Message Fabien COELHO 2015-07-26 09:37:37 Re: extend pgbench expressions with functions