Re: Improving replay of XLOG_BTREE_VACUUM records

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Subject: Re: Improving replay of XLOG_BTREE_VACUUM records
Date: 2015-09-02 22:16:58
Message-ID: 20150902221658.GC8555@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-09-02 22:46:59 +0100, Greg Stark wrote:
> On Sun, Jul 26, 2015 at 1:46 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > To me it sounds like this shouldn't go through the full ReadBuffer()
> > rigamarole. That code is already complex enough, and here it's really
> > not needed. I think it'll be much easier to review - and actually faster
> > in many cases to simply have something like
> >
> > bool
> > BufferInCache(Relation, ForkNumber, BlockNumber)
> > {
> > /* XXX: setup tag, hash, partition */
> >
> > LWLockAcquire(newPartitionLock, LW_SHARED);
> > buf_id = BufTableLookup(&newTag, newHash);
> > LWLockRelease(newPartitionLock);
> > return buf_id != -1;
> > }
> >
> > and then fall back to the normal ReadBuffer() when it's in cache.
>
>
> I guess I'm missing something but how is this API useful? As soon as
> its returned it the result might be invalid since before you actually
> make use of the return value another process could come and read in
> and pin the page in question. Is this part of some interlock where you
> only have to know it was unpinned at some point in time since some
> other event?

Yes. We're talking about this block:
for (blkno = xlrec->lastBlockVacuumed + 1; blkno < thisblkno; blkno++)
{
/*
* We use RBM_NORMAL_NO_LOG mode because it's not an error
* condition to see all-zero pages. The original btvacuumpage
* scan would have skipped over all-zero pages, noting them in FSM
* but not bothering to initialize them just yet; so we mustn't
* throw an error here. (We could skip acquiring the cleanup lock
* if PageIsNew, but it's probably not worth the cycles to test.)
*
* XXX we don't actually need to read the block, we just need to
* confirm it is unpinned. If we had a special call into the
* buffer manager we could optimise this so that if the block is
* not in shared_buffers we confirm it as unpinned.
*/
buffer = XLogReadBufferExtended(thisrnode, MAIN_FORKNUM, blkno,
RBM_NORMAL_NO_LOG);
if (BufferIsValid(buffer))
{
LockBufferForCleanup(buffer);
UnlockReleaseBuffer(buffer);
}
}
}

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-09-02 22:23:29 Re: Allow a per-tablespace effective_io_concurrency setting
Previous Message Andres Freund 2015-09-02 22:13:13 Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore