Re: [Patch] Optimize dropping of relation buffers using dlist

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: k(dot)jamison(at)fujitsu(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-16 08:32:15
Message-ID: 20200916.173215.1574064307499707091.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 16 Sep 2020 10:05:32 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Wed, Sep 16, 2020 at 9:02 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> > > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > By the way I'm not sure that actually happens, but if one smgrextend
> > call exnteded the relation by two or more blocks, the cache is
> > invalidated and succeeding smgrnblocks returns lseek()'s result.
> >
>
> Can you think of any such case? I think in recovery we use
> XLogReadBufferExtended->ReadBufferWithoutRelcache for reading the page
> which seems to be extending page-by-page but there could be some case
> where that is not true. One idea is to run regressions and add an
> Assert to see if we are extending more than a block during recovery.

I agree with you. Actually XLogReadBufferExtended is the only point to
read a page while recovery and seems calling ReadBufferWithoutRelcache
page by page up to the target page. The only case I found where the
cache is invalidated is ALTER TABLE SET TABLESPACE while
wal_level=minimal and not during recovery. smgrextend is called
without smgrnblocks called at the time.

Considering that the behavior of lseek can be a problem only just after
extending a file, an assertion in smgrextend seems to be
enough. Although, I'm not confident on the diagnosis.

--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -474,7 +474,14 @@ smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
if (reln->smgr_cached_nblocks[forknum] == blocknum)
reln->smgr_cached_nblocks[forknum] = blocknum + 1;
else
+ {
+ /*
+ * DropRelFileNodeBuffers relies on the behavior that nblocks cache
+ * won't be invalidated by file extension while recoverying.
+ */
+ Assert(!InRecovery);
reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+ }
}

> > Don't
> > we need to guarantee the cache to be valid while recovery?
> >
>
> One possibility could be that we somehow detect that the value we are
> using is cached one and if so then only do this optimization.

I basically like this direction. But I'm not sure the additional
parameter for smgrnblocks is acceptable.

But on the contrary, it might be a better design that
DropRelFileNodeBuffers gives up the optimization when
smgrnblocks(,,must_accurate = true) returns InvalidBlockNumber.

@@ -544,9 +551,12 @@ smgrwriteback(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
/*
* smgrnblocks() -- Calculate the number of blocks in the
* supplied relation.
+ *
+ * Returns InvalidBlockNumber if must_accurate is true and smgr_cached_nblocks
+ * is not available.
*/
BlockNumber
-smgrnblocks(SMgrRelation reln, ForkNumber forknum)
+smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool must_accurate)
{
BlockNumber result;

@@ -561,6 +571,17 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)

reln->smgr_cached_nblocks[forknum] = result;

+ /*
+ * We cannot believe the result from smgr_nblocks is always accurate
+ * because lseek of buggy Linux kernels doesn't account for a recent
+ * write. However, we can rely on the result from lseek while recovering
+ * because the first call to this function is not happen just after a file
+ * extension. Return values on subsequent calls return cached nblocks,
+ * which should be accurate during recovery.
+ */
+ if (!InRecovery && must_accurate)
+ return InvalidBlockNumber;
+
return result;
}

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Krasiyan Andreev 2020-09-16 08:35:22 Re: [PATCH] distinct aggregates within a window function WIP
Previous Message Heikki Linnakangas 2020-09-16 08:20:48 Re: Yet another fast GiST build