From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | amit(dot)kapila16(at)gmail(dot)com |
Cc: | k(dot)jamison(at)fujitsu(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [Patch] Optimize dropping of relation buffers using dlist |
Date: | 2020-09-16 08:32:15 |
Message-ID: | 20200916.173215.1574064307499707091.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Wed, 16 Sep 2020 10:05:32 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Wed, Sep 16, 2020 at 9:02 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> > > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > By the way I'm not sure that actually happens, but if one smgrextend
> > call exnteded the relation by two or more blocks, the cache is
> > invalidated and succeeding smgrnblocks returns lseek()'s result.
> >
>
> Can you think of any such case? I think in recovery we use
> XLogReadBufferExtended->ReadBufferWithoutRelcache for reading the page
> which seems to be extending page-by-page but there could be some case
> where that is not true. One idea is to run regressions and add an
> Assert to see if we are extending more than a block during recovery.
I agree with you. Actually XLogReadBufferExtended is the only point to
read a page while recovery and seems calling ReadBufferWithoutRelcache
page by page up to the target page. The only case I found where the
cache is invalidated is ALTER TABLE SET TABLESPACE while
wal_level=minimal and not during recovery. smgrextend is called
without smgrnblocks called at the time.
Considering that the behavior of lseek can be a problem only just after
extending a file, an assertion in smgrextend seems to be
enough. Although, I'm not confident on the diagnosis.
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -474,7 +474,14 @@ smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
if (reln->smgr_cached_nblocks[forknum] == blocknum)
reln->smgr_cached_nblocks[forknum] = blocknum + 1;
else
+ {
+ /*
+ * DropRelFileNodeBuffers relies on the behavior that nblocks cache
+ * won't be invalidated by file extension while recoverying.
+ */
+ Assert(!InRecovery);
reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+ }
}
> > Don't
> > we need to guarantee the cache to be valid while recovery?
> >
>
> One possibility could be that we somehow detect that the value we are
> using is cached one and if so then only do this optimization.
I basically like this direction. But I'm not sure the additional
parameter for smgrnblocks is acceptable.
But on the contrary, it might be a better design that
DropRelFileNodeBuffers gives up the optimization when
smgrnblocks(,,must_accurate = true) returns InvalidBlockNumber.
@@ -544,9 +551,12 @@ smgrwriteback(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
/*
* smgrnblocks() -- Calculate the number of blocks in the
* supplied relation.
+ *
+ * Returns InvalidBlockNumber if must_accurate is true and smgr_cached_nblocks
+ * is not available.
*/
BlockNumber
-smgrnblocks(SMgrRelation reln, ForkNumber forknum)
+smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool must_accurate)
{
BlockNumber result;
@@ -561,6 +571,17 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
reln->smgr_cached_nblocks[forknum] = result;
+ /*
+ * We cannot believe the result from smgr_nblocks is always accurate
+ * because lseek of buggy Linux kernels doesn't account for a recent
+ * write. However, we can rely on the result from lseek while recovering
+ * because the first call to this function is not happen just after a file
+ * extension. Return values on subsequent calls return cached nblocks,
+ * which should be accurate during recovery.
+ */
+ if (!InRecovery && must_accurate)
+ return InvalidBlockNumber;
+
return result;
}
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Krasiyan Andreev | 2020-09-16 08:35:22 | Re: [PATCH] distinct aggregates within a window function WIP |
Previous Message | Heikki Linnakangas | 2020-09-16 08:20:48 | Re: Yet another fast GiST build |