Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: k(dot)jamison(at)fujitsu(dot)com, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-11-05 09:48:40
Message-ID: CAA4eK1Jyu9GZzuKuapHC4aeBfsRm3sCUEiDbndQvw4rRWz10wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 5, 2020 at 1:59 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 5 Nov 2020 11:07:21 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > On Thu, Nov 5, 2020 at 8:26 AM k(dot)jamison(at)fujitsu(dot)com
> > <k(dot)jamison(at)fujitsu(dot)com> wrote:
> > > > > Few comments on patches:
> > > > > ======================
> > > > > v29-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks
> > > > > ----------------------------------------------------------------------
> > > > > -------------
> > > > > 1.
> > > > > -smgrnblocks(SMgrRelation reln, ForkNumber forknum)
> > > > > +smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
> > > > > {
> > > > > BlockNumber result;
> > > > >
> > > > > /*
> > > > > * For now, we only use cached values in recovery due to lack of a
> > > > > shared
> > > > > - * invalidation mechanism for changes in file size.
> > > > > + * invalidation mechanism for changes in file size. The cached
> > > > > + values
> > > > > + * could be smaller than the actual number of existing buffers of the file.
> > > > > + * This is caused by lseek of buggy Linux kernels that might not have
> > > > > + * accounted for the recent write.
> > > > > */
> > > > > if (InRecovery && reln->smgr_cached_nblocks[forknum] !=
> > > > > InvalidBlockNumber)
> > > > > + {
> > > > > + if (accurate != NULL)
> > > > > + *accurate = true;
> > > > > +
> > > > >
> > > > > I don't understand this comment. Few emails back, I think we have
> > > > > discussed that cached value can't be less than the number of buffers
> > > > > during recovery. If that happens to be true then we have some problem.
> > > > > If you want to explain 'accurate' variable then you can do the same
> > > > > atop of function. Would it be better to name this variable as
> > > > > 'cached'?
> > > >
> > > > (I agree that the comment needs to be fixed.)
> > > >
> > > > FWIW I don't think 'cached' suggests the characteristics of the returned value
> > > > on its interface. It was introduced to reduce fseek() calls, and after that we
> > > > have found that it can be regarded as the authoritative source of the file size.
> > > > The "accurate" means that it is guaranteed that we don't have a buffer for the
> > > > file blocks further than that number. I don't come up with a more proper
> > > > word than "accurate" but also I don't think "cached" is proper here.
> > >
> >
> > Sure but that is not the guarantee this API gives. It has to be
> > guaranteed by the logic else-where, so not sure if it is a good idea
> > to try to reflect the same here. The comments in the caller where we
> > use this should explain why it is safe to use this value.
>
> Isn't it already guaranteed by the bugmgr code that we don't have
> buffers for nonexistent file blocks? What is needed here is, yeah,
> the returned value from smgrblocks is "reliable". If "reliable" is
> still not proper, I give up and agree to "cached".
>

I still feel 'cached' is a better name.

>
> > > I am not sure if the patch should cover this or should be a separate thread altogether since
> > > a number of functions also rely on the smgrnblocks(). But I'll take it into consideration.
> > >
> > >
> > > > > v29-0003-Optimize-DropRelFileNodeBuffers-during-recovery
> > > > > ----------------------------------------------------------------------
> > > > > ------------
> > > > > 2.
> > > > > + /* Check that it is in the buffer pool. If not, do nothing. */
> > > > > + LWLockAcquire(bufPartitionLock, LW_SHARED); buf_id =
> > > > > + BufTableLookup(&bufTag, bufHash); LWLockRelease(bufPartitionLock);
> > > > > +
> > > > > + if (buf_id < 0)
> > > > > + continue;
> > > > > +
> > > > > + bufHdr = GetBufferDescriptor(buf_id);
> > > > > +
> > > > > + buf_state = LockBufHdr(bufHdr);
> > > > > +
> > > > > + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> > > > >
> > > > > I think a pre-check for RelFileNode might be better before LockBufHdr
> > > > > for the reasons mentioned in this function few lines down.
> > > >
> > > > The equivalent check is already done by BufTableLookup(). The last line in
> > > > the above is not a precheck but the final check.
> > >
> >
> > Which check in that API you are talking about? Are you telling because
> > we are trying to use a hash value corresponding to rnode.node to find
> > the block then I don't think it is equivalent because there is a
> > difference in actual values. But even if we want to rely on that, a
> > comment is required but I guess we can do the check as well because it
> > shouldn't be a costly pre-check.
>
> I think the only problematic case is that BufTableLookup wrongly
> misses buffers actually to be dropped. (And the case of too-many
> false-positives, not critical though.) If omission is the case, we
> cannot adopt this optimization at all. And if the false-positive is
> the case, maybe we need to adopt more restrictive prechecking, but
> RelFileNodeEquals is *not* more restrictive than BufTableLookup in the
> first place.
>
> What case do you think is problematic when considering
> BufTableLookup() as the prechecking?
>

I was slightly worried about false-positives but on again thinking
about it, I think we don't need any additional pre-check here.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-11-05 09:49:45 Re: Move OpenSSL random under USE_OPENSSL_RANDOM
Previous Message Amit Kapila 2020-11-05 09:20:38 Re: Some doubious code in pgstat.c