Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: k(dot)jamison(at)fujitsu(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-16 13:07:33
Message-ID: CAA4eK1JMKwohXrEd0Td+6EmHBhe=GKdOBGx7jeXJRoGW=8nbqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 16, 2020 at 2:02 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 16 Sep 2020 10:05:32 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > On Wed, Sep 16, 2020 at 9:02 AM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > >
> > > At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > > On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> > > > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > By the way I'm not sure that actually happens, but if one smgrextend
> > > call exnteded the relation by two or more blocks, the cache is
> > > invalidated and succeeding smgrnblocks returns lseek()'s result.
> > >
> >
> > Can you think of any such case? I think in recovery we use
> > XLogReadBufferExtended->ReadBufferWithoutRelcache for reading the page
> > which seems to be extending page-by-page but there could be some case
> > where that is not true. One idea is to run regressions and add an
> > Assert to see if we are extending more than a block during recovery.
>
> I agree with you. Actually XLogReadBufferExtended is the only point to
> read a page while recovery and seems calling ReadBufferWithoutRelcache
> page by page up to the target page. The only case I found where the
> cache is invalidated is ALTER TABLE SET TABLESPACE while
> wal_level=minimal and not during recovery. smgrextend is called
> without smgrnblocks called at the time.
>
> Considering that the behavior of lseek can be a problem only just after
> extending a file, an assertion in smgrextend seems to be
> enough. Although, I'm not confident on the diagnosis.
>
> --- a/src/backend/storage/smgr/smgr.c
> +++ b/src/backend/storage/smgr/smgr.c
> @@ -474,7 +474,14 @@ smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
> if (reln->smgr_cached_nblocks[forknum] == blocknum)
> reln->smgr_cached_nblocks[forknum] = blocknum + 1;
> else
> + {
> + /*
> + * DropRelFileNodeBuffers relies on the behavior that nblocks cache
> + * won't be invalidated by file extension while recoverying.
> + */
> + Assert(!InRecovery);
> reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
> + }
> }
>

Yeah, I have something like this in mind. I am not very sure at this
stage that we want to commit this but for verification purpose,
running regressions it is a good idea.

> > > Don't
> > > we need to guarantee the cache to be valid while recovery?
> > >
> >
> > One possibility could be that we somehow detect that the value we are
> > using is cached one and if so then only do this optimization.
>
> I basically like this direction. But I'm not sure the additional
> parameter for smgrnblocks is acceptable.
>
> But on the contrary, it might be a better design that
> DropRelFileNodeBuffers gives up the optimization when
> smgrnblocks(,,must_accurate = true) returns InvalidBlockNumber.
>

I haven't thought about what is the best way to achieve this. Let us
see if Tsunakawa-San or Kirk-San has other ideas on this?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-09-16 13:34:12 Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a
Previous Message Michael Paquier 2020-09-16 13:06:01 Re: Avoid useless retrieval of defaults and check constraints in pg_dump -a